Sunday, October 16, 2022

DE Series- 4

How to Write Efficient Code in Python?


As part of the Data Engineering Series, we have already covered part 1 (Data Engineering-Introduction), part 2 (Basic Python) and part 3 (Advance Python). Continuing from the previous post, this post looks at how to write efficient code in Python.

In Python, enumerate() helps you write cleaner loops. We often need to keep a count of iterations. enumerate() takes an iterable, attaches a counter to each item and returns an enumerate object that yields (count, item) pairs. The optional start argument sets the first count (0 by default).

Syntax :

enumerate(iterable, start=0)

Implementation —

# enumerate() pairs each item of an iterable with a counter.
# The second argument is the start value, so counting begins at 5 here.
countries = ['USA','Canada','Singapore','Taiwan']
enumerate_countries = enumerate(countries,5)
print(list(enumerate_countries))
print(type(enumerate_countries))

Output —

[(5, 'USA'), (6, 'Canada'), (7, 'Singapore'), (8, 'Taiwan')]
<class 'enumerate'>

Implementation 2 —

countries = ['USA','Canada','Singapore','Taiwan']
for i,item in enumerate(countries):
    print(i,item)

Output —

0 USA
1 Canada
2 Singapore
3 Taiwan
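For comparison, here is a minimal sketch of the manual-counter loop that enumerate() replaces, next to the enumerate() version with start=1. The names manual and with_enumerate are only illustrative.

```python
countries = ['USA', 'Canada', 'Singapore', 'Taiwan']

# Manual counter: an extra variable to initialise and increment by hand.
manual = []
i = 1
for item in countries:
    manual.append((i, item))
    i += 1

# enumerate() does the same bookkeeping; start=1 begins the count at 1.
with_enumerate = list(enumerate(countries, start=1))

print(manual == with_enumerate)  # both approaches build the same pairs
print(with_enumerate)
```

Both versions produce the same pairs; the enumerate() version simply has fewer places for an off-by-one mistake.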

In Python, zip() takes one or more iterables (lists, tuples, etc.), aggregates their items position by position into tuples and returns an iterator over those tuples.

Syntax :

zip(*iterables)

Implementation —

# Use zip : zip() takes one or more iterables, aggregates them into
# tuples and returns an iterator object
name = ["Steve","Paul","Brad"]
roll_no = [4,1,3]
marks = [20,40,50]
mapped = zip(name,roll_no,marks)
# set() consumes the iterator; sets are unordered, so the printed order can vary
mapped = set(mapped)
print(mapped)

Output —

{('Brad', 3, 50), ('Steve', 4, 20), ('Paul', 1, 40)}
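Two further uses of zip() are sketched below, assuming the same name and marks lists as above: building a dictionary from paired items, and the fact that zip() stops at the shortest input. The extra name "Amy" is a made-up value added only to show the truncation.

```python
name = ["Steve", "Paul", "Brad"]
marks = [20, 40, 50]

# Pair each name with its mark and build a dictionary from the pairs.
marks_by_name = dict(zip(name, marks))
print(marks_by_name)

# zip() stops at the shortest input, so the extra name "Amy" is dropped.
short = list(zip(name + ["Amy"], marks))
print(short)

# zip(*pairs) reverses the operation and splits the pairs back into columns.
names_again, marks_again = zip(*short)
print(names_again, marks_again)
```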

To make code run faster, prefer built-in functions and libraries. For example, map() applies a function to every item of an iterable and returns a lazy map object; wrap it in list() to materialize the results.

Implementation —

"""
Map function : In Python, the map() function applies the given function to each item of a given iterable (i.e. lists, tuples etc.) and returns a map object.
"""
numbers =(100,200,300)
result = map(lambda x:x+x,numbers)
total = list(result)
print(total)

Output —

[200, 400, 600]
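As a small extension of the example above, map() also accepts several iterables and passes one item from each to the function. The sketch below shows that, followed by the same doubling written as a list comprehension. The offsets tuple is an illustrative value, not part of the original example.

```python
numbers = (100, 200, 300)
offsets = (1, 2, 3)  # illustrative values, assumed for this sketch

# With two iterables, the function receives one item from each per call.
sums = list(map(lambda x, y: x + y, numbers, offsets))
print(sums)

# The doubling from the example above, written as a list comprehension.
doubled = [x + x for x in numbers]
print(doubled)
```

Which form reads better is largely a matter of taste; map() tends to pay off most when the function already exists and no lambda is needed.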

NumPy arrays are homogeneous (every element has the same type) and provide a fast, memory-efficient alternative to Python lists. NumPy's vectorized operations apply a calculation to every element of an array in a single call, with the looping done in compiled code rather than in Python, which lets the programmer perform calculations over entire arrays efficiently.

Implementation —

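import numpy as np

# Loop-based baseline: computes one reciprocal per Python-level iteration.
# The vectorized equivalent is the single expression 1.0 / values.
def reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0/values[i]
    return output

values = np.random.randint(1,15,size=6)
reciprocals(values)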

Output —

array([0.25      , 0.5       , 0.1       , 0.16666667, 0.14285714,
0.07142857])
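The function above still loops in Python, one element at a time. A minimal sketch of the vectorized form follows, using fixed sample values chosen to match the output shown above. The name reciprocals_loop is a renamed copy of the loop version, kept here only for comparison.

```python
import numpy as np

def reciprocals_loop(values):
    # Element-by-element version, as in the example above.
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# Fixed sample values (assumed) so the run is repeatable.
values = np.array([4, 2, 10, 6, 7, 14])

# Vectorized: one expression, the loop runs inside NumPy's compiled code.
vectorized = 1.0 / values

same = np.allclose(reciprocals_loop(values), vectorized)
print(same)
print(vectorized)
```

On small arrays the difference is negligible; the vectorized form matters as the array grows.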

Use multiple assignment to set several variables in one statement, or to swap two variables without a temporary one

Implementation —

# Use multiple assignment
f_name,l_name,city = "Steve","Paul","NewYork"
print(f_name,l_name,city)
# To swap variables
a = 5
b = 10
a,b = b,a
print(a,b)

Output —

Steve Paul NewYork
10 5

Use Comprehensions

Implementation —

# List Comprehension
list_two = [5,10,15,20,20,40,50,60]
new_list = [x**3 for x in list_two]
print(new_list)
# Dictionary Comprehension
dict_one = [1,2,3,4]
new_dict = {x:x**2 for x in dict_one if x%2 ==0}
print(new_dict)

Output —

[125, 1000, 3375, 8000, 8000, 64000, 125000, 216000]
{2: 4, 4: 16}

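Membership : To check whether a value is present in a collection, use the “in” keyword; it is generally faster and clearer than a hand-written search loop. On a list the check scans the items one by one, while on a set or dict it is a hash lookup.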

Implementation —

days = ["sunday","monday","tuesday"]
for d in days:
    print('Today is {}'.format(d))
print('tuesday' in days)
print('friday' in days)

Output —

Today is sunday
Today is monday
Today is tuesday
True
False
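How fast a membership test is depends on the container. The sketch below compares the same "in" check on a list and on a set using the standard-library timeit module. The collection size and repeat count are arbitrary choices, and the measured times will differ from machine to machine.

```python
import timeit

items = list(range(100_000))
items_set = set(items)
target = 99_999  # worst case for the list: the last element

# The list check scans items one by one; the set check is a hash lookup.
list_time = timeit.timeit(lambda: target in items, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)

print(target in items, target in items_set)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

For a handful of items, as in the days example, a list is perfectly fine; converting to a set is worth it when the same large collection is checked many times.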

Counter : Counter, from the collections module, is a dict subclass for counting hashable items. It is one of the high-performance container data types.

Implementation —

from collections import Counter
sample_dict = {'a':4,'b':8,'c':2}
print(Counter(sample_dict))

Output —

Counter({'b': 8, 'a': 4, 'c': 2})
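Counter is used more often to count items straight from an iterable than to wrap an existing dictionary. A minimal sketch, with a made-up words list as the input:

```python
from collections import Counter

# Illustrative data, assumed for this sketch.
words = ["spark", "python", "spark", "sql", "python", "spark"]

counts = Counter(words)          # tally each distinct item
top_two = counts.most_common(2)  # the two most frequent items with their counts
missing = counts["scala"]        # missing keys count as 0 instead of raising KeyError

print(counts)
print(top_two)
print(missing)
```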

Python's itertools module provides fast, memory-efficient functions for building and combining iterators.

Implementation —

import itertools
for i in itertools.count(30,4):
    print(i)
    if i>30:
        break

Output —

30
34

Implementation 2 —

import itertools
countries =[("West","USA"), ("East","Singapore"),("West","Canada"),("East","Taiwan")]
iter_one = itertools.groupby(countries,lambda x:x[0])
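# groupby() only merges consecutive items that share a key,
# so with unsorted input the same key can appear more than once.
for key,group in iter_one:
    result = {key:list(group)}
    print(result)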

Output —

{'West': [('West', 'USA')]}
{'East': [('East', 'Singapore')]}
{'West': [('West', 'Canada')]}
{'East': [('East', 'Taiwan')]}
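In the output above each key shows up twice because groupby() only groups neighbouring items. If one group per region is wanted, the usual approach is to sort by the same key first. A minimal sketch, using operator.itemgetter in place of the lambda:

```python
import itertools
from operator import itemgetter

countries = [("West", "USA"), ("East", "Singapore"), ("West", "Canada"), ("East", "Taiwan")]

region = itemgetter(0)  # key function: the first element of each pair

# Sorting puts equal keys next to each other so groupby() can merge them.
# sorted() is stable, so countries keep their original order within a region.
grouped = {
    key: [country for _, country in group]
    for key, group in itertools.groupby(sorted(countries, key=region), key=region)
}
print(grouped)
```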

Use sets to remove duplicates

Implementation —

s1 = {1,2,4,6,0,3,2,1,7,4,3}
s1.add(10)
s1.update([12,13])
print(s1)

Output —

{0, 1, 2, 3, 4, 6, 7, 10, 12, 13}
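When the duplicates sit in a list, converting to a set removes them but does not keep the original order. The sketch below shows that next to dict.fromkeys(), which removes duplicates while preserving first-seen order. The values list reuses the numbers from the example above.

```python
values = [1, 2, 4, 6, 0, 3, 2, 1, 7, 4, 3]

# set() drops duplicates; the resulting order is not guaranteed.
unique_unordered = set(values)

# dict keys are unique and keep insertion order, so first-seen order survives.
unique_ordered = list(dict.fromkeys(values))

print(sorted(unique_unordered))
print(unique_ordered)
```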

Use Generators

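In Python 3, range() already uses lazy evaluation: it produces values on demand instead of building a full list. xrange() exists only in Python 2, where range() returned a list. For your own lazy sequences, write a generator function with yield; calling it returns a generator object.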

Implementation —

def test_sequence():
    num = 0
    while num<10:
        yield num
        num+=1

for i in test_sequence():
    print(i,end=",")

Output —

0,1,2,3,4,5,6,7,8,9,
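The memory benefit of lazy evaluation can be seen with a generator expression. The sketch below compares the size of a list with the size of an equivalent generator object. Note that sys.getsizeof() reports only the size of the container object itself, and the sequence length is an arbitrary choice.

```python
import sys

# The list holds all 100,000 results at once.
squares_list = [n * n for n in range(100_000)]
# The generator produces one result at a time, on demand.
squares_gen = (n * n for n in range(100_000))

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes, gen_bytes)

# Both yield the same values; summing consumes the generator.
same_total = sum(squares_gen) == sum(squares_list)
print(same_total)
```

A generator can be iterated only once, so keep the list form when the values are needed more than once.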

Practice writing idiomatic code; it is usually easier to read and often runs faster

Examine the runtime of your code snippets, for example with the %timeit magic in IPython or Jupyter

Implementation —

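%timeit x = 3; L = [x**n for n in range(20)]

Pass the statement to %timeit without quotes or parentheses. The quoted form, %timeit ('x=3; L=[x**n for n in range(20)]'), only times the evaluation of a string literal, which is why it reports a figure far too small for a 20-element list comprehension:

Output —

12.9 ns ± 0.894 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)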
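Outside IPython or Jupyter, where the %timeit magic is not available, the standard-library timeit module can do the same measurement. A minimal sketch follows; the loop count of 100,000 is an arbitrary choice and the result depends on the machine.

```python
import timeit

loops = 100_000

# stmt is the code being timed; setup runs once and is not included in the timing.
elapsed = timeit.timeit("L = [x**n for n in range(20)]", setup="x = 3", number=loops)

per_loop_ns = elapsed / loops * 1e9
print(f"{per_loop_ns:.0f} ns per loop")
```

Here the statement is a string on purpose: timeit.timeit() compiles and runs it, unlike the %timeit magic, which takes the code directly.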
