How to Write Efficient Code in Python?
As part of the Data Engineering series, we have already covered part-1 (Data Engineering-Introduction), part-2 (Basic Python) and part-3 (Advance Python). Continuing from the previous post on advanced Python, this post looks at how to write efficient code in Python.
In Python, enumerate() is a clean way to write loops that need a running count of iterations. It takes a collection (any iterable), adds a counter to it, and returns an enumerate object that yields (count, item) pairs. The optional start argument sets the first counter value.
Syntax :
enumerate(iterable, start=0)
Implementation —
"""
Enumerate: Python's enumerate() takes a collection (any iterable), adds a counter to it, and returns it as an enumerate object.
"""
countries = ['USA', 'Canada', 'Singapore', 'Taiwan']
enum_countries = enumerate(countries)
enumerate_countries = enumerate(countries, 5)
print(list(enumerate_countries))
print(type(enumerate_countries))
Output —
[(5, 'USA'), (6, 'Canada'), (7, 'Singapore'), (8, 'Taiwan')]
<class 'enumerate'>
Implementation 2 —
countries = ['USA','Canada','Singapore','Taiwan']
for i, item in enumerate(countries):
    print(i, item)
Output —
0 USA
1 Canada
2 Singapore
3 Taiwan
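For comparison, here is a minimal sketch of the manual-index loop that enumerate() replaces. The names `manual` and `numbered` and the choice of `start=1` are illustrative assumptions, not part of the examples above.

```python
countries = ['USA', 'Canada', 'Singapore', 'Taiwan']

# Manual indexing: works, but is noisier and easy to get wrong
manual = []
for i in range(len(countries)):
    manual.append((i + 1, countries[i]))

# enumerate() yields the same (counter, item) pairs directly
numbered = list(enumerate(countries, start=1))

print(manual == numbered)
```

Both versions build the same list of pairs; the enumerate() version simply avoids the index bookkeeping.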
In Python, zip() takes one or more iterables (lists, tuples, etc.), aggregates their items into tuples, and returns an iterator over those tuples.
Syntax :
zip(*iterables)
Implementation —
# Use zip: zip takes one or more iterables, aggregates them into
# tuples, and returns an iterator object
name = ["Steve", "Paul", "Brad"]
roll_no = [4, 1, 3]
marks = [20, 40, 50]
mapped = zip(name, roll_no, marks)
mapped = set(mapped)  # a set is unordered, so the printed order may vary
print(mapped)
Output —
{('Brad', 3, 50), ('Steve', 4, 20), ('Paul', 1, 40)}
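A common use of zip() is building a lookup table from two parallel lists. The sketch below assumes the same kind of name and marks lists as above; `marks_by_name` and `short` are illustrative names, and "Ann" is a made-up extra entry used only to show the truncation behaviour.

```python
name = ["Steve", "Paul", "Brad"]
marks = [20, 40, 50]

# Pair each name with its mark and build a dictionary from the pairs
marks_by_name = dict(zip(name, marks))
print(marks_by_name)

# zip() stops at the shortest input, so the extra name is silently dropped
short = list(zip(["Steve", "Paul", "Brad", "Ann"], marks))
print(len(short))
```

Because zip() truncates to the shortest input without warning, it is worth checking that the inputs have the same length when that matters.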
To make code run faster, prefer built-in functions and libraries. For example, map() applies a function to every item of an iterable and returns the results lazily as a map object.
Implementation —
"""
Map function: In Python, the map() function applies the given function to each item of a given iterable (e.g. lists, tuples) and returns a map object.
"""
numbers = (100, 200, 300)
result = map(lambda x:x+x,numbers)
total = list(result)
print(total)
Output —
[200, 400, 600]
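map() also accepts built-in functions directly and can walk several iterables in parallel. The following is a small sketch under those assumptions; the names `as_text`, `sums` and `doubled` are illustrative.

```python
numbers = (100, 200, 300)

# map() with a built-in function: no lambda needed
as_text = list(map(str, numbers))

# map() can consume several iterables in parallel
sums = list(map(lambda a, b: a + b, numbers, (1, 2, 3)))

# An equivalent list comprehension for the doubling example
doubled = [x + x for x in numbers]

print(as_text, sums, doubled)
```

Whether map() or a comprehension reads better is largely a matter of taste; a comprehension is often clearer when the function would otherwise be a lambda.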
NumPy arrays are homogeneous and provide a fast, memory-efficient alternative to Python lists. NumPy's vectorized operations are applied to all elements of an array at once, which lets the programmer perform calculations over entire arrays efficiently. The function below computes reciprocals with an explicit Python loop, the element-by-element pattern that vectorized operations are meant to replace.
Implementation —
import numpy as np
def reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
values = np.random.randint(1, 15, size=6)
reciprocals(values)
Output —
array([0.25 , 0.5 , 0.1 , 0.16666667, 0.14285714,
0.07142857])
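The loop-based function can be replaced by a single vectorized expression. The sketch below assumes NumPy is installed; `reciprocals_loop` mirrors the function above and `vectorized` is an illustrative name.

```python
import numpy as np

def reciprocals_loop(values):
    # Element-by-element version, as in the example above
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 15, size=6)

# Vectorized version: the division is applied to every element at once
vectorized = 1.0 / values

print(np.allclose(vectorized, reciprocals_loop(values)))
```

On large arrays the vectorized form is typically much faster, because the loop runs inside NumPy's compiled code rather than in the Python interpreter.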
To assign several variables at once, or to swap two variables without a temporary, use multiple assignment.
Implementation —
# Use multiple assignment
f_name, l_name, city = "Steve", "Paul", "NewYork"
print(f_name, l_name, city)
# To swap variables
a = 5
b = 10
a, b = b, a
print(a, b)
Output —
Steve Paul NewYork
10 5
Use Comprehensions
Implementation —
# List comprehension
list_two = [5, 10, 15, 20, 20, 40, 50, 60]
new_list = [x**3 for x in list_two]
print(new_list)
# Dictionary comprehension
dict_one = [1, 2, 3, 4]
new_dict = {x: x**2 for x in dict_one if x % 2 == 0}
print(new_dict)
Output —
[125, 1000, 3375, 8000, 8000, 64000, 125000, 216000]
{2: 4, 4: 16}
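To make clear what a comprehension replaces, here is a sketch of the dictionary comprehension written as an explicit loop. The names `source`, `squares_loop` and `squares_comp` are illustrative.

```python
source = [1, 2, 3, 4]

# Explicit loop version
squares_loop = {}
for x in source:
    if x % 2 == 0:
        squares_loop[x] = x ** 2

# Comprehension version: same result in one expression
squares_comp = {x: x ** 2 for x in source if x % 2 == 0}

print(squares_loop == squares_comp)
```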
Membership: to check whether a value is present in a list, it is generally faster (and clearer) to use the “in” keyword than to write an explicit loop.
Implementation —
days = ["sunday","monday","tuesday"]
for d in days:
    print('Today is {}'.format(d))
print('tuesday' in days)
print('friday' in days)
Output —
Today is sunday
Today is monday
Today is tuesday
True
False
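For large collections, the container type matters as much as the keyword: `in` on a list scans the elements one by one, while `in` on a set uses a hash lookup. The sketch below is an illustration under assumed sizes (100,000 items, 100 repetitions); the timings it prints will vary by machine.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
missing = -1  # worst case for the list: every element is checked

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: missing in items_list, number=100)
set_time = timeit.timeit(lambda: missing in items_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

If the same collection is tested for membership many times, converting it to a set once is usually worthwhile.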
Counter: Counter is one of the high-performance container data types in the collections module. It is a dict subclass for counting hashable items.
Implementation —
from collections import Counter
sample_dict = {'a':4,'b':8,'c':2}
print(Counter(sample_dict))
Output —
Counter({'b': 8, 'a': 4, 'c': 2})
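Counter is most often fed a raw iterable rather than an existing dictionary. A minimal sketch, with a made-up `words` list used purely for illustration:

```python
from collections import Counter

words = ["spark", "python", "sql", "python", "spark", "python"]

# Count occurrences of each item in a single pass
counts = Counter(words)

print(counts.most_common(2))  # [('python', 3), ('spark', 2)]
print(counts["scala"])        # missing keys count as 0 instead of raising KeyError
```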
Python's itertools module provides fast, memory-efficient functions for building and handling iterators.
Implementation —
import itertools
for i in itertools.count(30, 4):
    print(i)
    if i > 30:
        break
Output —
30
34
Implementation 2 —
import itertools
countries = [("West", "USA"), ("East", "Singapore"), ("West", "Canada"), ("East", "Taiwan")]
iter_one = itertools.groupby(countries, lambda x: x[0])
for key, group in iter_one:
    result = {key: list(group)}
    print(result)
Output —
{'West': [('West', 'USA')]}
{'East': [('East', 'Singapore')]}
{'West': [('West', 'Canada')]}
{'East': [('East', 'Taiwan')]}
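The output above contains four groups because itertools.groupby() only groups consecutive items with the same key. To get one group per region, the input can be sorted by the same key first. A minimal sketch, where `region` and `grouped` are illustrative names:

```python
import itertools
from operator import itemgetter

countries = [("West", "USA"), ("East", "Singapore"),
             ("West", "Canada"), ("East", "Taiwan")]

region = itemgetter(0)  # key function: the first element of each pair

# Sort by region so that equal keys are adjacent, then group
grouped = {
    key: [country for _, country in group]
    for key, group in itertools.groupby(sorted(countries, key=region), key=region)
}

print(grouped)  # {'East': ['Singapore', 'Taiwan'], 'West': ['USA', 'Canada']}
```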
Use sets to remove duplicates
Implementation —
s1 = {1,2,4,6,0,3,2,1,7,4,3}
s1.add(10)
s1.update([12,13])
print(s1)
Output —
{0, 1, 2, 3, 4, 6, 7, 10, 12, 13}
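The same idea applies to removing duplicates from an existing list. The sketch below uses a made-up `readings` list; note that a set does not keep the original order, so `dict.fromkeys()` is shown as one way to deduplicate while preserving it.

```python
readings = [3, 1, 3, 2, 1, 5]

# A set drops duplicates but does not preserve the original order
unique_unordered = set(readings)

# dict keys are unique and keep insertion order (Python 3.7+)
unique_ordered = list(dict.fromkeys(readings))

print(unique_ordered)  # [3, 1, 2, 5]
```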
Use Generators
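In Python 3, range() already uses lazy evaluation: it returns a range object that produces values on demand instead of building a full list. (xrange() existed only in Python 2, where range() returned a list.) To get the same on-demand behaviour for your own sequences, write a generator function that uses yield.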
Implementation —
def test_sequence():
    num = 0
    while num < 10:
        yield num
        num += 1
for i in test_sequence():
    print(i, end=",")
Output —
0,1,2,3,4,5,6,7,8,9,
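Generator expressions give the same laziness without a separate function. The sketch below compares the memory footprint of a list and a generator over the same values; the size of 100,000 is an arbitrary choice for illustration.

```python
import sys

# The list comprehension builds all 100,000 values up front
squares_list = [n * n for n in range(100_000)]

# The generator expression produces values one at a time, on demand
squares_gen = (n * n for n in range(100_000))

# The generator object is far smaller than the list
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# A generator can be consumed only once; sum() exhausts it here
total = sum(squares_gen)
print(total == sum(squares_list))
```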
Practice writing idiomatic code: idiomatic Python is often faster as well as easier to read.
Examine the runtime of your code snippet with the %timeit magic in IPython or Jupyter
Implementation —
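%timeit x = 3; L = [x**n for n in range(20)]
Output —
%timeit reports the mean and standard deviation per loop over 7 runs; the exact figures depend on the machine. The statement must be passed unquoted, as above. Written as %timeit ('x=3; L=[x**n for n in range(20)]'), the magic times only the evaluation of a string literal, which gives a figure such as:
12.9 ns ± 0.894 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)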
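Outside IPython, the standard-library timeit module offers a similar measurement. The following is a minimal sketch; the function name `powers` and the run counts (10,000 calls, 5 repeats) are assumptions chosen for illustration.

```python
import timeit

def powers():
    # The snippet being measured: a small list comprehension
    x = 3
    return [x ** n for n in range(20)]

# Run 5 batches of 10,000 calls each; each entry is the total time for one batch
runs = timeit.repeat(powers, number=10_000, repeat=5)

# The fastest batch is usually the least disturbed by other processes
best = min(runs) / 10_000
print(f"{best * 1e6:.3f} microseconds per call")
```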