Ever got annoyed when you want a list of items from another sequence but had to write a for/while loop? Did you know you could have done it in one line? I am not just taking about list, you can do this trick with any standard sequence like dictionary, set and any custom generators.
Executable Codesandbox…
Comprehension, you say?
Comprehension in python is just a syntactic sugar for a loop and .append() statement.
Let’s say we have database query of blog posts with attribute of publish_at, by design this is a way this system supports scheduled blog post and if current datetime is latest than this, then response will contain filtered blog post items.
from datetime.datetime import now
blogs_filtered = [
(blog.title(), blog.category()) # what to save
for blog in blogs # what to iterate
if blog.publish_at <= now() # any filtering condiitonal
]
As you can see, we iterate over query results, blogs
and if publish_at < now()
i.e. if publish_at is latest than now()
we save the query’s title
& category
. If we didn’t use comprehensions, then the snippet would be like,
from datetime.datetime import now
blogs_filtered = []
for blog in blogs: # what to iterate
if blog.publish_at <= now(): # any filtering condiitonal
blogs_filtered.append(
(blog.title(), blog.category()) # what to save
)
See? Comprehensions saves you lines, which means less byte-code to process thus comparatively faster execution; less mental power to comprehend what the snippet does and also can be written in one line 😎.
Not just list but dicts also. E.g., we have 2 lists one should be key and other should be its corresponding value, given that the position of these items are already in place.
keys = [...] # some keys
vals = [...] # some values
assert(len(keys) == len(vals)) # just in case
# dict comprehension
dict_needed = {k:v for k,v in zip(keys,vals)}
If not for comprehensions, then you would have gone like this:
keys = [...] # some keys
vals = [...] # some values
assert(len(keys) == len(vals)) # just in case
# already seen for loop, thus let's use while loop
i = len(keys)
dict_needed = {}
while i:
dict_needed[keys[i]] = vals[i]
i -= 1
Which one is concise? Let me know in the comments. Let’s examine the syntax.
The Syntax
List Comprehension
list_out = [element for element in sequence if condition]
Here, sequence
can be any generator or iterator and element
is each element in the sequence
; and it is committed to the output is condition
evaluates to True
.
src/list-comp.py from the codesandbox.
Dictionary Comprehension
dict_out = {k:v for k,v in dict if condition}
Same as list comprehension but using tuple unpacking, we can separate the key and value of each record and operate on them; conditional block operates the same though.
src/dict-comp.py from the codesandbox.
Set Comprehension
set_out = {i for i in sequence if condition}
Looks like combination of list and dictionary, eh? Functionality is the same.
So
Comprehension in a nutshell being: -
operation on
el
,an iterator lexemes,
for el in els
,a conditonal,
if condition
, that filters out commition ofel
based on returned boolean (optional),surrounded by either
[]
or{}
to indicate what datatype to produce.
Complex Patterns
Creating hashmap of titles and their category
Assume, we have list of Blog
s whose title
have to be indexed according to corresponding category
. If Blog
is defined as…
class Blog:
title: str
category: str
# other attributes
def __init__(self, t, c):
self.title = t
self.category = c
def __repr__(self):
return self.title.lower()
And let’s create a helper function to generate n no. of blogs.
def _gen_blogs(n):
from random import randint
return [Blog('blog title'+str(i), str(randint(0,100))) for i in range(1,n+1)]
You can clearly tell that list comprehension is used. This is a fairly good example when a syntactic sugar being more of a pain in the finger. When I am faced with such a gibberish I used to format it in 3+ lines based of the abstract lexemes, see code-block-1.
def _gen_blogs(n):
from random import randint
return [
Blog(
'blog title'+str(i), str( randint(0,100) )
) # what to commit
for i in range(1,n+1) # what to iterate
# no conditional
]
Right away a bad practice of defining import statement in the local function. Then readability being shot down by comprehensions.
If we disect the return statement, we are iterating to arithmetic progression of common difference 1, i.e. 1,2,3,...,n-2,n-1,n
. Each iteration i
takes each value. Then for each iteration, a Blog
instance is created with title of blog title {i}
, e.g. for 3rd iteration title will be blog title 3
; And each instance will have the category of a pseudo-random integer generated in between 0 and 100 (exclusive).
Then generation of map will be…
def map_title_to_category():
blogs = _gen_blogs(4) # generate 4 blogs, play with 4 if you want 😁
out = {blog.title:blog.category for blog in blogs} # mapping
map_result(blogs, out)
Try to understand the block,
out = {blog.title:blog.category for blog in blogs}
Hope you can, it ain’t much. If you are stuck, use the comments, I will reply.
You can actually chain them.
def alpha_pos():
m = {a:p for a,p in zip([chr(i) for i in range(97,123)], [p for p in range(1, 27)]) if a not in "a.e.i.o.u".split(".")}
map_result([[chr(i) for i in range(97,123)], [p for p in range(1, 27)]], m) # just for pretty output
src/chain-comp.py
It is the chain that makes it hard to read, not the logic. Try to interpret on your own, and if you need help, I am just a comment away. Hint: make each comprehensions a separate variable and analyze.
With a pinch of salt & a punch of lime
To be efficient, let’s save user configuration in bitmask instead of Json etc.
If you think JSON is better way to save data, the watch this short video.
By design we are saving 4 toggles (but you can go wild) with 4 bits, each being,
dark-mode data-saver-on auto-play auto-update
1|0 1|0 1|0 1|0
Altogether there are 16 permutations. If we are doing analysis, where we need to decide whether we keep data-saver algorithm, you are tasked with finding no of users, you can go about doing this as,
users = [] # our specimen
num = len([u for u in users if u.conf | 0b0100]) # extremly simplified for brevity
In another scenario, say we are changing our database design and decided to not let users have period .
in the end of the username. To save the day and show dominance let’s use a one-liner even though we could have gone with readable script of poly-liner 😎.
# Lazy loading the billion users
def get_users():
i = 0
while i < 3 * 10**12:
# this is mapped in the data presentation layer,
# not that we are indexing off of a list
yield User[i]
i += 1
for user in get_users():
user.uname = user.uname if user.uname[-1] != '.' else user.uname[:-1]
user.commit() # or whatever
Can be…
_ = map(lambda u: u.commit(), [lambda u: u.uname[:-1] if u.uname[-1] == '.' else u.uname for user in (User[i] for i in range(3 * 10**12 + 1))])
…like this.
now show dominance 😎.
Epilogue
That’s all I can think of. If you are intrigued and interested, make sure you follow me on Medium or Twitter for follow up.
See you in another one, till then it’s me the BE, signing off 👋.