Lately I came across a problem for which I had to group objects of a single list by the value of a property of these objects. After trying some things I settled on the groupby function from the itertools module.
Let's consider the following example: we have a Letter class with two properties, a string to specify the character and a boolean to specify whether the letter is a vowel. We want to group Letter objects based on their vowel property.
The class looks as follows; the __repr__ method is just there to give us a neat string representation when we print the objects later.
class Letter:
    def __init__(self, char: str, vowel: bool) -> None:
        self.char = char
        self.vowel = vowel

    def __repr__(self) -> str:
        # Print a Letter as just its character, e.g. a instead of <Letter object at ...>
        return self.char
Next we create a list of Letter objects. We don't need the entire alphabet; the first few letters will do.
letters = [
    Letter('a', True),
    Letter('b', False),
    Letter('c', False),
    Letter('d', False),
    Letter('e', True),
]
Now we group the Letter objects in our letters list by the value of their .vowel property. The grouping is determined by the key function we pass to groupby as the second argument; the (sorted) list itself is the first argument.
from itertools import groupby

sorted_letters = sorted(letters, key=lambda letter: letter.vowel)
grouped = [list(result) for key, result in groupby(
    sorted_letters, key=lambda letter: letter.vowel)]
print(grouped)
# => [[b, c, d], [a, e]]
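The list comprehension above throws away the key of each group. If you also need to know which group is which, one option is to collect the groups in a dict. The following is only a minimal sketch: the Letter class is repeated so the snippet runs on its own, and the names by_vowel and groups_by_vowel are made up for illustration.

```python
from itertools import groupby
from operator import attrgetter


class Letter:
    def __init__(self, char: str, vowel: bool) -> None:
        self.char = char
        self.vowel = vowel

    def __repr__(self) -> str:
        return self.char


letters = [
    Letter('a', True),
    Letter('b', False),
    Letter('c', False),
    Letter('d', False),
    Letter('e', True),
]

# Use one key function for both sorting and grouping so the two cannot drift apart.
by_vowel = attrgetter('vowel')

# Map each key (False or True) to the list of letters that share it.
groups_by_vowel = {
    key: list(group)
    for key, group in groupby(sorted(letters, key=by_vowel), key=by_vowel)
}

print(groups_by_vowel)
# => {False: [b, c, d], True: [a, e]}
```

Sharing a single key function between sorted and groupby is a small design choice that avoids sorting by one attribute and accidentally grouping by another.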
One gotcha is that the list, or any other iterable, needs to be sorted by the same key before it is passed to groupby. Otherwise your groups end up fragmented, because groupby only groups consecutive items that share a key and starts a new group every time the key changes. To illustrate, see the following example where the sorting is commented out.
from itertools import groupby

# letters = sorted(letters, key=lambda letter: letter.vowel)
grouped = [list(result) for key, result in groupby(
    letters, key=lambda letter: letter.vowel)]
print(grouped)
# => [[a], [b, c, d], [e]]
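The same behaviour can be seen on a plain string, which may make it easier to see what is going on. This is just a small sketch with made-up input ('aabba') and made-up variable names; without a key function, groupby groups on the items themselves.

```python
from itertools import groupby

# groupby starts a new group every time the key changes,
# so 'a' shows up twice for this unsorted input.
runs = [(key, len(list(group))) for key, group in groupby('aabba')]
print(runs)
# => [('a', 2), ('b', 2), ('a', 1)]

# After sorting, equal items sit next to each other and form a single group.
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted('aabba'))]
print(sorted_runs)
# => [('a', 3), ('b', 2)]
```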
Our entire code sample looks like:
from itertools import groupby


class Letter:
    def __init__(self, char: str, vowel: bool) -> None:
        self.char = char
        self.vowel = vowel

    def __repr__(self) -> str:
        return self.char


letters = [
    Letter('a', True),
    Letter('b', False),
    Letter('c', False),
    Letter('d', False),
    Letter('e', True),
]

# Sort first: groupby only groups consecutive items with the same key.
sorted_letters = sorted(letters, key=lambda letter: letter.vowel)
grouped = [list(result) for key, result in groupby(
    sorted_letters, key=lambda letter: letter.vowel)]
print(grouped)
# => [[b, c, d], [a, e]]