By: Mark Mruss
Iterators, iterables, and generators are features handled so well by Python that people programming in other languages cannot help but drool over them. Fortunately for us, creating iterators, iterables, and generators is a relatively simple task. This article introduces the concepts of iterators, iterables, and generators and illustrates how easy it is to add them to your code.
- Iteration in Python
- An Initial Example
- Creating An Iterator
- Looking More Closely At The Iterator
- The Upside And Downside Of Iterators
- Looking Closely At The Generator
- But What About Iterables?
- Creating An Iterable Object
In this article I’m going to introduce three related Python features: iterators, iterables, and generators. Generators are easy to define: they are functions that create and return an iterator. Iterators and iterables, on the other hand, are easier to use than they are to define. An iterable object is a “container object capable of returning its members one at a time.” An iterator object is “An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data is available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.” You can think of the difference between the two in this way: an iterable object can be iterated over multiple times, whereas an iterator object can only be iterated over once. In general, an iterable produces a new iterator every time something wants to iterate over its data.
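The difference between the two can be seen with a built-in list. The following is a minimal sketch written in Python 3 syntax (an assumption; the listings in this article use Python 2), and the variable names are purely illustrative.

```python
# A list is an iterable: each pass asks it for a fresh iterator.
numbers = [1, 2, 3]
first_pass = list(numbers)
second_pass = list(numbers)   # a second pass works fine

# An iterator is only good for one pass over the data.
iterator = iter(numbers)
consumed = list(iterator)     # drains the iterator
leftover = list(iterator)     # the iterator is now exhausted

print(first_pass, second_pass)
print(consumed, leftover)
```

The second pass over the list yields the same items again, while the second pass over the iterator yields nothing.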
Note: Classes that define the __getitem__ function are also considered iterables, but since that falls outside the scope of this article, it will not be covered here.
In this tutorial, I will begin by discussing iterators, the most basic concept. Then I will move on to generators, and finish by discussing iterables, the most wide-open topic of the three.
Iterator objects are used in Python to iterate over an object’s data. For example, we all know how to do this in Python when we work with lists:
    my_list = [1, 2, 3]
    for num in my_list:
        print num
This code will iterate over the list object my_list and print out all of the list items, i.e., the numbers 1, 2, and 3. Iterating over sequences in this simple and transparent manner happens to be one of my favourite features of Python.
According to our definition above, lists are iterables since you can iterate over them multiple times. In fact, each time you iterate over a list you are actually using a listiterator iterator object produced by the list.
It may not be immediately clear to you when you should add iteration support to a class. However, the more you work with Python, the more you’ll find instances where doing just this is very useful, even if sometimes the only advantage is cleaner-looking code. One nice thing about iterators (and generators too) is that the processing for each item happens as you need it. Instead of collecting all of the data into a list and then running through the list, you collect each item as you need it. This might not seem like a large difference, but imagine if there were tens of thousands of items to process. What if you were collecting your data from an online source? Performing all of your processing up front may take a very long time, especially if you only wanted the first few items.
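To make the "as you need it" point concrete, here is a small sketch in Python 3 syntax (an assumption; the article's listings are Python 2). The fetch_items function is a hypothetical stand-in for a slow data source, and the fetched list exists only to record how much work was actually done.

```python
from itertools import islice

fetched = []  # records which items were actually produced


def fetch_items(count):
    """Hypothetical stand-in for a slow source such as an online feed."""
    for index in range(count):
        fetched.append(index)   # pretend this step is expensive
        yield index * 2


# Ask for tens of thousands of items, but consume only the first three.
first_three = list(islice(fetch_items(50000), 3))

print(first_three)
print(len(fetched))
```

Only three items are ever produced, even though the source could have supplied fifty thousand.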
In order to explain iteration further, let’s look at a simple example task where we might use iterators. For this example we will create a class that takes a string of characters as input and then converts each character into its byte value. If we were NOT going to use iterators we might do something like what is found in Listing 1.
    class ByteValue(object):
        def __init__(self, data):
            self.data = data

        def to_bytes(self):
            # Collect every byte value up front and return them as a list
            bytes = []
            for char in self.data:
                bytes.append(ord(char))
            return bytes
This code is pretty simple. We have a data member, named data, that we use to store the string that was used to initialize the class. In the to_bytes function we loop through the string, converting each character to its byte value using the built-in ord function. We store each byte value in a list and, once we have collected all of the values, we return that list.
When we run the following:
    bv = ByteValue("abcdef")
    for byte in bv.to_bytes():
        print byte
we would get this as our output:
    97
    98
    99
    100
    101
    102
Let’s convert this into an iterator. Making your class into an iterator requires adding “two methods, which together form the iterator protocol.” The two functions needed are: 1) the __iter__ function; and, 2) the next function. The __iter__ function will return the object itself, while the next function will return the next item. The next function is where the actual iteration work occurs. It iterates by returning the next item in the “sequence” each time it is called, for as long as there is a “next” item. When there are no more items to iterate over, the next function must raise the StopIteration exception to halt the iteration.
To be clear, in order to make your class an iterator you need to do two things:
1) Add an __iter__ function that returns the object itself (self).
2) Add a next function that returns the next item in the sequence each time it is called. When there are no more items in the sequence, the next function raises a StopIteration exception to signal the end of the iteration.
For those of you still confused, the following example will help illustrate how iterators work. If we were to convert our ByteValue class into an iterator object, it might look something like Listing 2.
    class ByteValue(object):
        def __init__(self, data):
            self.data = data
            self.current_item = 0

        def __iter__(self):
            return self

        def next(self):
            if self.current_item == len(self.data):
                raise StopIteration
            else:
                byte_value = ord(self.data[self.current_item])
                self.current_item += 1
                return byte_value
Let’s compare the code in Listing 1 and Listing 2 in detail, focusing on the iterator in Listing 2. The first difference is the addition of the data member current_item to the class, initialized in the __init__ function. current_item serves as a counter and keeps track of the current character in the string while we iterate over it. The counter must be a member of the object, rather than a local variable, since the iterator works through successive calls to the next function. If current_item were local to the next function, its value would be reset with each subsequent call, and it would not be of much use.
The second difference between the listings is the addition, in Listing 2, of the __iter__ function, where we return self.
The final addition to Listing 2 is the next function, where we first check to see if current_item is equal to the length of our string. If current_item is equal to the string’s length, we raise the StopIteration exception to signal the end of the iteration, because we have no more characters left to iterate over. If there are more characters to iterate over, we calculate the byte value of the current character, increase our current_item counter, and then return the byte value.
Note: Notice that current_item is only initialized when the ByteValue object is created. This is deliberate: according to the iteration protocol, “once an iterator’s next() method raises StopIteration, it will continue to do so on subsequent calls.” If we were to re-initialize current_item, we would be able to iterate over the iterator more than once, breaking the iteration protocol.
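For readers on Python 3, the following sketch shows how the same iterator might look there. It is an adaptation, not the article's own listing: the only structural assumption is that Python 3 names the method __next__ instead of next. It also shows the one-pass behaviour described in the note above.

```python
class ByteValue:
    """Python 3 adaptation of the iterator in Listing 2."""

    def __init__(self, data):
        self.data = data
        self.current_item = 0   # set once, never reset

    def __iter__(self):
        return self             # an iterator returns itself

    def __next__(self):         # called "next" in Python 2
        if self.current_item == len(self.data):
            raise StopIteration
        byte_value = ord(self.data[self.current_item])
        self.current_item += 1
        return byte_value


bv = ByteValue("abc")
first_pass = list(bv)    # walks the whole string
second_pass = list(bv)   # already exhausted, so nothing is returned

print(first_pass, second_pass)
```

Because the counter is never reset, every call after the first pass raises StopIteration immediately, which is why the second list is empty.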
Now that we have converted our class into an iterator we can use it as follows:
    for byte in ByteValue("abcdef"):
        print byte
Doing so would result in:
    97
    98
    99
    100
    101
    102
Notice that we do not store the instance of our ByteValue class in a variable. Doing so would be useless: since ByteValue is an iterator, it is only good for one pass over the data. If ByteValue were an iterable (returning a new iterator object each time __iter__ was called), it would make sense to keep an instance around because we could iterate over the instance more than once. We will look at creating iterables later on in this article.
Let’s examine what happens behind the scenes during the iteration process. To illustrate what the for loop is doing, I will demonstrate the order in which things are called.
The first step in the iteration process is to call the __iter__ function in order to get the iterator object that will perform the iteration. Notice that this works on both iterator and iterable objects, since iterators return themselves and iterables return an iterator.
    bv = ByteValue("abcdef")
    iterator = bv.__iter__()
Now that we have the iterator object, we start iterating by calling the next function in order to get the next value:
    print iterator.next()
This executes the next function, which returns the byte value of the character in the string with which we are currently working. Since this is the first call to the next function, current_item will be zero and we will calculate the byte value of the first character in our string (‘a’), resulting in:
    97
If we continue the iteration process by calling the next function six more times, we would get the results shown in Listing 3. Notice that we have now made a total of seven calls to the next function, one more than the number of characters in our string. I’m running the Python code from a file (iter.py found in Listing 4). Depending on how you are running it, you might get slightly different results. However, the most important thing to observe in this example is the exception that is raised.
    97
    98
    99
    100
    101
    102
    Traceback (most recent call last):
      File "Listing4.py", line 34, in ?
        main()
      File "Listing4.py", line 30, in main
        print iterator.next()
      File "Listing4.py", line 13, in next
        raise StopIteration
    StopIteration
    #!/usr/bin/env python

    class ByteValue(object):
        def __init__(self, data):
            self.data = data
            self.current_item = 0

        def __iter__(self):
            return self

        def next(self):
            if self.current_item == len(self.data):
                raise StopIteration
            else:
                byte_value = ord(self.data[self.current_item])
                self.current_item += 1
                return byte_value

    def main():
        for v in ByteValue("abc"):
            if v in ByteValue("abc"):
                print "We have a %d" % v
        bv = ByteValue("abcdef")
        iterator = bv.__iter__()
        # Seven calls for six characters: the last one raises StopIteration
        print iterator.next()
        print iterator.next()
        print iterator.next()
        print iterator.next()
        print iterator.next()
        print iterator.next()
        print iterator.next()

    if __name__ == "__main__":
        # Someone is launching this directly
        main()
For the most part you won’t be calling an iterator object’s next function manually; instead, you’ll probably just let the for loop do it all for you.
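The steps a for loop performs on your behalf can be sketched as an explicit loop. This is a rough illustration in Python 3 syntax (an assumption; there the built-in next() function calls the iterator's __next__ method), and manual_for is a hypothetical helper name.

```python
def manual_for(iterable):
    """Collect items roughly the way a for loop would, step by step."""
    results = []
    iterator = iter(iterable)        # step 1: calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # step 2: ask for the next item
        except StopIteration:
            break                    # step 3: the loop ends quietly
        results.append(item)
    return results


print(manual_for("abc"))
```

The for loop swallows the StopIteration exception for you, which is why you only see the traceback when calling the next function by hand.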
Now that our class is an iterator object, we can use any of the built-in functions and methods that work on iterators and iterables, such as the list, sorted, and sum functions, to name a few.
In the first example the to_bytes function returned a list object that we could work with. If we want a list instead of an iterator for any reason, it is as simple as using the list function, which takes an iterator as a parameter and returns a list. If we want a sorted list, we can pass the iterator to the sorted function, and if we want the sum of all of the bytes, we can pass it to the sum function.
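As a quick sketch of these built-ins at work, the example below uses Python 3 syntax (an assumption). The byte_values helper is hypothetical: it stands in for the ByteValue iterator by using map(), which returns a one-pass iterator in Python 3.

```python
def byte_values(text):
    """Hypothetical stand-in for ByteValue: a one-pass iterator of byte values."""
    return map(ord, text)


as_list = list(byte_values("abcdef"))       # materialise the whole stream
as_sorted = sorted(byte_values("fedcba"))   # sorted() always returns a list
total = sum(byte_values("abcdef"))          # consume the stream into one number

print(as_list)
print(as_sorted)
print(total)
```

Each call builds a fresh iterator, since any one iterator can only be consumed once.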
While very useful, iterators do have some downsides to them. The most obvious is that they are only good for one pass over the data. They also generally require you to add extra data members to your class in order to keep track of your current iterator position. Depending on what you are iterating over, this process can become quite complex. Iterators also only allow you to perform one “type” of iteration, i.e., in one direction or over one piece of internal data. You might address this by adding flags to your class but this will further clutter the class and decrease readability. So how can you simply add multiple types of iteration to a class? Enter the generator!
A generator is a function that creates, or generates, an iterator. In order for a function to become a generator it must return a value using the yield keyword.
Generators are interesting because they are functions, yet execution does not run through them as it does in a normal function. The first time execution enters a generator function it will start at the beginning of the function and continue until the yield keyword is encountered. When the iteration continues, execution will resume in the generator function on the statement immediately following the yield keyword. All local variables in the function will remain intact. If the yield statement occurs within a loop, execution will continue within the loop as though it had not been interrupted.
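The pause-and-resume behaviour described above can be traced with a small sketch, written in Python 3 syntax (an assumption). The count_up generator and the events list are hypothetical and exist only to record the order in which things happen.

```python
events = []  # records the order of execution


def count_up(limit):
    events.append("started")
    current = 0
    while current < limit:
        yield current   # execution pauses here
        events.append("resumed after yielding %d" % current)
        current += 1    # the local variable survived the pause
    events.append("finished")


gen = count_up(2)
events.append("generator created")   # nothing in count_up has run yet
first = next(gen)
second = next(gen)
remaining = list(gen)                # runs the function to completion

for event in events:
    print(event)
```

Note that "generator created" is recorded before "started": calling the function only builds the generator object, and the body begins running on the first request for a value.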
Continuing with the ByteValue example, let’s add a generator function named reverse that can be used to iterate through the byte values of the string in reverse order:
    def reverse(self):
        current_item = len(self.data)
        while current_item > 0:
            current_item -= 1
            yield ord(self.data[current_item])
So what’s going on here? The first thing to notice is that we did not add any more data members to our class; this generator is a self-contained unit. Secondly, since this is a generator, our counter current_item can be a local variable. In the reverse function the first step is to initialize current_item, which represents the current character in the string, to be equal to the length of our string. We initialize it to the length of the string instead of zero since we are iterating through the string in reverse. Next we have a while loop that loops while current_item is greater than zero. Inside the loop we subtract one from our counter, to give us the index of the current character to process. Finally, we yield the byte value of the current character.
Note: We subtract one from our counter the first time through the loop because Python is zero-based and the length of a list minus one gives us the position of the last item in the list. In our examples we have used the following string:
    "abcdef"
When we calculate the length of the string we get 6. We then subtract 1 from that number, leaving 5, which is the index of the last character in the string.
Making use of our new generator function, we run the following:
    bv = ByteValue("abcdef")
    for byte in bv.reverse():
        print byte
We get our favourite byte values in reverse:
    102
    101
    100
    99
    98
    97
Let’s take a detailed look at what is happening in the generator in the same way that we did earlier with the iterator object. The first thing that happens when you call a generator function is NOT the execution of the actual function, rather, it is the creation of a generator object. Running the following:
    bv = ByteValue("abcdef")
    gen = bv.reverse()
    print gen
will result in:
<generator object at 0xb7d8e04c>
This demonstrates that, as stated above, the first call to our generator function does not return the byte value of the last character in the string; instead, it creates a generator object. In fact, when a generator function is called, the body of the function is not executed at all. A generator object is “what Python uses to implement generator iterators. They are normally created by iterating over a function that yields values”.
Once we have a generator object we can start calling its next function (sound familiar?) to perform the actual iteration. An example of this can be found in Listing 5. The results of this execution can be found in Listing 6. The results will seem very familiar to you, especially the StopIteration exception.
    #!/usr/bin/env python

    class ByteValue(object):
        def __init__(self, data):
            self.data = data
            self.current_item = 0

        def __iter__(self):
            return self

        def next(self):
            if self.current_item == len(self.data):
                raise StopIteration
            else:
                byte_value = ord(self.data[self.current_item])
                self.current_item += 1
                return byte_value

        def reverse(self):
            current_item = len(self.data)
            while current_item > 0:
                current_item -= 1
                yield ord(self.data[current_item])

    def main():
        bv = ByteValue("abcdef")
        gen = bv.reverse()
        # Seven calls for six characters: the last one raises StopIteration
        print gen.next()
        print gen.next()
        print gen.next()
        print gen.next()
        print gen.next()
        print gen.next()
        print gen.next()

    if __name__ == "__main__":
        # Someone is launching this directly
        main()
    102
    101
    100
    99
    98
    97
    Traceback (most recent call last):
      File "Listing5.py", line 40, in ?
        main()
      File "Listing5.py", line 36, in main
        print gen.next()
    StopIteration
If we want to look at the contents of the generator object, we can pass it to the built-in dir function, which gives the following results:
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'gi_frame', 'gi_running', 'next']
Notice that the generator object contains the __iter__ and next functions, making the generator object itself an iterator. Because the generator itself is an iterator object, the same built-in iterator functions I mentioned earlier can also be used on our generators:
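The following sketch illustrates this in Python 3 syntax (an assumption; there the method is named __next__ rather than next). The reverse_bytes function is a hypothetical standalone version of the reverse generator, written as a plain function so that the example is self-contained.

```python
def reverse_bytes(text):
    """Standalone version of the reverse generator."""
    current_item = len(text)
    while current_item > 0:
        current_item -= 1
        yield ord(text[current_item])


gen = reverse_bytes("abcdef")
is_own_iterator = iter(gen) is gen     # __iter__ returns the generator itself
has_next = hasattr(gen, "__next__")    # Python 3 name for the next function

as_list = list(reverse_bytes("abcdef"))
as_sorted = sorted(reverse_bytes("abcdef"))
total = sum(reverse_bytes("abcdef"))

print(is_own_iterator, has_next)
print(as_list, as_sorted, total)
```

Each built-in receives a new generator object here, because a generator, like any iterator, can only be consumed once.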
It is also important to remember that your generator or iterator does not have to perform only “dumb” iteration, simply moving through some sort of internal list. Rather, it can make decisions just like any other block of code.
For example, let’s say that in our reverse generator we want to use the byte value 99 as an end condition. We can do something similar to the example found in Listing 7.
    def reverse(self):
        current_item = len(self.data)
        while current_item > 0:
            current_item -= 1
            value = ord(self.data[current_item])
            if value == 99:
                return
            yield value
In Listing 7, if the byte value equals 99 we return from our generator function using the return keyword. Since we return without yielding a value, the StopIteration exception is raised, halting the iteration.
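The effect of the early return can be checked with a standalone sketch in Python 3 syntax (an assumption). The reverse_until function and its stop_value parameter are hypothetical names introduced for illustration.

```python
def reverse_until(text, stop_value=99):
    """Yield byte values in reverse until stop_value is reached."""
    current_item = len(text)
    while current_item > 0:
        current_item -= 1
        value = ord(text[current_item])
        if value == stop_value:
            return          # ends the generator; StopIteration follows
        yield value


# 'c' has the byte value 99, so iteration stops after 'f', 'e', 'd'.
print(list(reverse_until("abcdef")))
```

The values for 'c', 'b', and 'a' are never produced because the generator finishes as soon as it sees 99.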
Be careful about getting too smart with your iteration, because you cannot return any information from a generator; you can only yield it. So if you tried to execute the code in Listing 8, in an attempt to return one last value before quitting (or if you wanted to return a success or failure code), you would get the following error when the return value line of code is executed:
SyntaxError: 'return' with argument inside generator
    def reverse(self):
        current_item = len(self.data)
        while current_item > 0:
            current_item -= 1
            value = ord(self.data[current_item])
            if value == 99:
                return value
            yield value
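The error above is what the Python 2 interpreter used for this article reports. As a side note for readers on Python 3.3 or later, where this construct is accepted, the sketch below shows what happens there instead: the returned value is still not yielded, but is attached to the StopIteration exception. The function name and the stop_value parameter are hypothetical.

```python
def reverse_with_result(text, stop_value=99):
    current_item = len(text)
    while current_item > 0:
        current_item -= 1
        value = ord(text[current_item])
        if value == stop_value:
            return value    # accepted in Python 3.3+, but never yielded
        yield value


gen = reverse_with_result("abcdef")
yielded = []
returned = None
while True:
    try:
        yielded.append(next(gen))
    except StopIteration as stop:
        returned = stop.value   # the value that was passed to return
        break

print(yielded, returned)
```

A for loop discards that value, so even on Python 3 the advice stands: anything the caller should see during iteration has to be yielded.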
I’m sure that by now you are wondering about the iterable objects that I mentioned at the start of this tutorial. As you have probably guessed, simply making your class an iterator isn’t that useful unless it is only performing one task, like our ByteValue example. Since your classes will generally be performing more than one task, you will likely want to make your class an iterable object rather than an iterator object. To recap, iterable objects return an iterator object when their __iter__ function is called, which allows for multiple passes over their data.
Since the definition of an iterable object is an object that returns an iterator object when its __iter__ function is called, creating an iterable object can be done in a variety of ways. Two options that come to mind are: 1) creating an iterator helper class to perform the iteration; and, 2) using a generator function. I prefer using a generator function since it keeps the functionality within the main class.
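The first option, a helper class, might look something like the sketch below. It is written in Python 3 syntax (an assumption, hence __next__), and ByteValueIterator is a hypothetical name that does not appear in the article's listings.

```python
class ByteValueIterator:
    """Hypothetical helper class that performs the actual iteration."""

    def __init__(self, data):
        self.data = data
        self.current_item = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current_item == len(self.data):
            raise StopIteration
        byte_value = ord(self.data[self.current_item])
        self.current_item += 1
        return byte_value


class ByteValue:
    """The iterable: it holds the data and hands out fresh iterators."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return ByteValueIterator(self.data)   # a new iterator per pass


bv = ByteValue("abc")
first_pass = list(bv)
second_pass = list(bv)   # works again, unlike a bare iterator

print(first_pass, second_pass)
```

The cost of this approach is a second class whose only job is bookkeeping, which is the clutter the generator option avoids.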
See Listing 9 for an example of creating an iterable object using a generator. You will see that we have replaced the next function with the forward function. The forward function is a generator that iterates through the data in the “forward” direction. In the __iter__ function we return the result of a call to the forward function, a generator object. Since generator objects implement the iterator protocol and are, in fact, iterators, by returning one from our __iter__ function we have successfully created an iterable.
    class ByteValue(object):
        def __init__(self, data):
            self.data = data

        def __iter__(self):
            # We are an iterable, so return our iterator
            return self.forward()

        def forward(self):
            # The forward generator
            current_item = 0
            while current_item < len(self.data):
                byte_value = ord(self.data[current_item])
                current_item += 1
                yield byte_value

        def reverse(self):
            # The reverse generator
            current_item = len(self.data)
            while current_item > 0:
                current_item -= 1
                yield ord(self.data[current_item])

    def main():
        bv = ByteValue("abc")
        for v in bv:
            if v in bv:
                print "We have a %d" % v

    if __name__ == "__main__":
        main()
Now that we have an iterable object, we can iterate over it as many times as we want. We can even have fun with nested iteration:
    bv = ByteValue("abcdef")
    for value in bv:
        print value
        for second_value in bv:
            print second_value
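To see why nested iteration is safe on an iterable, the sketch below collects every pair produced by two nested passes. It uses Python 3 syntax (an assumption) and a trimmed-down version of the class containing only the forward generator.

```python
class ByteValue:
    """Trimmed-down iterable: __iter__ returns a fresh generator each time."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return self.forward()

    def forward(self):
        for char in self.data:
            yield ord(char)


bv = ByteValue("ab")
# The inner loop gets its own generator, so it does not disturb the outer one.
pairs = [(outer, inner) for outer in bv for inner in bv]

print(pairs)
```

With the one-pass iterator from Listing 2, the inner loop would share and exhaust the same counter as the outer loop, so the outer loop would end after its first item.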
This concludes our introduction to iterators, iterables, and generators. I hope I have demonstrated the immense power and flexibility that they provide. In general, making your object an iterable object and/or using generators allows for more flexibility than simple iterator objects provide. For most complex classes or complex sets of data, multiple iterations are a given.
With all of the praise that I have heaped upon iterators, iterables, and generators, it’s important to remember that they are not a panacea and should not be used in every case where a sequence of items is needed. There are many instances where returning a list is the desired result. That being said, iterators, iterables, and generators are extremely useful and provide a great way to loop through data.