Iterators, Iterables, and Generators! Oh, my!


By: Mark Mruss

Note: This article was first published in the January 2008 issue of Python Magazine.

Iterators, iterables, and generators are features handled so well by Python that people programming in other languages cannot help but drool over them. Fortunately for us, creating iterators, iterables, and generators is a relatively simple task. This article introduces the three concepts and illustrates how easy it is to add them to your code.

  1. Introduction
  2. Iteration in Python
  3. An Initial Example
  4. Creating An Iterator
  5. Looking More Closely At The Iterator
  6. The Upside And Downside Of Iterators
  7. Generators
  8. Looking Closely At The Generator
  9. But What About Iterables?
  10. Creating An Iterable Object
  11. Conclusion

Introduction

In this article I’m going to introduce three related Python features: iterators, iterables, and generators. Generators are easy to define: they are functions that create and return an iterator. Iterators and iterables, on the other hand, are easier to use than they are to define. An iterable object is a “container object capable of returning its members one at a time.”[1] An iterator object is “An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data is available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.”[1] You can think of the difference between the two in this way: an iterable object can be iterated over multiple times, whereas an iterator object can only be iterated over once. In general, an iterable produces a new iterator every time something wants to iterate over its data.
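
To make the distinction concrete, here is a minimal sketch that uses only a built-in list and the built-in iter function. The variable names are mine, chosen purely for illustration. The list can be walked as often as we like, while the iterator it hands out is spent after a single pass.

```python
numbers = [1, 2, 3]            # a list is an iterable

first_pass = list(numbers)     # walk the list once...
second_pass = list(numbers)    # ...and again; both passes see every item

iterator = iter(numbers)       # ask the iterable for a fresh iterator
consumed = list(iterator)      # the first pass drains the iterator
leftover = list(iterator)      # a second pass finds nothing left

print(first_pass == second_pass)   # True
print(consumed)                    # [1, 2, 3]
print(leftover)                    # []
```

The single-argument print(...) calls are written so that the sketch should behave the same under the Python 2 used in the listings below and under Python 3.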

Note: Classes that define the __getitem__ function are also considered iterables, but since that falls outside the scope of this article, it will not be covered here.

In this tutorial, I will begin by discussing iterators, the most basic concept. Then I will move on to generators, and finish by discussing iterables, the most wide-open topic of the three.

Iteration in Python

Iterator objects are used in Python to iterate over an object’s data. For example, we all know how to do this in Python when we work with lists:

my_list = [1,2,3]
for num in my_list:
	print num

This code will iterate over the list object my_list and print out all of the list items, i.e., the numbers 1, 2, and 3. Iterating over sequences in this simple and transparent manner happens to be one of my favourite features of Python.

According to our definition above, lists are iterables since you can iterate over them multiple times. In fact, each time you iterate over a list you are actually using a listiterator iterator object produced by the list.

It may not be immediately clear to you when you should add iteration support to a class. However, the more you work with Python, the more you’ll find instances where doing just this is very useful, even if sometimes the only advantage is cleaner-looking code. One nice thing about iterators (and generators too) is that the processing for each item happens as you need it. Instead of collecting all of the data into a list and then running through the list, you collect each item as you need it. This might not seem like a large difference, but imagine that there were tens of thousands of items to process. What if you were collecting your data from an online source? Performing all of your processing up front may take a very long time, especially if you only wanted the first few items.
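
As a rough illustration of this on-demand behaviour, the sketch below records which items actually get processed when only the first three results are requested. The slow_square function and the computed list are hypothetical stand-ins for expensive per-item work; itertools.islice is the standard library helper for taking the first few items from any iterator.

```python
import itertools

computed = []                      # records which inputs were really processed

def slow_square(n):
    # Hypothetical stand-in for an expensive per-item computation.
    computed.append(n)
    return n * n

# A generator expression builds an iterator; no squares are computed yet.
lazy_squares = (slow_square(n) for n in range(10000))

# Pull only the first three results from the iterator.
first_three = list(itertools.islice(lazy_squares, 3))

print(first_three)   # [0, 1, 4]
print(computed)      # [0, 1, 2]
```

Had we built the full list first, slow_square would have run ten thousand times before we saw a single result.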

An initial example

In order to explain iteration further, let’s look at a simple example task where we might use iterators. For this example we will create a class that takes a string of characters as input and then converts each character into its byte value. If we were NOT going to use iterators we might do something like what is found in Listing 1.

Listing 1

class ByteValue(object):

	def __init__(self, data):
		self.data = data

	def to_bytes(self):
		bytes = []
		for char in self.data:
			bytes.append(ord(char))
		return bytes

This code is pretty simple. We have a data member, named data, that we use to store the string that was used to initialize the class. In the to_bytes function we loop through the string, converting each character to its byte value using the built-in ord function. We store each byte value in a list and, once we have collected all of the values, we return that list.

When we run the following:

bv = ByteValue("abcdef")
for byte in bv.to_bytes():
	print byte

we would get this as our output:

97
98
99
100
101
102

Creating an Iterator

Let’s convert this into an iterator. Making your class into an iterator requires adding “two methods, which together form the iterator protocol.”[2] The two functions needed are: 1) the __iter__ function; and, 2) the next function. The __iter__ function will return the object itself, while the next function will return the next item. The next function is where the actual iteration work occurs. The next function iterates by returning the next item in the “sequence” each time it is called for as long as there is a “next” item. When there are no more items to iterate over, the next function must raise the StopIteration exception to halt the iteration.

To be clear, in order to make your class an iterator you need to do two things:

1) Add an __iter__ function that returns the object itself (self)

2) Add a next function that returns the next item in the sequence each time it is called. When there are no more items in the sequence, the next function raises a StopIteration exception to signal the end of the iteration.

For those of you still confused, the following example will help illustrate how iterators work. If we were to convert our ByteValue class into an iterator object, it might look something like Listing 2.

Listing 2

class ByteValue(object):

	def __init__(self, data):
		self.data = data
		self.current_item = 0

	def __iter__(self):
		return self

	def next(self):
		if (self.current_item == len(self.data)):
			raise StopIteration
		else:
			byte_value = ord(self.data[self.current_item])
			self.current_item += 1
			return byte_value

Let’s compare the code in Listings 1 and 2 in detail, focusing on the iterator in Listing 2. The first difference is the addition of the data member current_item to the class, initialized in the __init__ function. current_item serves as a counter and keeps track of the current character in the string while we iterate over it. The counter must be stored on the object itself, since the iterator works through successive calls to the next function. If current_item were local to the next function, its value would be reset with each subsequent call, and it would not be of much use.

The second difference between the listings is the addition, in Listing 2, of the __iter__ function, where we return self.

The final addition to Listing 2 is the next function, where we first check to see if current_item is equal to the length of our string. If current_item is equal to the string’s length, we raise the StopIteration exception to signal the end of the iteration, because we have no more characters left to iterate over. If there are more characters to iterate over, we calculate the byte value of the current character, increase our current_item counter, and then return the byte value.

Note: Notice that current_item is only initialized when the ByteValue object is created. This is deliberate: according to the iteration protocol, “once an iterator’s next() method raises StopIteration, it will continue to do so on subsequent calls.”[2] If we were to re-initialize current_item, we would be able to iterate over the iterator more than once, breaking the iteration protocol.
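
The sketch below shows that rule in action with a copy of the Listing 2 class. One assumption to flag loudly: the method is named __next__ here, which is the Python 3 spelling, and aliased to next so that the same class should also work on the Python 2 used in the listings. The built-in next() function (available since Python 2.6) calls whichever one applies.

```python
class ByteValue(object):
    """Iterator over the byte values of a string, modelled on Listing 2."""

    def __init__(self, data):
        self.data = data
        self.current_item = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current_item == len(self.data):
            raise StopIteration
        byte_value = ord(self.data[self.current_item])
        self.current_item += 1
        return byte_value

    next = __next__   # assumption: alias so the class also runs on Python 2


bv = ByteValue("ab")
first_pass = list(bv)      # drains the iterator
second_pass = list(bv)     # current_item was never reset, so nothing is left

still_exhausted = False
try:
    next(bv)               # asking again after exhaustion...
except StopIteration:
    still_exhausted = True # ...raises StopIteration again

print(first_pass)          # [97, 98]
print(second_pass)         # []
print(still_exhausted)     # True
```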

Now that we have converted our class into an iterator we can use it as follows:

for byte in ByteValue("abcdef"):
	print byte

Doing so would result in:

97
98
99
100
101
102

Notice that we do not store the instance of our ByteValue class in a variable. Doing so would be pointless: since ByteValue is an iterator, it is only good for one pass over the data. If ByteValue were an iterable (returning a new iterator object each time __iter__ was called), it would make sense to keep an instance around because we could iterate over it more than once. We will look at creating iterables later on in this article.

Looking more closely at the Iterator

Let’s look in more detail at what happens behind the scenes during the iteration process. To illustrate what the for loop is doing, I will walk through the calls it makes, in the order in which it makes them.

The first step in the iteration process is a call to the __iter__ function to get the iterator object that will perform the iteration. Notice that this works on both iterator and iterable objects, since iterators return themselves and iterables return an iterator.

bv = ByteValue("abcdef")
iterator = bv.__iter__()

Now that we have the iterator object, we start iterating by calling the next function in order to get the next value:

print iterator.next()

This executes the next function, which returns the byte value of the character in the string with which we are currently working. Since this is the first call to the next function, current_item will be zero and we will calculate the byte value of the first character in our string (‘a’), resulting in:

97

If we continue the iteration process by calling the next function six more times, we get the results shown in Listing 3. Notice that we have now made a total of seven calls to the next function, one more than the number of characters in our string. I’m running the Python code from a file (iter.py found in Listing 4). Depending on how you are running it, you might get slightly different results. However, the most important thing to observe in this example is the exception that is raised.

Listing 3

97
98
99
100
101
102
Traceback (most recent call last):
  File "Listing4.py", line 34, in ?
    main()
  File "Listing4.py", line 30, in main
    print iterator.next()
  File "Listing4.py", line 13, in next
    raise StopIteration
StopIteration

Listing 4

#!/usr/bin/env python

class ByteValue(object):

	def __init__(self, data):
		self.data = data
		self.current_item = 0
	def __iter__(self):
		return self

	def next(self):
		if (self.current_item == len(self.data)):
			raise StopIteration
		else:
			byte_value = ord(self.data[self.current_item])
			self.current_item += 1
			return byte_value

def main():
    for v in ByteValue("abc"):
        if v in ByteValue("abc"):
            print "We have a %d" % v

    bv = ByteValue("abcdef")
    iterator = bv.__iter__()
    print iterator.next()
    print iterator.next()
    print iterator.next()
    print iterator.next()
    print iterator.next()
    print iterator.next()
    print iterator.next()

if __name__ == "__main__":
	# Someone is launching this directly
	main()

For the most part you won’t be calling an iterator object’s __iter__ or next functions manually; instead, you’ll probably just let the for loop do it all for you.
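
For the curious, the work a for loop does on your behalf can be sketched roughly as the while loop below. This is an approximation for illustration, not the interpreter’s actual implementation; it uses the built-in iter() and next() functions, which call the object’s __iter__ and next methods for us. The variable names are my own.

```python
data = ["a", "b", "c"]

# Roughly what "for char in data: ..." does, written out by hand.
collected = []
iterator = iter(data)              # step 1: ask the object for its iterator
while True:
    try:
        char = next(iterator)      # step 2: fetch the next item
    except StopIteration:
        break                      # step 3: the loop ends quietly
    collected.append(ord(char))    # the loop body

# The ordinary for loop gives the same result.
expected = []
for char in data:
    expected.append(ord(char))

print(collected)               # [97, 98, 99]
print(collected == expected)   # True
```

Notice that the StopIteration exception never reaches your code when a for loop is in charge; it is simply the signal the loop uses to stop.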

The upside and downside of Iterators

Now that our class is an iterator object, we can use any of the built-in functions that work on iterators and iterables, such as sum, tuple, sorted, and list, to name a few.

In the first example, the to_bytes function returned a list object that we could work with. If we want a list instead of an iterator for any reason, it is as simple as using the list function, which takes an iterator as a parameter and returns a list:

list(ByteValue("abcdef"))

If we want a sorted list:

sorted(ByteValue("abcdef"))

If we want the sum of all of the bytes:

sum(ByteValue("abcdef"))

While very useful, iterators do have some downsides to them. The most obvious is that they are only good for one pass over the data. They also generally require you to add extra data members to your class in order to keep track of your current iterator position. Depending on what you are iterating over, this process can become quite complex. Iterators also only allow you to perform one “type” of iteration, i.e., in one direction or over one piece of internal data. You might address this by adding flags to your class but this will further clutter the class and decrease readability. So how can you simply add multiple types of iteration to a class? Enter the generator!

Generators

A generator is a function that creates, or generates, an iterator. In order for a function to become a generator it must return a value using the yield keyword.

Generators are interesting because they are functions, yet execution does not run through them as it does in a normal function. The first time execution enters a generator function it will start at the beginning of the function and continue until the yield keyword is encountered. When the iteration continues, execution will continue in the generator function on the statement immediately following the yield keyword. All local variables in the function will remain intact. If the yield statement occurs within a loop, execution will continue within the loop as though execution had not been interrupted.
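
A small sketch may help make this pause-and-resume behaviour visible. The countdown generator and the events list below are hypothetical names used only for this illustration; the list simply records when each part of the function body actually runs.

```python
events = []                        # records when each part of the body runs

def countdown(start):
    events.append("started")
    while start > 0:
        yield start                # execution pauses here...
        events.append("resumed after %d" % start)   # ...and picks up here
        start -= 1
    events.append("finished")

gen = countdown(2)
before_first_next = list(events)   # calling the function ran none of its body

first = next(gen)                  # runs up to the first yield
second = next(gen)                 # resumes right after that yield
rest = list(gen)                   # runs the body to the end

print(before_first_next)   # []
print(first)               # 2
print(second)              # 1
print(rest)                # []
print(events)
# ['started', 'resumed after 2', 'resumed after 1', 'finished']
```

The local variable start keeps its value between calls, which is exactly why a generator does not need a counter stored on the object.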

Continuing with the ByteValue example, let’s add a generator function named reverse that can be used to iterate through the byte values of the string in reverse order:

def reverse(self):
	current_item = len(self.data)
	while (current_item > 0):
		current_item -= 1
		yield ord(self.data[current_item])

So what’s going on here? The first thing to notice is that we did not add any more data members to our class; this generator is a self-contained unit. Secondly, since this is a generator, our counter current_item can be a local variable.

In the reverse function the first step is to initialize current_item, which represents the current character in the string, to the length of our string. We initialize it to the length of the string instead of zero since we are iterating through the string in reverse. Next we have a while loop that loops while current_item is greater than zero. Inside the loop we subtract one from our counter to give us the index of the current character to process. Finally, we yield the byte value of the current character.

Note: We subtract one from our counter the first time through the loop because Python indexing is zero-based, so the length of a sequence minus one gives us the position of its last item. In our examples we have used the following string:

abcdef
012345

When we calculate the length of the string we get 6. We then subtract 1 from that number, leaving 5, which is the index of the last character in the string.

Making use of our new generator function, we run the following:

bv = ByteValue("abcdef")
for byte in bv.reverse():
	print byte

We get our favourite byte values in reverse:

102
101
100
99
98
97

Looking closely at the Generator

Let’s take a detailed look at what is happening in the generator in the same way that we did earlier with the iterator object. The first thing that happens when you call a generator function is NOT the execution of the actual function; rather, it is the creation of a generator object. Running the following:

bv = ByteValue("abcdef")
gen = bv.reverse()
print gen

will result in:

<generator object at 0xb7d8e04c>

This demonstrates that, as stated above, the call to our generator function does not return the byte value of the last character in the string; instead, it creates a generator object. In fact, calling a generator function does not execute the body of the function at all; the body only starts running once iteration begins. A generator object is “what Python uses to implement generator iterators. They are normally created by iterating over a function that yields values”.[3]

Once we have a generator object we can start calling its next function (sound familiar?) to perform the actual iteration. An example of this can be found in Listing 5. The results of this execution can be found in Listing 6. The results will seem very familiar to you, especially the StopIteration exception.

Listing 5

#!/usr/bin/env python

class ByteValue(object):

	def __init__(self, data):
		self.data = data
		self.current_item = 0
	def __iter__(self):
		return self

	def next(self):
		if (self.current_item == len(self.data)):
			raise StopIteration
		else:
			byte_value = ord(self.data[self.current_item])
			self.current_item += 1
			return byte_value

	def reverse(self):
		current_item = len(self.data)
		while (current_item > 0):
			current_item -= 1
			yield ord(self.data[current_item])

def main():

	bv = ByteValue("abcdef")
	gen = bv.reverse()
	print gen.next()
	print gen.next()
	print gen.next()
	print gen.next()
	print gen.next()
	print gen.next()
	print gen.next()

if __name__ == "__main__":
	# Someone is launching this directly
	main()

Listing 6

102
101
100
99
98
97
Traceback (most recent call last):
  File "Listing5.py", line 40, in ?
    main()
  File "Listing5.py", line 36, in main
    print gen.next()
StopIteration

If we want to look at the contents of the generator object we could use the following code:

print dir(gen)

We would get the following results:

['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'gi_frame', 'gi_running', 'next']

Notice that the generator object contains the __iter__ and next functions, making the generator object itself an iterator. Because the generator itself is an iterator object, the same built-in iterator functions I mentioned earlier can also be used on our generators:

print sum(bv.reverse())
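
A quick way to convince yourself that a generator object really is its own iterator is the sketch below. The letters generator is a hypothetical example of mine, kept tiny on purpose.

```python
def letters():
    # A tiny generator function used only for this illustration.
    yield "a"
    yield "b"

gen = letters()
same_object = iter(gen) is gen     # a generator's __iter__ returns itself
as_list = list(gen)                # so list() can consume it directly
after = list(gen)                  # and, like any iterator, it is then spent

print(same_object)   # True
print(as_list)       # ['a', 'b']
print(after)         # []
```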

It is also important to remember that your generator or iterator does not have to perform only “dumb” iteration, simply moving through some sort of an internal list. Rather, it can make decisions just like any other block of code.

For example, let’s say that in our reverse generator we want to use the byte value 99 as an end condition. We can do something similar to the example found in Listing 7.

Listing 7

def reverse(self):
	current_item = len(self.data)
	while (current_item > 0):
		current_item -= 1
		value = ord(self.data[current_item])
		if (value == 99):
			return
		yield value

In Listing 7, if the byte value equals 99 we return from our generator function using the return keyword. Since we didn’t yield anything, this causes the StopIteration exception to be raised, halting the iteration.

Be careful about getting too clever with your iteration: in the versions of Python this article was written for, you cannot return any information from a generator; you can only yield it. So if you tried to run the code in Listing 8, in an attempt to return one last value before quitting (or to return a success or failure code), Python would refuse to compile the function and report the following error:

SyntaxError: 'return' with argument inside generator

Listing 8

def reverse(self):
	current_item = len(self.data)
	while (current_item > 0):
		current_item -= 1
		value = ord(self.data[current_item])
		if (value == 99):
			return value
		yield value
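
If the goal is to hand back one last value before stopping, one possible workaround is to yield that value and then use a bare return. The sketch below is a standalone variation on Listing 7, not part of the original class: reverse_until and its stop_value parameter are names I made up for illustration, and it uses the built-in reversed function rather than a manual counter.

```python
def reverse_until(data, stop_value):
    # Hypothetical standalone variant of the reverse generator.
    for char in reversed(data):
        value = ord(char)
        if value == stop_value:
            yield value    # hand back the final value first...
            return         # ...then end the iteration with a bare return
        yield value

result = list(reverse_until("abcdef", 99))
print(result)   # [102, 101, 100, 99]
```

A success or failure code is harder to pass back this way; storing it on the object, or raising an exception, are the kinds of alternatives I would reach for.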

But what about iterables?

I’m sure that by now you are wondering about the iterable objects that I mentioned at the start of this tutorial. As you have probably guessed, simply making your class an iterator isn’t that useful unless it is only performing one task like our ByteValue example. Since your classes will generally be performing more than one task, you will likely want to make your class an iterable object rather than an iterator object. To recap, iterable objects return an iterator object when their __iter__ function is called, which allows for multiple passes over their data.

Creating an Iterable object

Since the definition of an iterable object is an object that returns an iterator object when its __iter__ function is called, creating an iterable object can be done in a variety of ways. Two options that come to mind are: 1) creating an iterator helper class to perform the iteration; and 2) using a generator function. I prefer using a generator function since it keeps the functionality within the main class.
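
For comparison, here is a minimal sketch of the first option, the helper class. ByteValueIterator is a hypothetical name of my own. As an assumption worth stating plainly, the stepping method is written as __next__ (the Python 3 spelling) and aliased to next so that the same code should also work on Python 2.

```python
class ByteValueIterator(object):
    """Hypothetical helper class that does the actual stepping."""

    def __init__(self, data):
        self.data = data
        self.current_item = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current_item == len(self.data):
            raise StopIteration
        byte_value = ord(self.data[self.current_item])
        self.current_item += 1
        return byte_value

    next = __next__   # assumption: alias so the class also runs on Python 2


class ByteValue(object):
    """Iterable: hands out a new helper iterator on every request."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # A fresh helper each time, so every pass starts from the beginning.
        return ByteValueIterator(self.data)


bv = ByteValue("abc")
first_pass = list(bv)
second_pass = list(bv)

print(first_pass)    # [97, 98, 99]
print(second_pass)   # [97, 98, 99]
```

It works, but the counter bookkeeping has simply moved into a second class, which is a large part of why I find the generator version tidier.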

See Listing 9 for an example of creating an iterable object using a generator. You will see that we have replaced the next function with the forward function. The forward function is a generator that iterates through the data in the “forward” direction. In the __iter__ function we return the result of a call to the forward function, a generator object. Since generator objects implement the iterator protocol and are, in fact, iterators, by returning one from our __iter__ function we have successfully created an iterable.

Listing 9

class ByteValue(object):

	def __init__(self, data):
		self.data = data

	def __iter__(self):
		#We are an iterable, so return our iterator
		return self.forward()

	def forward(self):
		#The forward generator
		current_item = 0
		while (current_item < len(self.data)):
			byte_value = ord(self.data[current_item])
			current_item += 1
			yield byte_value

	def reverse(self):
		#The reverse generator
		current_item = len(self.data)
		while (current_item > 0):
			current_item -= 1
			yield ord(self.data[current_item])

def main():
    bv = ByteValue("abc")
    for v in bv:
        if v in bv:
            print "We have a %d" % v

if __name__ == "__main__":
    main()

Now that we have an iterable object, we can iterate over it as many times as we want. We can even have fun with nested iteration:

bv = ByteValue("abcdef")
for value in bv:
	print value
	for second_value in bv:
		print second_value
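
The reason nested iteration works is that each loop receives its own, independent iterator. The sketch below uses a trimmed-down ByteValue whose __iter__ is itself a generator function (a simplification of Listing 9 made for this example, not the listing’s exact code) and collects every outer/inner pair.

```python
class ByteValue(object):
    """Trimmed-down iterable: __iter__ is itself a generator function."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for char in self.data:
            yield ord(char)


bv = ByteValue("ab")

# Every inner loop starts from the beginning, undisturbed by the outer loop.
pairs = [(outer, inner) for outer in bv for inner in bv]

print(pairs)   # [(97, 97), (97, 98), (98, 97), (98, 98)]
```

With the iterator version from Listing 2, the inner loop would have used up the one and only pass, and the outer loop would have stopped after its first item.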

Conclusion

This concludes our introduction to iterators, iterables, and generators. I hope I have demonstrated the immense power and flexibility that they provide. In general, making your object an iterable object and/or using generators allows for more flexibility than simple iterator objects provide. For most complex classes or complex sets of data, multiple iterations are a given.

With all of the praise that I have heaped upon iterators, iterables, and generators, it’s important to remember that they are not a panacea and should not be used in every case where a sequence of items is needed. There are many instances where returning a list is the desired result. That being said, iterators, iterables, and generators are extremely useful and provide a great way to loop through data.

[1] http://docs.python.org/tut/node18.html
[2] http://docs.python.org/lib/typeiter.html
[3] http://docs.python.org/api/gen-objects.html


28 Responses to “Iterators, Iterables, and Generators! Oh, my!”

  1. reine
    Says:

    Indentation 4 spaces anyone??? (and no tabs!)

  2. peyroux
    Says:

    Python Magazine January 2008 : http://www.pythonmagazine.com/c/issue/view/66

  3. Michael Dillon
    Says:

    Personally, I think more than two spaces of indentation is just a waste of good white space.

  4. selsine
    Says:

    Hi reine and Michael,

    Thanks for the comments, the listings I had on my system were older files that used tabs instead of spaces. I didn’t think anyone would mind so I just pasted the listings. If I get a chance I’ll edit them.

  5. selsine
    Says:

    Hi peyroux,

    Yes you are correct, as I say at the top of this article, I originally wrote this for the January 2008 issue of Python Magazine. I’ll update the link so that it points to the actual issue.

  6. Ted
    Says:

    A few typos in the first paragraph:

    >Iterators, iterables, and generators are features handles so wall …

    Should be \handled so well\.

    Later…

    >Fortunately for us, creating iterators, iterablesm

    Drop the extra m.

    The rest is less typo-riffic. It probably just more introduction than the little intro paragraph.

  7. Ted
    Says:

    And then I typos the heck out of my post. Fun.

  8. selsine
    Says:

    Thanks for the info Ted, I’ve made the changes.

  9. anon
    Says:

    Maybe it would be better to use try: … except IndexError: … instead of an If in the next function.

  10. Tom
    Says:

    Thanks for this – thought you’d like to know that the links on the home page are broken – at least in Chrome – EG. http://www.learningpython.com/#CreatingAnIterator

  11. iterators, iterables, generators - ShineIT
    Says:

    [...] iterables, generators Original source: http://www.learningpython.com/2009/02/23/iterators-iterables-and-generators-oh-my/ Original author: Mark Mruss [...]

  12. Guillaume Aubert
    Says:

    Thanks for this article.

    I was missing some bits regarding the generators and now it is crystal clear.

    I will recommend it

  13. jarav
    Says:

    Hi,
    Thanks for the article. Am just discovering the joy of python programming.

    I am confused about why “iterators…are only good for one pass over the data.”. Surely that depends on how we define the ‘next’ function. Suppose we define the ‘next’ function in Listing 2 like this( i reset the counter just before StopIteration is raised ):

    def next(self):
        if (self.current_item == len(self.data)):
            # reset counter
            self.current_item = 0
            raise StopIteration
        else:
            byte_value = ord(self.data[self.current_item])
            self.current_item += 1
            return byte_value

    we will be able to reuse the iterator. In the ‘generator’ object, I guess this happens automatically since the ‘counter’ is local to the generator object.

  14. jarav
    Says:

    Sorry for that last question. You have said in your article that the iterator protocol demands that once StopIteration is raised, it should continue to be raised.

  15. Rune
    Says:

    Very informative and well written article!

  16. ksamuel
    Says:

    Makes me think of this tuto :

    http://stackoverflow.com/questions/231767#answer-231855

  17. SoftwareExplorer
    Says:

    You said “handled so *wall*” instead of “handled so *well*”. Ted mentioned this, but he was talking about a few other errors too, so you probably didn’t notice the change.

    Thanks for these tutorials. They are the best I have found so far.

  18. Cillia johnson
    Says:

    This is great. I was having difficulties because of spaces in my indentation, but that has now been solved. It took me days to figure the indentation problem out as a beginner :)

    Nice tutorial BTW ..

  19. PC Repair
    Says:

    Just wondering why we have not seen you in a long time.
    I personally have missed your amazing step-by-step tutorial.

    We hope you’re fine and wish you the very best.
    Good luck

  20. Henry Dominik
    Says:

    He seems to have resumed blogging again. Check out the post he made in January of this year. I’m happy he’s back :)

  21. selsine
    Says:

    Hi Henry,

    Thanks for the kind words. I’m trying to resume blogging as best I can, but a one-month-old eats up a lot of free time! Not to mention moving cities and a new house!

    I have something that I’m hoping to get up soon, and hopefully in the future there will be less time between posts!

    mark

  22. Matt Thiessen
    Says:

    A good explanation and examples of iterator of a class. Helped me solve my issue. Thanks

  23. spititan
    Says:

    This is a good article, but I just want to point out that a few things in this article are actually a bit misleading.

    An iterator actually does not need to have an __iter__ method. next() is the only required method. The ‘in’ operator of Python actually calls the __iter__ method on an iterable, then calls the next() method on the returned iterator.

    See the following example code

    #!/usr/bin/python2.6

    class ByteValue(object):
        def __init__(self, data):
            self.data = data

        class Iterator(object):
            def __init__(self, data):
                self.data = data
                self.index = 0

            def next(self):
                if self.index == len(self.data):
                    raise StopIteration
                else:
                    byte_value = ord(self.data[self.index])
                    self.index += 1
                    return byte_value

        def __iter__(self):
            return ByteValue.Iterator(self.data)

    bv = ByteValue("abcdefg")
    for byte_value in bv:
        print(byte_value)

    for byte_value in bv:
        print(byte_value)

    In this example, ByteValue is iterable, and its iterator is ByteValue.Iterator, which does not have an __iter__ method.

  24. Sumudu Fernando
    Says:

    @spititan: Your example works, but the language does require iterators to implement __iter__. What you have defined is almost an iterator, but technically isn’t one. The reason this requirement exists is that it makes it easy to write functions that can accept either an iterable or an iterator, the same way that built-ins such as sum can.

    More precisely, if we write a loop like:

    for x in iter(foo):
        pass

    then foo can be either an iterable or an iterator.

    One thing about the examples in the article is that they are slightly more complicated than necessary. For example, I would write:

    def forward(self):
        for c in self.data:
            yield ord(c)

    and

    def reverse(self):
        for c in reversed(self.data):
            yield ord(c)

    (though maybe the reversed built-in did not exist when the article was written)

  25. Noel
    Says:

    Wonderful write-up. I enjoy writing and so I appreciate the total clarity and the just-the-right-sentences you provided along the way to make sure to convey nuance.

    Marvellous!

  26. Mailhos
    Says:

    Please fix: “Iterators, iterables, and generators are features handled so wall by Python”. Should write “well” not “wall”.

  27. Interview questions for pyton and php programmers | {% mmrahman.co.uk %}
    Says:

    [...] Iterators http://www.learningpython.com/2009/02/23/iterators-iterables-and-generators-oh-my/ [...]

