RSS reader – Part Three – Generator Class


Please remember to read part one and part two.

Classes and Generators

All right, now that we have split our RSS reader up into functions, we’re going to go one step further and put our code into a class.

We’re also going to do something a bit more advanced with our class and create a generator. We are doing this because Python has such nice iteration handling, and because in the future we’ll probably want to handle each RSS item individually rather than simply dumping it out to the terminal.

Generators are basically a type of iterator with a slightly different syntax. In fact, anything you can accomplish with a generator you can also accomplish with a standard iterator.
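To make that comparison concrete, here is a small sketch of the same sequence written both ways. The names CountUpTo and count_up_to are made up for this illustration and are not part of the RSS reader. The code uses print() with parentheses so that it also runs under Python 3.

```python
class CountUpTo(object):
    """Standard iterator: state lives in attributes, logic in __next__."""
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current

    next = __next__  # Python 2 calls this method next()


def count_up_to(limit):
    """Generator: the same sequence, with the state kept in local variables."""
    current = 0
    while current < limit:
        current += 1
        yield current


print(list(CountUpTo(3)))
print(list(count_up_to(3)))
```

Both produce the numbers 1 to 3. The generator version is shorter because Python keeps track of where the function stopped, so we don't have to.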

Here is an example generator that yields all the factors of the specified number:

def factors(num):
	count = 1
	while count <= num/2:
		if (num % count == 0):
			yield count
		count = count + 1
	yield num

The special statements that make this function a generator are those two yield statements. (The second yield statement is simply to return the number itself, since a number is always a factor of itself.) The yield statement basically returns the data to the caller and the next time the function is iterated on, the function continues exactly where it left off.

You call the generator in the following way:

for res in factors(30):
	print res

With the following results:


1
2
3
5
6
10
15
30
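The "continues exactly where it left off" behaviour is easier to see if you drive the generator by hand with the built-in next() instead of a for loop. This is only a sketch to show the pausing; the variable names are arbitrary.

```python
def factors(num):
    """Yield every factor of num in ascending order."""
    count = 1
    while count <= num / 2:
        if num % count == 0:
            yield count
        count = count + 1
    yield num


gen = factors(6)      # nothing runs yet; we only get a generator object
first = next(gen)     # runs the function body until the first yield
second = next(gen)    # resumes after that yield and runs to the next one
rest = list(gen)      # drains whatever the generator has left

print(first)
print(second)
print(rest)
```

Each call to next() runs just enough of the function to produce one value, which is what lets us process one item at a time.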

Code

As you can see, having something like this would be very useful in our RSS reader. To accomplish this, the first addition to the code that we created in part two is the RSSItem class, which represents one RSS item. For the time being it is going to contain only the title and the description. The nice thing about having this in a separate class is that we can add new members at any time without affecting existing code:

class RSSItem:
	"""This is an RSS item, it contains all the RSS info like Title and Description"""
	def __init__(self,title="",description=""):
		self.title = title
		self.description = description

You'll notice that the RSSItem class has no functions besides __init__, which takes two optional parameters, the title and the description, and sets two data members equal to those values.
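As a quick sketch of what "adding new members without affecting existing code" could look like, here is RSSItem with one extra optional member. The link member and the example values are assumptions made up for this illustration; the reader in this article does not use them.

```python
class RSSItem:
    """One RSS item; 'link' is a hypothetical extra member for this sketch."""
    def __init__(self, title="", description="", link=""):
        self.title = title
        self.description = description
        self.link = link


# Code that only knows about title and description keeps working unchanged
old_style = RSSItem("A title", "A description")

# Newer code can start passing the extra value
new_style = RSSItem("A title", "A description", link="http://example.com/story")

print(old_style.title)
print(new_style.link)
```

Because the new parameter has a default value, every existing call to RSSItem(title, description) behaves exactly as before.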

The next thing that we are going to define is our main class; we'll call it RSSReader for lack of a better name:

class RSSReader:
	"""This class is an RSS reader, it should have a better docstring"""

Here is the __init__ function, which takes one parameter, the RSSUrl. The __init__ function also does some "pre-processing" and gets the XML document:

def __init__(self,RSSUrl):
	"""Initialize the class"""
	self.RSSUrl = RSSUrl
	self.xmldoc = self.GetXMLDocument(RSSUrl)
	if (not self.xmldoc):
		print "Error Getting XML Document!"

The RSSReader class has two member variables, RSSUrl and xmldoc. You'll also notice that the __init__ function calls a member function of RSSReader called GetXMLDocument, which does exactly what it says:

def GetXMLDocument(self,RSSUrl):
	"""This function reads in an RSS URL and then
	returns the XML document on success"""
	url_info = urllib2.urlopen(RSSUrl)
	xmldoc = None
	if (url_info):
		xmldoc = minidom.parse(url_info)
	else:
		print "Error Getting URL"
	return xmldoc
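One thing to be aware of is that urlopen reports most failures by raising an exception rather than by returning a false value, so the else branch above will rarely run. Here is a sketch of how the same function might catch those errors instead. It is written as a standalone function named get_xml_document (a name made up for this example), and the import fallback is only there so that the sketch runs on Python 3 as well as Python 2.

```python
try:
    from urllib2 import urlopen, URLError       # Python 2, as in the article
except ImportError:
    from urllib.request import urlopen          # Python 3
    from urllib.error import URLError
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def get_xml_document(rss_url):
    """Return the parsed XML document, or None if download or parsing fails."""
    try:
        url_info = urlopen(rss_url)
    except (URLError, ValueError) as err:
        # URLError: the resource could not be opened
        # ValueError: the string is not a usable URL at all
        print("Error Getting URL: %s" % err)
        return None
    try:
        return minidom.parse(url_info)
    except ExpatError as err:
        # The download worked but the content is not well-formed XML
        print("Error parsing XML: %s" % err)
        return None
    finally:
        url_info.close()
```

With this version the caller can simply check whether the result is None, which matches the check that __init__ already does.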

The next function that I'm going to show you is the generator, and how simple it actually is:

def GetItems(self):
	"""Generator to get items"""
	for item_node in self.xmldoc.documentElement.childNodes:
		if (item_node.nodeName == "item"):
			# All right, we have an item
			rss_item = self.CreateRSSItem(item_node)
			yield rss_item

You'll see that all this generator does is iterate through the child nodes of the document's root element, looking for nodes named "item". Each time a node named "item" is encountered, a member function called CreateRSSItem (which creates an RSSItem) is called, and that RSSItem is yielded to the caller.
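One limitation worth knowing about: this loop only looks at the direct children of the root element. That fits RSS 1.0 style feeds, where the item elements sit directly under the root, but in RSS 2.0 the items are nested inside a channel element, so the loop finds nothing. This may be why some feeds produce no output. Below is a sketch of one possible workaround using minidom's getElementsByTagName, which searches at any depth. The two tiny feeds and the function names are made up for this example.

```python
from xml.dom import minidom

# Two tiny made-up feeds: RSS 2.0 nests <item> inside <channel>,
# while an RSS 1.0 style feed puts <item> directly under the root.
RSS2 = "<rss><channel><item><title>A</title></item></channel></rss>"
RSS1 = "<RDF><channel/><item><title>B</title></item></RDF>"


def direct_child_items(xmldoc):
    """The approach used by GetItems: only direct children of the root."""
    for node in xmldoc.documentElement.childNodes:
        if node.nodeName == "item":
            yield node


def all_items(xmldoc):
    """Alternative: find <item> elements at any depth in the document."""
    for node in xmldoc.getElementsByTagName("item"):
        yield node


for name, text in (("RSS2", RSS2), ("RSS1", RSS1)):
    doc = minidom.parseString(text)
    print("%s: %d direct, %d at any depth" % (
        name, len(list(direct_child_items(doc))), len(list(all_items(doc)))))
```

Swapping the loop in GetItems for getElementsByTagName("item") would be a small change, and it is still a generator as long as it yields each item.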

CreateRSSItem is another simple function:

def CreateRSSItem(self,item_node):
	"""Create an RSS item and return it"""
	title = self.GetChildText(item_node,"title")
	description = self.GetChildText(item_node,"description")
	return RSSItem(title,description)

All this function does is call the member function GetChildText for the "title" and "description" values. GetChildText searches the children of the passed XML node for one whose name matches the second parameter, and if one is found, that node's text is returned.

Then CreateRSSItem calls the constructor of the RSSItem class and returns the new instance: return RSSItem(title,description)

There are only two more functions left, both of which are (as you might have guessed) very simple. This is the beauty of using functions: you get to break up the code into manageable and reusable tasks. The two functions that are left are GetChildText and GetItemText.

Where GetChildText searches through childNodes for a matching item, GetItemText simply returns an XML node’s text:

def GetItemText(self,xml_node):
	"""Get the text from an xml item"""
	text = ""
	for text_node in xml_node.childNodes:
		if (text_node.nodeType == Node.TEXT_NODE):
			text += text_node.nodeValue
	return text

def GetChildText(self, xml_node, child_name):
	"""Get the text of the named child of the xml node"""
	if (not xml_node):
		print "Error GetChildText: No xml_node"
		return ""
	for item_node in xml_node.childNodes:
		if (item_node.nodeName==child_name):
			return self.GetItemText(item_node)
	# No matching child was found
	return ""
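To see why GetItemText loops over childNodes and checks the node type, it helps to look at what minidom actually builds. The sketch below parses a small made-up item and shows that an element's text lives in child nodes, and that text wrapped in a CDATA section is a different node type, so a check against TEXT_NODE alone skips it. The get_item_text function and its node_types parameter are additions for this example, not part of the reader.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    "<item>\n"
    "  <title>Plain title</title>\n"
    "  <description><![CDATA[Wrapped <b>text</b>]]></description>\n"
    "</item>")
item = doc.documentElement

# The whitespace between elements shows up as '#text' children too
print([child.nodeName for child in item.childNodes])


def get_item_text(xml_node, node_types=(Node.TEXT_NODE,)):
    """Concatenate the values of the children whose type is in node_types."""
    return "".join(child.nodeValue for child in xml_node.childNodes
                   if child.nodeType in node_types)


title = item.getElementsByTagName("title")[0]
description = item.getElementsByTagName("description")[0]

print(get_item_text(title))
# With only TEXT_NODE accepted, the CDATA description comes back empty
print(repr(get_item_text(description)))
# Accepting CDATA sections as well recovers the wrapped text
print(get_item_text(description, (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)))
```

If a feed's descriptions come back blank, checking for Node.CDATA_SECTION_NODE in addition to Node.TEXT_NODE is one thing to try.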

All that’s left is to show how we create an instance of the RSSReader class, and then use the generator to iterate through all of the RSSItems:

if __name__ == "__main__":
	rss_reader = RSSReader('http://rss.slashdot.org/Slashdot/slashdot')
	for rss_item in rss_reader.GetItems():
		if (rss_item):
			print rss_item.title
			print ""
			print rss_item.description
			print ""

Here is the code in its entirety; it can also be downloaded from here:

#! /usr/bin/env python

import urllib2
from xml.dom import minidom, Node

class RSSItem:
	"""This is an RSS item, it contains all the RSS info like Title and Description"""
	def __init__(self,title="",description=""):
		self.title = title
		self.description = description

class RSSReader:
	"""This class is an RSS reader, it should have a better docstring"""
	
	def __init__(self,RSSUrl):
		"""Initialize the class"""
		self.RSSUrl = RSSUrl
		self.xmldoc = self.GetXMLDocument(RSSUrl)
		if (not self.xmldoc):
			print "Error Getting XML Document!"
		
	def GetXMLDocument(self,RSSUrl):
		"""This function reads in an RSS URL and then
		returns the XML document on success"""
		url_info = urllib2.urlopen(RSSUrl)
		xmldoc = None
		if (url_info):
			xmldoc = minidom.parse(url_info)
		else:
			print "Error Getting URL"
		return xmldoc
	
	def GetItemText(self,xml_node):
		"""Get the text from an xml item"""
		text = ""
		for text_node in xml_node.childNodes:
			if (text_node.nodeType == Node.TEXT_NODE):
				text += text_node.nodeValue
		return text
	
	def GetChildText(self, xml_node, child_name):
		"""Get the text of the named child of the xml node"""
		if (not xml_node):
			print "Error GetChildText: No xml_node"
			return ""
		for item_node in xml_node.childNodes:
			if (item_node.nodeName==child_name):
				return self.GetItemText(item_node)
		# No matching child was found
		return ""
	
	def CreateRSSItem(self,item_node):
		"""Create an RSS item and return it"""
		title = self.GetChildText(item_node,"title")
		description = self.GetChildText(item_node,"description")
		return RSSItem(title,description)
	
	def GetItems(self):
		"""Generator to get items"""
		for item_node in self.xmldoc.documentElement.childNodes:
			if (item_node.nodeName == "item"):
				# All right, we have an item
				rss_item = self.CreateRSSItem(item_node)
				yield rss_item

if __name__ == "__main__":
	rss_reader = RSSReader('http://rss.slashdot.org/Slashdot/slashdot')
	for rss_item in rss_reader.GetItems():
		if (rss_item):
			print rss_item.title
			print ""
			print rss_item.description
			print ""
selsine


7 Responses to “RSS reader – Part Three – Generator Class”

  2. Andre
    Says:

    Any idea why the following feeds don’t work?

    http://africa.reuters.com/business/news/rss.xml
    http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml

    The RSS feed you use in the example works fine, but the two mentioned above does not give me any output.

    Regards,

    Andre

  5. vatts
    Says:

    hey it’s great script, but how can i inplement that in non-libary python IRC bot (i have already take care for sceleton, i need the inside things :<) and how would i just add feeds into feedlist.xml so bot reads from there?

  6. selsine
    Says:

    Hi Vatts,

    No idea really, I’m not familiar with python IRC bots.

    Best of luck.

  7. hey
    Says:

    wooooww!! man this is the most helpful and detailed wesite i have come across, as i am also trying to make a RSS feed reader!!

    THANKKKKKKKSS soo much dude!

    cheers
