Introducing Descriptors and Properties


Note: This article was first published in the May 2008 issue of Python Magazine

Mark Mruss

New-style classes were introduced to Python with the release of Python 2.2. And with these new-style classes came descriptors and properties. This article will introduce the descriptor protocol, descriptors, and properties.

Introduction

New-style classes were introduced to Python with the release of Python 2.2. A new-style class is any class that is derived from the object base class. New-style classes give Python programmers many new (and initially confusing) features. One such feature is the descriptor protocol, and more specifically descriptors themselves.

Descriptors give Python programmers the ability to easily and efficiently create “managed attributes”. Managed attributes can be thought of as attributes that are not accessed directly. Instead their access is “managed” by something else, generally a class or a function.

If you haven’t come across this before, you are probably wondering why one would want to manage attribute access. One reason might be that you don’t want people to be able to delete the attribute. Another reason may be that you need to ensure that your attribute data is always valid. Or perhaps attribute x is based on attribute y, so every time the value of y changes you want to update the value of x. These few examples show how many cases there are where you might want to control access to certain attributes.

For those of you familiar with other programming languages, this type of access is often referred to as “getters and setters”. In many languages, implementing “getters and setters” means using private variables and public functions that get and set the variable’s value. Since Python doesn’t (really) have private variables, the descriptor protocol is basically a built-in and Pythonic way to achieve something similar.

This article will introduce you to the descriptor protocol, descriptors, and properties. It will focus on demonstrating how to use them to create managed attributes. Since the descriptor protocol requires new-style classes, all of the examples in this article require Python 2.2 or newer.

A few definitions

Before moving forward, it is important to understand a few related terms. These terms will introduce some basic concepts and help you follow along with the remainder of the article.

descriptor protocol – The following three methods make up the descriptor protocol: __get__, __set__, and __delete__.

descriptor – An “object attribute with binding behavior, one whose attribute access has been overridden by methods in the descriptor protocol.” [1] In other words, descriptors are “attributes whose usage resembles attribute access, but whose implementation uses method calls.” [2]

data descriptor – A descriptor with the __get__ and __set__ methods of the descriptor protocol defined.

non-data descriptor – A descriptor with only the __get__ method of the descriptor protocol defined. “Python methods (including staticmethod() and classmethod()) are implemented as non-data descriptors.” [3]

property – A built-in type that implements the descriptor protocol and allows you to easily create data descriptors.

Don’t worry if you don’t fully understand these definitions; the remainder of this article will hopefully clear up any confusion you have.

The Descriptor Protocol

Let’s take a closer look at the descriptor protocol and see how we can use it to create a descriptor. As previously mentioned, the descriptor protocol is made up of three methods: __get__, __set__, and __delete__. These methods have specific signatures, shown below, where self is the descriptor object itself (an instance of the class that defines the methods):

__get__(self, instance, owner)
__set__(self, instance, value)
__delete__(self, instance)

These three methods represent the three basic operations that you perform on attributes in general: querying the value of the attribute, assigning a value to it, and (very rarely) deleting it. They work as follows:

  • The __get__ method is called when the attribute’s value is being queried. The __get__ method “should return the (computed) attribute value or raise an AttributeError exception.” [4] This is where access to the attribute’s value is managed.
  • The __set__ method is used in the assignment operation. It is called when we want to set the attribute value. This is where you can control what values, or types of values are being assigned to your attribute.
  • Finally, the __delete__ method is called when we want to delete the attribute. Here you can (rarely) decide whether or not to delete the attribute.

There are also three different parameters passed to the three methods (excluding the standard self parameter for methods that belong to a class):

  • owner – This “is always the owner class.” [5] This means that it is the actual class, and not an instance of the class. So if the descriptor is in a class called MyClass, owner will be that class.
  • instance – An instance of class type owner. It is “the instance that the attribute was accessed through” [6], or None if the attribute is being accessed through the class (owner) instead of an instance.
  • value – The value that the attribute is being set to.
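
To make the three parameters concrete, here is a minimal sketch of a descriptor that does nothing except record the arguments it receives. The names RecordingDescriptor, Owner, attr, and recorder are made up for this illustration, and print is called as a function so that the snippet also runs on Python 3.

```python
class RecordingDescriptor(object):
    """Hypothetical descriptor that records how each protocol method is called."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner):
        self.calls.append(("__get__", instance, owner))
        return "managed value"

    def __set__(self, instance, value):
        self.calls.append(("__set__", instance, value))

    def __delete__(self, instance):
        self.calls.append(("__delete__", instance))


recorder = RecordingDescriptor()


class Owner(object):
    attr = recorder          # the descriptor lives on the class


obj = Owner()
obj.attr                     # __get__ receives instance=obj, owner=Owner
Owner.attr                   # __get__ receives instance=None, owner=Owner
obj.attr = 42                # __set__ receives instance=obj, value=42
del obj.attr                 # __delete__ receives instance=obj

for call in recorder.calls:
    print(call[0])           # name of each protocol method, in call order
```

Only __get__ ever sees the owner class, and only __get__ can be reached with instance set to None.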

This difference between owner and instance might be a bit confusing, so let’s look at a quick example. Let’s say we have a descriptor my_descriptor in the class MyClass, and we run the following code:

my_class_instance = MyClass()
print my_class_instance.my_descriptor

The second line queries the descriptor’s value and results in the __get__ method being called with the instance parameter being my_class_instance. The owner parameter will be the MyClass class.

Note: Notice that we treat the descriptor my_descriptor as though it is a normal attribute. We don’t call print my_class_instance.my_descriptor.__get__(my_class_instance, MyClass). This is what was meant by: “attributes whose usage resembles attribute access, but whose implementation uses method calls.” [7]

If the following code were run:

print MyClass.my_descriptor

The __get__ method will again be called. This time the instance parameter will be None and the owner parameter will be the MyClass class.

Note: Only the __get__ method has the owner parameter. This means that it is the only method in the descriptor protocol that can be invoked through the class. Setting or deleting the attribute through the class rebinds or removes the class attribute itself. For example, if we assigned the numeric value 2 to a descriptor attribute through the class, the descriptor’s __set__ method would not be called. Instead, the class attribute would stop being a descriptor and become the integer 2.
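
A short sketch of what that note describes. Constant and Holder are hypothetical names used only here, and print is called as a function so that the snippet also runs on Python 3.

```python
class Constant(object):
    """Hypothetical data descriptor whose __set__ refuses every assignment."""

    def __get__(self, instance, owner):
        return 1

    def __set__(self, instance, value):
        raise AttributeError("cannot assign through an instance")


class Holder(object):
    attr = Constant()


obj = Holder()

try:
    obj.attr = 2             # routed to Constant.__set__, which refuses
except AttributeError:
    print("instance assignment was intercepted")

Holder.attr = 2              # __set__ is not consulted; the class attribute is rebound
print(type(Holder.__dict__["attr"]).__name__)   # the descriptor has been replaced
print(obj.attr)              # plain lookup now finds the integer
```

After the class-level assignment the descriptor is simply gone, so every instance sees the plain integer.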

A Simple Descriptor Example

A simple “transparent” descriptor example can be found in Listing 1. The first thing to notice in the code is that both the SimpleDescriptor and MyClass classes are “new-style” classes because they are derived from object. This is important because, as mentioned above, descriptors only work with “new-style” classes. The second point to notice is that the descriptor has class scope as opposed to instance scope. There are also two extra print statements included in the code. They are there to let us follow the execution a little more easily.

Listing 1

class SimpleDescriptor(object):

    def __get__(self, instance, owner):
        # Check if the value has been set
        if (not hasattr(self, "_value")):
            raise AttributeError
        print "Getting value: %s" % self._value
        return self._value

    def __set__(self, instance, value):
        print "Setting to %s" % value
        self._value = value

    def __delete__(self, instance):
        del(self._value)

class MyClass(object):
    my_value = SimpleDescriptor()

Using Listing 1 to execute the following code:

my_instance = MyClass()
my_instance.my_value = 416
print my_instance.my_value

The output would be the following:

Setting to 416
Getting value: 416
416

As you can see, the second line (my_instance.my_value = 416) calls the __set__ method and sets the _value attribute. When we call print my_instance.my_value, the __get__ method is called and the _value attribute is returned.

The Problem with the Simple Example

The previous example may look like a perfectly good descriptor, but there is something wrong with it. Take a look at what happens when we run this new code:

my_instance = MyClass() #Create the first instance
my_instance.my_value = 416 #Set its value
my_second_instance = MyClass() #Create the second instance
my_second_instance.my_value = 204 #Set its value
print my_instance.my_value #What was the first instance's value?

We get the following output:

Setting to 416
Setting to 204
Getting value: 204
204

Notice that when we set the second instance’s my_value descriptor to 204, we are also setting the first instance’s. This is because my_value has class scope, so both instances (and the class itself) share the same instance of the SimpleDescriptor class. Since SimpleDescriptor only stores one value, they all share that one value. We get the same result if we check the class’s value of my_value:

my_instance = MyClass() #Create the first instance
my_instance.my_value = 416 #Set its value
my_second_instance = MyClass() #Create the second instance
my_second_instance.my_value = 204 #Set its value
print MyClass.my_value #What's the class's value?

We get the following results:

Setting to 416
Setting to 204
Getting value: 204
204

In the last line of the code (print MyClass.my_value) we use the class itself, ignoring both instances, to get the value of my_value. This is a case where the __get__ method is called with the instance parameter set to None.

Note: This is not a problem if you want to share the value across all instances.

The Solution to the Problem

In order to get around this issue, you have to remember that if you want values unique to each instance, your descriptors must store values that are unique to each instance. This can be in the instance itself, in a dictionary in the descriptor, or perhaps in a text file. Just make sure that the value is unique to each instance. Though it seems simple, in practice a solution that suits all cases is difficult to implement. As a result, you should probably pick a specific implementation for your specific situation.

Using each instance as a key, one can store the value in a dictionary in the descriptor itself. The problem with this solution should be obvious to programmers familiar with Python dictionaries: only hashable objects can be used as keys. This is fine if you know what kind of object you are working with, but what about in the future, when you want to add a descriptor to your subclass of list (which is not hashable)?
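
Here is a rough sketch of the dictionary approach and of where it breaks down. DictDescriptor, Point, TaggedList, and label are names invented for this illustration, and print is called as a function so that the snippet also runs on Python 3.

```python
class DictDescriptor(object):
    """Hypothetical descriptor that keeps per-instance values in its own dictionary."""

    def __init__(self):
        self._values = {}

    def __get__(self, instance, owner):
        if instance is None or instance not in self._values:
            raise AttributeError
        return self._values[instance]

    def __set__(self, instance, value):
        self._values[instance] = value      # only works for hashable instances

    def __delete__(self, instance):
        self._values.pop(instance, None)


class Point(object):
    label = DictDescriptor()


class TaggedList(list):                     # a list subclass is unhashable by default
    label = DictDescriptor()


first, second = Point(), Point()
first.label = "a"
second.label = "b"
print(first.label)                          # each Point keeps its own value
print(second.label)

try:
    TaggedList().label = "c"
except TypeError:
    print("a list subclass cannot be used as a dictionary key")
```

Ordinary objects hash by identity, so Point works, while the list subclass fails on the first assignment.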

Another solution is to store the value in the instance itself. You can easily do this by adding the value to the instance’s __dict__. The limitation is that the descriptor needs to be given a suitable key so that there is no collision with anything already in the instance’s __dict__.

Listing 2 shows a solution to the problem from the previous section where the value is stored in the instance’s __dict__. Values are indexed using a key name provided during the creation of the descriptor. Aside from where we store the value, there is little difference between Listing 1 and Listing 2. Notice that the value name is converted into a string. This is done to ensure that it can be used as a key.

Listing 2

class FixedDescriptor(object):

    def __init__(self, value_name):
        self.value_name = str(value_name)

    def __get__(self, instance, owner):
        if (instance is None):
            raise AttributeError
        elif (not instance.__dict__.has_key(self.value_name)):
            raise AttributeError
        return instance.__dict__[self.value_name]

    def __set__(self, instance, value):
        print "Setting to %s" % value
        instance.__dict__[self.value_name] = value

    def __delete__(self, instance):
        if (instance.__dict__.has_key(self.value_name)):
            del(instance.__dict__[self.value_name])


class MyClass(object):
    my_value = FixedDescriptor("__value")

The major usage difference between Listing 1 and Listing 2 is the fact that the __get__ method no longer works at class level; it only works at instance level. This should be obvious since this solution stores the value in the instance. If there is no instance, we have nowhere to store the value! If you attempt to access the descriptor at class level (the first if statement) an attribute error will be raised:

def __get__(self, instance, owner):
    if (instance is None):
        raise AttributeError
    elif (not instance.__dict__.has_key(self.value_name)):
        raise AttributeError
    return instance.__dict__[self.value_name]
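
To see both behaviours side by side, here is a sketch that restates the descriptor from Listing 2 with the in operator in place of has_key() and without the print in __set__, so that it also runs on Python 3. The variable names first and second are ours.

```python
class FixedDescriptor(object):
    """Listing 2 restated with "in" instead of has_key()."""

    def __init__(self, value_name):
        self.value_name = str(value_name)

    def __get__(self, instance, owner):
        if instance is None or self.value_name not in instance.__dict__:
            raise AttributeError(self.value_name)
        return instance.__dict__[self.value_name]

    def __set__(self, instance, value):
        instance.__dict__[self.value_name] = value

    def __delete__(self, instance):
        instance.__dict__.pop(self.value_name, None)


class MyClass(object):
    my_value = FixedDescriptor("__value")


first = MyClass()
second = MyClass()
first.my_value = 416
second.my_value = 204
print(first.my_value)        # the first instance keeps its own value
print(second.my_value)

try:
    MyClass.my_value         # instance is None here, so __get__ raises
except AttributeError:
    print("no value at class level")
```

Each value now lives in its own instance’s __dict__ under the key "__value", which is why the two instances no longer overwrite each other.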

Easy Data Descriptors with Properties

The descriptor provided in Listing 2 will work for many situations, but for my money the easiest way to implement data descriptors is to use the property type. I would recommend using it unless you need the same descriptor across many different classes or attributes (e.g. for type validation). Properties can be thought of as a simple and easy way to create data descriptors. The property type implements the descriptor protocol and gets around the problem of where to store the descriptor value in an easy way: it lets the class that created the descriptor deal with it.

The full signature of the property function is as follows:

property(fget=None, fset=None, fdel=None, doc=None)

Where fget, fset, and fdel are the functions that will be called when the __get__, __set__, and __delete__ methods of the descriptor protocol are invoked. The doc parameter is a string that will be used as the docstring for the descriptor. If the doc parameter is not specified, the docstring of the fget method is used.

The signatures of the fget, fset, and fdel functions are as follows:

fget(instance)
fset(instance, value)
fdel(instance)

In the three signatures, instance is a reference to the object that owns the property attribute. This is the same instance that gets passed to the descriptor protocol. Since instance is the first parameter of each method, these are, for all intents and purposes, member methods of a class. The value parameter is the same as the value parameter that is passed to the __set__ method. It is what we are setting the attribute to.

The basic way to implement properties can be seen in Listing 3. As you can see, what we do for a property is very similar to what we did for our initial descriptor in Listing 1. We set up the three functions necessary for the descriptor protocol, and then use them to create our property attribute. Creating the property is very simple:

Listing 3

class MyClass(object):
    #Create the fget, fset, and fdel methods
    def __get_value(self):
        print "Getting value: %s" % self.__value
        return self.__value

    def __set_value(self, value):
        print "Setting to %s" % value
        self.__value = value

    def __del_value(self):
        del(self.__value)

    #Create the property
    my_value = property(__get_value
            , __set_value
            , __del_value
            , "This is my property")

    def __init__(self, value=0):
        self.my_value = value
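
Listing 3 simply stores whatever it is given, but the same fget, fset, and fdel hooks are also the natural place for validation. The following is only a sketch: the Temperature class, its celsius attribute, and the absolute-zero check are assumptions chosen for illustration, and print is called as a function so that the snippet also runs on Python 3.

```python
class Temperature(object):
    """Hypothetical class whose celsius attribute is managed by a property."""

    def __init__(self, celsius=0.0):
        self.celsius = celsius              # goes through _set_celsius

    def _get_celsius(self):                 # fget(instance)
        return self._celsius

    def _set_celsius(self, value):          # fset(instance, value)
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = float(value)

    def _del_celsius(self):                 # fdel(instance)
        del self._celsius

    celsius = property(_get_celsius, _set_celsius, _del_celsius,
                       "Temperature in degrees Celsius.")


t = Temperature(21)
print(t.celsius)
print(Temperature.celsius.__doc__)          # the doc argument passed to property

try:
    t.celsius = -300                        # rejected by _set_celsius
except ValueError:
    print("rejected")
```

Because the check lives in fset, the constructor is validated too, and a rejected assignment leaves the previous value in place.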

Make Data Attributes Data-Descriptors

A common reason to use descriptors is to create read-only attributes. This means allowing callers to get (or access) an attribute’s value, but not allowing them to set its value. At first glance it seems as though one can accomplish this by using descriptors with only the __get__ method defined, i.e. a “non-data descriptor”.

While this seems to create a read-only descriptor attribute, it actually does nothing of the sort. Instead of using the descriptor’s __set__ function for the assignment operation (since no __set__ function is defined), the default Python assignment operation is used. This means that instead of stopping the assignment operation, a new attribute will be created in the instance (remember that descriptors have class scope) having the same name as the descriptor attribute and taking precedence in future operations.

This may be a bit confusing, so let’s look at the example in Listing 4. It’s very much like our previous descriptor examples except that it does not define the __set__ and __delete__ methods. Let’s look at what happens when we try to use this as though it were a read-only attribute:

Listing 4

class WonkyDescriptor(object):
    """This Descriptor isn't read only"""
    def __init__(self, value_name):
        self.value_name = str(value_name)

    def __get__(self, instance, owner):
        if (instance is None):
            raise AttributeError
        elif (not instance.__dict__.has_key(self.value_name)):
            raise AttributeError
        print "Getting a value"
        return instance.__dict__[self.value_name]

class MyClass(object):

    my_value = WonkyDescriptor("_value")

    def __init__(self, value):
        #initial value
        self._value = value

my_instance = MyClass(23) #Create the first instance
print my_instance.my_value
my_instance.my_value = "Don't set me!"
print my_instance.my_value

When we run this, we get the following output:

Getting a value
23
Don't set me!

As you can see, we did not succeed in creating a read-only attribute. Instead, something else happened. The “Getting a value” string is printed out when we print out my_value for the first time. This means that we are accessing the value through the descriptor. We then set the value of the descriptor attribute and print out that new value. Since “Getting a value” isn’t printed out a second time, we know that the descriptor’s __get__ method is not accessed when the second print statement is called.

Let’s take a closer look at what is happening using the following code, which prints out the instance’s __dict__:

my_instance = MyClass(23) #Create the first instance
print my_instance.__dict__
my_instance.my_value = "Don't set me!"
print my_instance.__dict__

This results in the following, which explains what is happening:

{'_value': 23}
{'_value': 23, 'my_value': "Don't set me!"}

When we first access the __dict__ we see our value indexed by the _value key that we told the descriptor to use. Now look at the second __dict__. The old value is still there, but so is a new value my_value. When we assigned “Don’t set me!” to my_value, we didn’t overwrite the descriptor attribute at class scope, we created a new instance attribute! And it is not exactly a read-only attribute.

In order to create a read-only attribute, you need to create a data descriptor where the __set__ method raises an AttributeError exception. An easy way to do this is to create a property and only set the fget function. Instead of using the descriptor in Listing 4, we can construct a read-only attribute as follows:

def __get_value(self):
    return self._value
my_value = property(__get_value)
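
Placed inside a complete class, the fragment above behaves as in the following sketch. The class mirrors the MyClass from Listing 4, except that the getter is named _get_value; print is called as a function so that the snippet also runs on Python 3.

```python
class MyClass(object):
    def __init__(self, value):
        self._value = value

    def _get_value(self):
        return self._value

    my_value = property(_get_value)     # no fset, so assignment is refused


my_instance = MyClass(23)
print(my_instance.my_value)

try:
    my_instance.my_value = "Don't set me!"
except AttributeError:
    print("assignment refused")

print(my_instance.__dict__)             # no shadowing 'my_value' entry was created
```

Because a property is a data descriptor even when fset is missing, the assignment raises AttributeError instead of quietly creating an instance attribute.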

Conclusion

I hope that after reading this article you can see how powerful descriptors (especially data descriptors) can be. Once you get the hang of them, they are a great way to implement “getters and setters” and read-only attributes. Data descriptors are also very useful for performing validation during the assignment operation, e.g. ensuring that a value remains a specific type or stays in a specific form.

The more you play with descriptors and use them in your code, the more you’ll see how sophisticated and useful they can be. That being said, it’s important to remember that unless you have a good reason to use descriptors you probably don’t need them. There’s no need to sacrifice the readability of your code or the dynamism of Python just for the sake of using descriptors. But when you have a real reason for managed attributes, you’ll find that descriptors are more than up to the task.

[1] http://docs.python.org/ref/descriptor-invocation.html
[2] http://www.python.org/download/releases/2.2/descrintro/#property
[3] http://docs.python.org/ref/descriptor-invocation.html
[4] http://docs.python.org/ref/descriptors.html
[5] http://docs.python.org/ref/descriptors.html
[6] http://docs.python.org/ref/descriptors.html
[7] http://www.python.org/download/releases/2.2/descrintro/#property
