Subclassing Built-In Types






Subclassing Built-In Types

There is one more twist in the stack and set story, before we move on to some more classical data structures. In recent Python releases, it is also possible to subclass built-in datatypes such as lists and dictionaries, in order to extend them. That is, because datatypes now look and feel just like customizable classes, we can code unique datatypes that are extensions of built-ins, with subclasses that inherit built-in tool sets. For instance, here are our stack and set objects coded in the prior sections, revised as customized lists (the set union method has been simplified slightly here to remove a redundant loop):

class Stack(list):
    "a list with extra methods"
    def top(self):
        return self[-1]

    def push(self, item):
        list.append(self, item)

    def pop(self):
        if not self:
            return None                 # avoid exception
        else:
            return list.pop(self)

class Set(list):
    " a list with extra methods and operators"
    def _ _init_ _(self, value=[]):      # on object creation
        list._ _init_ _(self)
        self.concat(value)

    def intersect(self, other):         # other is any sequence type
        res = []                        # self is the instance subject
        for x in self:
            if x in other:
                res.append(x)
        return Set(res)                 # return a new Set

    def union(self, other):
        res = Set(self)                 # new set with a copy of my list
        res.concat(other)               # insert uniques from other
        return res

    def concat(self, value):            # value: a list, string, Set...
        for x in value:                 # filters out duplicates
           if not x in self:
                self.append(x)

    # len, getitem, iter inherited, use list repr
    def _ _and_ _(self, other):   return self.intersect(other)
    def _ _or_ _(self, other):    return self.union(other)
    def _ _str_ _(self):          return '<Set:' + repr(self) + '>'

class FastSet(dict):
    pass    # this doesn't simplify much

The stack and set implemented in this code are essentially like those we saw earlier, but instead of embedding and managing a list, these objects really are customized lists. They add a few additional methods, but they inherit all of the list object's functionality.

This can reduce the amount of wrapper code required, but it can also expose functionality that might not be appropriate in some cases. In the following self-test code, for example, we're able to sort and insert into stacks and reverse a set, because we've inherited these methods from the list object. In most cases, such operations don't make sense for the data structures in question, and the wrapper class approach of the prior sections may still be preferred:

def selfTest( ):
    # normal use cases
    stk = Stack( )
    print stk
    for c in 'spam': stk.push(c)
    print stk, stk.top( )
    while stk: print stk, stk.pop( )
    print stk, stk.pop( )

    print
    set = Set('spam')
    print set, 'p' in set
    print set & Set('slim')
    print set | 'slim'
    print Set('slim') | Set('spam')

    # downside? these work too
    print
    stk = Stack('spam')
    print stk
    stk.insert(1, 'X')     # should only access top
    print stk
    stk.sort( )             # stack not usually sorted
    print stk

    set = Set('spam')
    set.reverse( )          # order should not matter
    print set, set[1]

if _ _name_ _ == '_ _main_ _':
    selfTest( )

When run, this module produces the following results on standard output; we're able to treat the stack and set objects like lists, whether we should or not:

[]
['s', 'p', 'a', 'm'] m
['s', 'p', 'a', 'm'] m
['s', 'p', 'a'] a
['s', 'p'] p
['s'] s
[] None

<Set:['s', 'p', 'a', 'm']> True
<Set:['s', 'm']>
<Set:['s', 'p', 'a', 'm', 'l', 'i']>
<Set:['s', 'l', 'i', 'm', 'p', 'a']>

['s', 'p', 'a', 'm']
['s', 'X', 'p', 'a', 'm']
['X', 'a', 'm', 'p', 's']
<Set:['m', 'a', 'p', 's']> a

Subclassing built-in types has other applications, which may be more useful than those demonstrated by the preceding code. Consider a queue, or ordered dictionary, for example. The queue could take the form of a list subclass with get and put methods to insert on one end and delete from the other; the dictionary could be coded as a dictionary subclass with an extra list of keys that is sorted on insertion or request. Similarly, the built-in Python bool Boolean datatype is implemented as a subclass that customizes the integer int with a specialized display format (true is like integer 1, but it prints itself as the word True).

You can also use type subclassing to alter the way built-in types behave; a list subclass could map indexes 1..N to built-in indexes 0..N-1, for instance:

# map 1..N to 0..N-1, by calling back to built-in version

class MyList(list):
    def _ _getitem_ _(self, offset):
        print '(indexing %s at %s)' % (self, offset)
        return list._ _getitem_ _(self, offset - 1)

if _ _name_ _ == '_ _main_ _':
    print list('abc')
    x = MyList('abc')               # _ _init_ _ inherited from list
    print x                         # _ _repr_ _ inherited from list
    print x[1]                      # MyList._ _getitem_ _
    print x[3]                      # customizes list superclass method
    x.append('spam'); print x       # attributes from list superclass
    x.reverse( );      print x

% python typesubclass.py
['a', 'b', 'c']
['a', 'b', 'c']
(indexing ['a', 'b', 'c'] at 1)
a
(indexing ['a', 'b', 'c'] at 3)
c
['a', 'b', 'c', 'spam']
['spam', 'c', 'b', 'a']

This works, but it is probably not a good idea in general. It would likely confuse its usersthey will expect Python's standard 0..N-1 indexing, unless they are familiar with the custom class. As a rule of thumb, type subclasses should probably adhere to the interface of the built-in types they customize.

The New Built-In Set Datatype

Python has a way of conspiring to make my book examples obsolete over time. Beginning in version 2.4, the language sprouted a new built-in set datatype, which implements much of the set functionality that we coded in the set examples of this chapter (and more). It is implemented with some of the same algorithms we used, but it is available on all Pythons today.

Built-in set usage is straightforward: set objects are created by calling the new built-in set function, passing in an iterable/sequence for the initial components of the set (sets are also available in 2.3, but the set creation call must be imported from a module):

>>> x = set('abcde')
>>> y = set('bdxyz')
>>> x
set(['a', 'c', 'b', 'e', 'd'])

>>> 'e' in x                            # membership
True
>>> x - y                               # difference
set(['a', 'c', 'e'])
>>> x | y                               # union
set(['a', 'c', 'b', 'e', 'd', 'y', 'x', 'z'])
>>> x & y                               # intersection
set(['b', 'd'])

Interestingly, just like the dictionary-based optimized set we coded, built-in sets are unordered and require that all set components be hashable (immutable). In fact, their current implementation is based on wrapped dictionaries. Making a set with a dictionary of items works, but only because set uses the dictionary iterator, which returns the next key on each iteration (it ignores key values):

>>> x = set(['spam', 'ham', 'eggs'])   # list of immutables
>>> x
set(['eggs', 'ham', 'spam'])

>>> x = set([['spam', 'ham'], ['eggs']])
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in -toplevel-
    x = set([['spam', 'ham'], ['eggs']])
TypeError: list objects are unhashable

>>> x = set({'spam':[1, 1], 'ham': [2, 2], 'eggs':[3, 3]})
>>> x
set(['eggs', 'ham', 'spam'])

Built-in sets also support operations such as superset testing, and they come in two flavors: mutable and frozen (and thus hashable, for sets of sets). For more details, see the set type in the built-in types section of the Python library manual.

The set examples in this chapter are still useful as demonstrations of general data structure coding techniques, but they are not strictly required for set functionality in Python today. In fact, this is how Python tends to evolve over time: operations that are coded manually often enough wind up becoming built-in tools. I can't predict Python evolution, of course, but with any luck at all, the 10th edition of this book might be just a pamphlet.




 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows