16  Python Data Model

Now that we understand classes, we can take a deeper look at how Python’s built-in types work.

This allows us to better understand their usage, and create our own classes that work just like the built-ins.

We’ve seen that Python uses dunder methods to provide syntactic sugar, which lets us write things like:

l = [1, 2, 3] * 2
l[4]

# instead of something like this (hypothetical method names)
l = list(1, 2, 3)
l = l.repeat(2)
l.get_item_at_index(4)

Python Data Model Docs: https://docs.python.org/3/reference/datamodel.html

Emulating Collections & Sequences

Collections

  • Have a length: len(obj)
  • Can be iterated over: for item in obj
  • Can query for membership: item in obj

Sequences

  • Everything a collection can do
  • Can be indexed: obj[0]

You Write…         Python Calls…
len(obj)           obj.__len__()
for item in obj    obj.__iter__()
item in obj        obj.__contains__(item)
obj[i]             obj.__getitem__(i)
obj[i] = x         obj.__setitem__(i, x)
del obj[i]         obj.__delitem__(i)

Numeric Operators

You Write…    Python Calls…
x + y         x.__add__(y)
x - y         x.__sub__(y)
x * y         x.__mul__(y)
x / y         x.__truediv__(y)
x // y        x.__floordiv__(y)
x % y         x.__mod__(y)
x ** y        x.__pow__(y)
x @ y         x.__matmul__(y)
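
As a small illustration of the table above, here is a minimal sketch of a class that supports +, - and *. The Vec2 class and its attributes are hypothetical and exist only for this example; returning NotImplemented is the conventional way to tell Python that an operand type is not supported.

```python
class Vec2:
    """A hypothetical 2-D vector, used only to illustrate operator dunders."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"

    def __add__(self, other):
        # called for: self + other
        if not isinstance(other, Vec2):
            return NotImplemented
        return Vec2(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        # called for: self - other
        if not isinstance(other, Vec2):
            return NotImplemented
        return Vec2(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar):
        # called for: self * scalar (scaling by a plain number)
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vec2(self.x * scalar, self.y * scalar)


v = Vec2(1, 2)
w = Vec2(3, 4)
print(v + w)  # Vec2(4, 6)
print(w - v)  # Vec2(2, 2)
print(v * 3)  # Vec2(3, 6)
```

Each method returns a new object rather than modifying self, which matches how the built-in numeric types behave.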

Reverse / Reflected / Right Operators

These methods are called on the right-hand operand when the left-hand operand does not define the corresponding method, or when that method returns NotImplemented.

You Write…    Python Calls…
x + y         y.__radd__(x)
x - y         y.__rsub__(x)
x * y         y.__rmul__(x)
x / y         y.__rtruediv__(x)
x // y        y.__rfloordiv__(x)
x % y         y.__rmod__(x)
x ** y        y.__rpow__(x)
x @ y         y.__rmatmul__(x)
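
The fallback described above can be seen with a small sketch. The Meters class is hypothetical; the point is only that 2 * Meters(5) first asks int, which does not know about Meters, and then falls back to Meters.__rmul__.

```python
class Meters:
    """Hypothetical wrapper around a length in meters."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Meters({self.value})"

    def __mul__(self, factor):
        # Meters(5) * 2
        if not isinstance(factor, (int, float)):
            return NotImplemented
        return Meters(self.value * factor)

    def __rmul__(self, factor):
        # 2 * Meters(5): int.__mul__ returns NotImplemented,
        # so Python tries the right-hand operand's __rmul__
        return self.__mul__(factor)


print(Meters(5) * 2)  # Meters(10)
print(2 * Meters(5))  # Meters(10)

try:
    Meters(5) * Meters(2)  # neither side can handle this
except TypeError as e:
    print("TypeError:", e)
```

When every candidate method returns NotImplemented, Python raises TypeError on our behalf, so the methods themselves never need to raise it.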

Comparison Operators

You Write…    Python Calls…
x < y         x.__lt__(y)
x <= y        x.__le__(y)
x > y         x.__gt__(y)
x >= y        x.__ge__(y)
x == y        x.__eq__(y)
x != y        x.__ne__(y)
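
A minimal sketch of the comparison methods in use, built around a hypothetical Version class. Only __eq__, __lt__ and __le__ are written out here: Python derives != from __eq__, and for > and >= it falls back to the swapped operand's __lt__ and __le__.

```python
class Version:
    """Hypothetical (major, minor) version number."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"

    def _key(self):
        # tuples already compare element by element
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __le__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() <= other._key()


old, new = Version(1, 0), Version(1, 2)
print(old < new)           # True
print(old == new)          # False
print(old != new)          # True, derived from __eq__
print(new > old)           # True, falls back to old < new
print(sorted([new, old]))  # [Version(1, 0), Version(1, 2)]
```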

Building our StaticArray

To demonstrate, we’ll implement a sequence type seen in other languages known as a static array:

  • A static array is a sequence type with a fixed capacity for the number of items it can hold.
  • Resizing of the array is not allowed after initialization.

We will define a class StaticArray that will allow use of built-in operators.

We’ll be able to use it like this:

>>> from static_array import StaticArray
>>> sa = StaticArray([1, 2, 3])
# should produce the following output:
>>> print(sa * 2) # repetition using *
[1, 2, 3, 1, 2, 3]
>>> print(sa[1])  # indexing using []
2

Dual-Purpose Constructor using isinstance

from collections.abc import Iterable

class StaticArray:
    def __init__(self, init_val, capacity=5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity
sa = StaticArray([1, 2, 3])
print(sa)
<__main__.StaticArray object at 0x104e56480>
sa = StaticArray(0, 5)
print(sa)
<__main__.StaticArray object at 0x104e56a80>

Adding a __repr__

class StaticArray:
    def __init__(self, init_val, capacity=5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

    def __repr__(self):
        return f"StaticArray({self.items})"
sa = StaticArray([1, 2, 3])
print(sa)
StaticArray([1, 2, 3])
sa = StaticArray(0, 5)
print(sa)
StaticArray([0, 0, 0, 0, 0])

Collection & Sequence Methods

Here we’re using an underlying list, so our methods are quite simple:

class StaticArray:
    def __init__(self, init_val, capacity=5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

    def __repr__(self):
        return f"StaticArray({self.items})"

    def __str__(self):
        return f"StaticArray({self.items})"

    def __len__(self):
        return self.capacity

    def __contains__(self, item):
        return item in self.items

    def __getitem__(self, index):
        if index >= self.capacity or index < -self.capacity:
            raise IndexError("Index out of range")
        return self.items[index]

    def __setitem__(self, index, val):
        if index >= self.capacity or index < -self.capacity:
            raise IndexError("Index out of range")
        self.items[index] = val

    def __delitem__(self, index):
        raise NotImplementedError("StaticArray does not support deletion")
sa = StaticArray([1, "hi", 3.14, True])
len(sa)
4
42 in sa
"hi" in sa
True
sa[3]
True
try:
    sa[42] = "hello"
except Exception as e:
    print(repr(e))
IndexError('Index out of range')
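
The usage preview earlier also calls sa * 2, which the class above does not support yet. Below is a minimal sketch of one way to add it, assuming that repetition should return a new, larger StaticArray and leave the original untouched. The class is a trimmed copy that keeps only what the example needs in order to run on its own.

```python
from collections.abc import Iterable


class StaticArray:
    # trimmed copy of the constructor and __repr__ shown above
    def __init__(self, init_val, capacity=5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

    def __repr__(self):
        return f"StaticArray({self.items})"

    def __mul__(self, times):
        # sa * 2: build a new array instead of resizing this one
        if not isinstance(times, int):
            return NotImplemented
        return StaticArray(self.items * times)

    # 2 * sa should behave the same way (assumption for this sketch)
    __rmul__ = __mul__


sa = StaticArray([1, 2, 3])
print(sa * 2)  # StaticArray([1, 2, 3, 1, 2, 3])
print(2 * sa)  # StaticArray([1, 2, 3, 1, 2, 3])
print(sa)      # StaticArray([1, 2, 3])
```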

Iteration Revisited

Remember that we have iterables, and iterators.

Objects like lists, tuples, and strings are iterable.

To keep track of the position within a given iteration (for loop, calls to next), Python uses an iterator.

ll = [1, 2, 3, 4]
iterator = iter(ll)
print("iterator 1 next()", next(iterator))
print("iterator 1 next()", next(iterator))
iterator2 = iter(ll)
print("iterator 2 next()", next(iterator2))
print("iterator 1 next()", next(iterator))
iterator 1 next() 1
iterator 1 next() 2
iterator 2 next() 1
iterator 1 next() 3

To be iterable, a class needs an __iter__ method that returns an iterator.

An iterator is an object with a __next__ method that returns the next item in the iteration. It should raise StopIteration when there are no more items.

Common Pattern: If a class only needs to be iterable once, it can return itself as the iterator, thus fulfilling both roles.

for i in iterable:
    print(i)

# is roughly equivalent to
iterator = iter(iterable)
while True:
    try:
        i = next(iterator)
    except StopIteration:
        break
    print(i)
class SimpleRange:
    def __init__(self, n):
        self.current = 0
        self.n = n

    def __iter__(self):
        print("iter has been called")
        return self

    def __next__(self):
        if self.current >= self.n:
            print("at the end")
            raise StopIteration
        else:
            print(f"next was called, moving {self.current} to {self.current+1}")
            self.current += 1
            return self.current - 1

    def __repr__(self):
        return f"SimpleRange({self.n}, current={self.current})"
sr = SimpleRange(3)
for i in sr:
    for j in sr:
        print(i, j)
iter has been called
next was called, moving 0 to 1
iter has been called
next was called, moving 1 to 2
0 1
next was called, moving 2 to 3
0 2
at the end
at the end
sr = SimpleRange(5)
siter = iter(sr)
print(siter)
iter has been called
SimpleRange(5, current=0)
siter is sr
True
next(siter)
print(siter)
next was called, moving 0 to 1
SimpleRange(5, current=1)
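
One consequence of an object acting as its own iterator is that it can only be traversed once. A minimal sketch, using a quiet variant of SimpleRange (the name OneShotRange is hypothetical):

```python
class OneShotRange:
    """Quiet variant of SimpleRange: the object is its own iterator."""

    def __init__(self, n):
        self.current = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.n:
            raise StopIteration
        self.current += 1
        return self.current - 1


r = OneShotRange(3)
first = list(r)   # consumes the iterator
second = list(r)  # nothing left: current is never reset
print(first)      # [0, 1, 2]
print(second)     # []
```

A list, by contrast, hands out a fresh iterator on every call to iter(), which is why it can be looped over repeatedly.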

Iteration Advice

  1. Do not implement __next__() in a class that should only be iterable.
  2. To support multiple simultaneous traversals, the iterator must be a separate object.
  3. A common design pattern is to delegate iteration to a separate iterator class.

For example, we can define a StaticArrayIterator class that is in charge of iterating through the items within a StaticArray object.

# Adding __iter__ to StaticArray
class StaticArrayIterator:
    def __init__(self, values):
        self.values = values
        self.position = 0

    def __iter__(self):
        # an iterator is itself iterable: it returns itself
        return self

    def __next__(self):
        if self.position >= len(self.values):
            raise StopIteration
        item = self.values[self.position]
        self.position += 1
        return item

    def __repr__(self):
        return f"iterating over {self.values}, at position {self.position}"


class StaticArray:
    def __init__(self, capacity, initial=None):
        self._items = [initial] * capacity
        self._capacity = capacity
        self._iter_position = 0

    @classmethod
    def from_iterable(cls, iterable):
        # materialize first so iterables without len() also work
        items = list(iterable)
        new_array = cls(len(items))
        for idx, item in enumerate(items):
            new_array._items[idx] = item
        return new_array

    def __repr__(self):
        # __repr__ is the unambiguous string representation
        # of an object
        return f"StaticArray({self._capacity}, {self._items})"

    def __str__(self):
        return repr(self._items)

    # Sequence Operations

    def __len__(self):
        return self._capacity

    def __contains__(self, x):
        return x in self._items

    def __getitem__(self, i):
        if i >= self._capacity or i < -self._capacity:
            raise IndexError  # an invalid index
        return self._items[i]

    def __setitem__(self, i, x):
        if i >= self._capacity or i < -self._capacity:
            raise IndexError  # an invalid index
        self._items[i] = x

    def __delitem__(self, i):
        raise NotImplementedError("Cannot delete from a static array")

    # Iterable Operations
    def __iter__(self):
        return StaticArrayIterator(self._items.copy())
sa = StaticArray(5, 2)
sa[0] = 1
sa[1] = 2
sa[2] = 3
sa[3] = 4
sa[4] = 5
print(sa)
for x in sa:
    for y in sa:
        print(x, y)
[1, 2, 3, 4, 5]
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
4 1
4 2
4 3
4 4
4 5
5 1
5 2
5 3
5 4
5 5

Bonus: More Dunder Methods

Context Managers / with

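We already saw the need to clean up after ourselves when we used with to open files. Any object that defines __enter__ and __exit__ can be used this way; such objects are called context managers.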


with open(filename) as f:
    # do things with f
    g(f)
# f is guaranteed to be closed even if 
# exceptions are raised within with block
class DatabaseConnection:
    def __init__(self, username, password):
        # connect to database
        self.username = username
        self.password = password
        self.connected = True

    def __enter__(self):
        print("__enter__")
        # the return value is what gets bound by "as";
        # returning self is the usual choice
        return self

    def __exit__(self, exc_type, exc_val, exc_traceback):
        print("__exit__")
        if exc_type:
            print("rolling back changes")
        self.connected = False

    def query(self, sql):
        print("ran query", sql)

    def __repr__(self):
        return f"Connection connected={self.connected}"
db = DatabaseConnection("hello", "world")
db.query("SELECT * FROM users;")

try:
    # do something dangerous
    1 / 0
except Exception as e:
    print(repr(e))

# our connection is possibly left in a broken state
print(db)
ran query SELECT * FROM users;
ZeroDivisionError('division by zero')
Connection connected=True
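try:
    with DatabaseConnection("hello", "world") as db:
        # __enter__ runs here
        db.query("SELECT * from users;")
        1 / 0
        # __exit__ runs as the block is left, even via an exception
except ZeroDivisionError as e:
    print(repr(e))
__enter__
ran query SELECT * from users;
__exit__
rolling back changes
ZeroDivisionError('division by zero')
# changes were rolled back, and our connection was closed
db.connected
False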
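
To make the order of these calls concrete, here is a simplified sketch of roughly what a with statement does behind the scenes. The Recorder class and the run_with helper are hypothetical names for this illustration, and the real statement handles a few more details, such as looking the methods up on the type.

```python
class Recorder:
    """Hypothetical context manager that records the calls it receives."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_val, exc_traceback):
        name = exc_type.__name__ if exc_type else None
        self.events.append(f"exit({name})")
        return False  # falsy: do not swallow the exception


def run_with(manager, body):
    """Approximately what `with manager as value: body(value)` does."""
    value = manager.__enter__()
    try:
        body(value)
    except BaseException as exc:
        # __exit__ sees the exception; a truthy return suppresses it
        if not manager.__exit__(type(exc), exc, exc.__traceback__):
            raise
    else:
        # no exception: all three arguments are None
        manager.__exit__(None, None, None)


rec = Recorder()
try:
    run_with(rec, lambda value: 1 / 0)
except ZeroDivisionError:
    pass
print(rec.events)  # ['enter', 'exit(ZeroDivisionError)']
```

Because __exit__ returns a falsy value here, the exception still propagates after cleanup, just as it does with DatabaseConnection.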

Callable Objects Examples

Functions have a few attributes like __name__ and __doc__ that we can use to introspect on them.

def add(x, y):
    """Adds two numbers"""
    return x + y


print(add.__name__)
print(add.__doc__)

x = add
add
Adds two numbers
x.__name__
'add'
class Example:
    def __init__(self, name):
        self.name = name
        self.num_calls = 0
    def __call__(self, *args):
        print(self.num_calls)
        self.num_calls += 1
        print(self.name, "got", args)

example = Example("one")
two = Example("two")
example(1, 2, 3)
0
one got (1, 2, 3)
two()
0
two got ()

Functions also have a __call__ method. Defining __call__ on our own classes, as Example does above, makes their instances callable. A more practical use is a memoizing wrapper:

class Memoized:
    def __init__(self, func):
        self.cache = {}
        self.wrapped_func = func

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.wrapped_func(*args)
        return self.cache[args]
@Memoized
def expensive_func(a, b, c):
    print("running expensive_func")
    return a + b + c

#expensive_func = Memoized(expensive_func)

print(expensive_func(1, 2, 3))
print(expensive_func(1, 2, 3))
running expensive_func
6
6
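
For comparison, the standard library provides a decorator in the same spirit, functools.lru_cache. The sketch below is a usage example, not a claim about how Memoized should be written; the calls list is a hypothetical helper used only to count how often the function body runs.

```python
from functools import lru_cache

calls = []  # hypothetical helper: records each real invocation


@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts
def expensive_func(a, b, c):
    calls.append((a, b, c))
    return a + b + c


print(expensive_func(1, 2, 3))  # 6
print(expensive_func(1, 2, 3))  # 6, served from the cache
print(len(calls))               # 1
```

As with Memoized, the arguments are used as dictionary keys, so they must be hashable.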
class PartialFunc:
    # simplified functools.partial

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        temp_kwargs = self.kwargs.copy()
        temp_kwargs.update(kwargs)
        return self.func(*(self.args + args), **temp_kwargs)

    @property
    def __name__(self):
        return f"{self.func.__name__}(args={self.args} kwargs={self.kwargs})"

    @property
    def __doc__(self):
        return self.func.__doc__
def add(x, y):
    """Adds two numbers"""
    return x + y

add_5 = PartialFunc(add, 5)
print(add_5(10))

print(add_5.__name__)
print(add_5.__doc__)
15
add(args=(5,) kwargs={})
Adds two numbers
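
The real functools.partial can be used the same way. A short sketch for comparison; note that partial objects expose the wrapped function and the stored arguments through the func, args and keywords attributes rather than through a custom __name__.

```python
from functools import partial


def add(x, y):
    """Adds two numbers"""
    return x + y


add_5 = partial(add, 5)
print(add_5(10))          # 15
print(add_5.func is add)  # True
print(add_5.args)         # (5,)
print(add_5.keywords)     # {}
```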