
A Lifetime In Python

Overload Journal #133 - June 2016 + Programming Topics   Author: Steve Love
Resource management is important in any language. Steve Love demonstrates how to use context managers in Python.

Variables in Python generally have a lifetime of their own. Or rather, the Python runtime interpreter handles object lifetime with automated garbage collection, leaving you to concentrate on more important things. Like Resource lifetime, which is much more interesting.

Python provides some facilities for handling the deterministic clean-up of certain objects, because sometimes it’s necessary to know that it has happened at a specific point in a program. Things like closing file handles, releasing sockets, committing database changes – the usual suspects.

In this article I will explore Python’s tools for managing resources in a deterministic way, and demonstrate why it’s easier and better to use them than to roll your own.

Why you need it

Python, like many other languages, indicates runtime errors with exceptions, which introduces interesting requirements on state. Exceptions are not necessarily visible directly in your code, either. You just have to know they might occur. Listing 1 shows a very basic (didactic) example.

import sqlite3

def addCustomerOrder( dbname, customer, order ):
  db = sqlite3.connect( dbname )               # (1)
  db.execute( 'INSERT OR REPLACE INTO customers \
    (id, name) VALUES (?, ?)', customer )      # (2)
  db.execute( 'INSERT INTO orders (date, custid,\
    itemid, qty) VALUES (?, ?, ?, ?)', order ) # (3)
  db.commit()                                  # (4)
  db.close()                                   # (5)
			
Listing 1

If an exception occurs between lines (1) and (3), the data won’t be committed to the database, and the connection will not be closed and will therefore ‘leak’. This could be a big problem if this function, or other functions like it, are called frequently, say as the backend to a large web application. This wouldn’t be the best way to implement this in any case, but the point is that the db.execute() statement can raise all kinds of exceptions.

You might then try to explicitly handle the exceptions, as shown in Listing 2, which ensures the database connection is closed even in the event of an exception. Closing a connection without explicitly committing changes will cause them to be rolled back.

def addCustomerOrder( dbname, customer, order ):
  db = sqlite3.connect( dbname )
  try:
    db.execute( 'INSERT OR REPLACE \
      INTO customers \
      (id, name) VALUES (?, ?)', customer )
    db.execute( 'INSERT INTO orders \
      (date, custid, itemid, qty) \
      VALUES (?, ?, ?, ?)', order )
    db.commit()
  finally:
    db.close()
			
Listing 2

It is a bit messy, and introduces some other questions, such as: what happens if the sqlite3.connect function raises an exception? Do we need another, outer try block for that? Or should we expect clients of this function to wrap it in an exception handler?

Fortunately, Python has already asked, and answered, some of these questions with the Context Manager. This allows you to write the code shown in Listing 3.

def addCustomerOrder( dbname, customer, order ):
  with sqlite3.connect( dbname ) as db:
    db.execute( 'INSERT OR REPLACE \
      INTO customers \
      (id, name) VALUES (?, ?)', customer )
    db.execute( 'INSERT INTO orders \
      (date, custid, itemid, qty) \
      VALUES (?, ?, ?, ?)', order )
  db.close()
			
Listing 3

The connection object from the sqlite3 module implements the Context Manager protocol, which is invoked using the with statement. This introduces a block scope, and the Context Manager protocol gives objects that implement it a way of defining what happens when that scope is exited.

In the case of the connection object, that behaviour is to commit the (implicit, in this case) transaction if no errors occurred, or roll it back if an exception was raised in the block.

Note the explicit call to db.close() outside of the with statement’s scope. The only behaviour defined for the connection object as Context Manager is to commit or roll back the transaction when the scope is exited. This construct doesn’t say anything at all about the lifetime of the db object itself. It will (probably) be garbage collected at some indeterminate point in the future.
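The commit-or-rollback behaviour, and the fact that the connection stays open afterwards, can be seen in a small experiment. This is only a sketch: it assumes an in-memory database and a hypothetical one-column table named t, neither of which is part of the customer example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x INTEGER)")
db.commit()

# Normal exit from the block: the pending transaction is committed.
with db:
    db.execute("INSERT INTO t VALUES (1)")

# An exception inside the block: the pending transaction is rolled back.
try:
    with db:
        db.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# The connection is still open after both blocks, so it can be queried.
rows = db.execute("SELECT x FROM t ORDER BY x").fetchall()
db.close()  # closing remains our job
```

Only the first row survives, and the query after the two blocks works because nothing has closed the connection yet.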

You can do it too

This customer database library might have several functions associated with it, perhaps including facilities to retrieve or update customer details, report orders and so on. Perhaps it’s better represented as a type, exposing an interface that captures those needs. See Listing 4 for an example.

class Customers( object ):
  def __init__( self, dbname ):
    self.db = sqlite3.connect( dbname )

  def close( self ):
    self.db.close()

  def addCustomerOrder( self, customer, order ):
    self.db.execute( 'INSERT OR REPLACE \
      INTO customers (id, name) \
      VALUES (?, ?)', customer )
    self.db.execute( 'INSERT INTO orders \
      (date, custid, itemid, qty) \
      VALUES (?, ?, ?, ?)', order )

  # Other methods...

with Customers( dbname ) as db:
    db.addCustomerOrder( customer, order )
db.close()
			
Listing 4

Unfortunately, the line containing the with statement provokes an error similar to this:

    File "customerdb.py", line 21, in <module>
      with Customers( dbname ) as db:
  AttributeError: __exit__

You can’t use with on just any type you create. It’s not a magic wand, either: the changes won’t get committed to the database if commit() is not called! However, the Context Manager facility isn’t limited to just those types in the Python Standard Library. It’s implemented quite simply, as seen in Listing 5.

class Customers( object ):
  def __init__( self, dbname ):
    self.dbname = dbname

  def __enter__( self ):
    self.db = sqlite3.connect( self.dbname )
    return self

  def __exit__( self, exc, val, trace ):
    if exc:
      self.db.rollback()
    else:
      self.db.commit()
    return None

  def close( self ):
    self.db.close()

  def addCustomerOrder( self, customer, order ):
    self.db.execute( 'INSERT OR REPLACE \
      INTO customers (id, name) \
      VALUES (?, ?)', customer )
    self.db.execute( 'INSERT INTO \
      orders (date, custid, itemid, qty) \
      VALUES (?, ?, ?, ?)', order )

  # Other methods...

with Customers( dbname ) as db:
  db.addCustomerOrder( customer, order )
db.close()
			
Listing 5

The __init__() method is still there, but just saves the name away for later use. When the with statement is executed, it calls the object’s __enter__() method, and binds the return value to the name in the as clause if there is one: in this case, the db variable. The main content of the original construction method has been moved to the __enter__() method. Lastly, when the with statement block scope is exited, the __exit__() method of the managed object is called. If no exceptions occurred in the block, then the three arguments to __exit__() will all be None. If an exception did occur, then they are populated with the type, value and stack trace object associated with the exception. This implementation essentially mimics the behaviour of the sqlite3 connection object, and rolls back if an exception occurred.
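To see what __exit__() actually receives, here is a minimal sketch using a hypothetical Recorder type (not part of the customer example) that simply stores its arguments:

```python
class Recorder(object):
    """Hypothetical context manager that records what __exit__ receives."""

    def __enter__(self):
        return self

    def __exit__(self, exc, val, trace):
        self.seen = (exc, val, trace)
        return None  # a false value: do not suppress the exception


with Recorder() as ok:
    pass  # no exception: all three arguments are None

try:
    with Recorder() as failed:
        raise ValueError("boom")
except ValueError:
    pass  # the exception propagated out of the with block

print(ok.seen)
print(failed.seen[0], failed.seen[1])
```

After the clean block, ok.seen holds three None values. After the failing one, failed.seen holds the ValueError type, the exception instance and a traceback object.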

Returning a false-value indicates to the calling code that any exception that occurred inside the with block should be re-raised. Returning None counts – and is only explicitly specified here for the purposes of explaining it. A Python function with no return statement implicitly returns None. Returning a true-value indicates that any such exception should be suppressed.
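The effect of the return value can be sketched with a hypothetical Swallow type, which returns a true-value only for one chosen exception type. (The Standard Library’s contextlib.suppress, available from Python 3.4, serves a similar purpose.)

```python
class Swallow(object):
    """Hypothetical context manager that suppresses one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type

    def __enter__(self):
        return self

    def __exit__(self, exc, val, trace):
        # True only when an exception of the chosen type was raised.
        return exc is not None and issubclass(exc, self.exc_type)


reached = []

with Swallow(KeyError):
    {}["missing"]  # raises KeyError, which __exit__ suppresses
reached.append("after suppressed block")

try:
    with Swallow(KeyError):
        raise ValueError("not suppressed")
except ValueError:
    reached.append("ValueError re-raised")
```

Execution carries on after the first block as if nothing had happened, whereas the ValueError in the second block reaches the surrounding handler.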

Object vs. Resource Lifetime

These two concepts are frequently, and mistakenly, used interchangeably. The lifetime of an object is the time between its creation and its destruction, which is usually the point at which its memory is freed. The lifetime of the resource is tied to neither of those things, although it’s very often sensible to associate the object with its resource when the object is created (i.e. in its __init__() method). You cannot know, for all intents and purposes, when the object lifetime ends, but you can know – and can control – when the resource lifetime ends. Python’s Context Manager types and the associated with statement give you that control.

You may have heard that Python objects can have a destructor – the __del__() method. This special method is called when the object is garbage collected, and it allows you to perform a limited amount of last-chance cleanup. A common misapprehension is that invoking del thing will call the __del__() method on thing if it’s defined. It won’t: del only unbinds the name thing, and __del__() runs (if at all) once no references to the object remain.
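A short sketch makes the difference between del and __del__() visible. The Thing class and the finalised list are hypothetical names used only for this illustration, and the timing noted in the final comment is specific to CPython’s reference counting.

```python
finalised = []


class Thing(object):
    def __del__(self):
        finalised.append("Thing")


first = Thing()
second = first  # a second name for the same object

del first       # removes the name 'first'; the object is still alive
still_alive = (finalised == [])

del second      # the last reference is gone; CPython typically finalises
                # the object at this point, other interpreters may not
```

The first del leaves the object untouched because second still refers to it.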

Consistent convenience

Having to explicitly close the connection after the block has exited is a bit of a wart. We could decide that our own implementation of the __exit__() method invokes close() on the connection object having either committed or rolled back the changes, but there is a better way.

The contextlib module in the Python Standard Library provides some convenient utilities to help with exactly this, including the closing function, used like this:

  from contextlib import closing
  with closing( Customers( dbname ) ) as db:
    db.addCustomerOrder( customer, order )

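It will automatically call close() on the object to which it’s bound when the block scope is exited. Note that closing does not call the wrapped object’s own __enter__() or __exit__() methods, so the object must already hold its connection when it is constructed (as in Listing 4), and any commit has to be made explicitly.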
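As a rough illustration of what closing does, the following sketch uses a hypothetical Resource type that has a close() method but no Context Manager methods of its own:

```python
from contextlib import closing


class Resource(object):
    """Hypothetical type with close() but no __enter__/__exit__."""

    def __init__(self):
        self.events = ["opened"]

    def close(self):
        self.events.append("closed")


try:
    with closing(Resource()) as res:
        res.events.append("used")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # close() has already been called by this point
```

The object only needs a close() method, and that method is called even though the block was left by an exception.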

Python File objects also have a Context Manager interface, and can be used in a with statement too. However, their behaviour on exit is to close the file, so you don’t need to use the closing utility for file objects in Python.

  with open( filename ) as f:
    contents = f.read()

So much for consistency! It’s a little odd having to know the internal behaviour of a given type’s Context Manager implementation (and the documentation isn’t always clear on which types in the Standard Library are Context Managers), but sometimes the price of convenience is a little loss of consistency.

To reiterate the point about lifetime, even though the connection and file objects in the previous two examples have been closed, the lifetimes of the objects have not been affected.
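This is easy to check for a file object. The sketch below writes a scratch file (its generated name is incidental) and inspects the object after the block has ended:

```python
import os
import tempfile

# Create a small scratch file to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("hello")

with open(path) as f:
    contents = f.read()

# The block has exited: the file (the resource) is closed, but the
# object bound to f is still alive and can be inspected.
closed_after_block = f.closed

os.remove(path)
```

The name f is still bound to a perfectly valid object. It is only the underlying resource that has gone.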

When one isn’t enough

Sometimes it’s useful to associate several resources with a single Context Manager block. Suppose we want to be able to import a load of customer order data from a file into the database using the facility we’ve already made.

In Python 3.1 and later, this can be achieved like this:

  with closing( Customers( dbname ) ) as db,  \
    open( 'orders.csv' ) as data:
    for line in data:
      db.addCustomerOrder( *parseOrderData( line ) )

If you’re stuck using a version of Python earlier than that, you have to nest the blocks like this:

with closing( Customers( dbname ) ) as db:
  with open( 'orders.csv' ) as data:
    for line in data:
      db.addCustomerOrder( *parseOrderData( line ) )

Either syntax gets unwieldy very quickly with more than two or three managed objects. One approach to this is to create a new type that implements the Context Manager protocol, and wraps up multiple resources, leaving the calling code with a single with statement on the wrapping type, as shown in Listing 6.

class WrappedResources( object ):
  def __init__( self, dbname, filename ):
    self.dbname = dbname
    self.filename = filename

  def __enter__( self ):
    self.db = sqlite3.connect( self.dbname )
    self.data = open( self.filename )
    return self    # bound to the name in the as clause

  def __exit__( self, *exceptions ):
    if not any( exceptions ): self.db.commit()
    self.close()   # closing without a commit rolls back

  def close( self ):
    self.data.close()
    self.db.close()

  def addCustomerOrder( self, customer, order ):
    pass # do the right thing here

with WrappedResources( dbname, fname ) as res:
  for line in res.data:
    res.addCustomerOrder( *parseOrderData( line ) )
			
Listing 6

That really is a little clunky, however you look at it, since it’s fairly obvious that the class has multiple responsibilities, and exposes the managed objects publicly, amongst other things. There are better ways to achieve this, and we will return to this shortly.

Common cause

Having implemented a (basic) facility to import data from a file to our database, we might like to extend the idea and optionally read from the standard input stream. A simple protocol for this might be to read sys.stdin if no filename is given, leading to code like this:

  with options.filename and \
    open( options.filename ) or sys.stdin as input:
    # do something with the data

That’s all very well, but is a little arcane, and closing the standard input handle when it completes might be considered bad manners. You could go to all the bother of reinstating the standard input handle, or redirecting it some other way, but that too seems more complicated than what is required.

Python’s contextlib module has another handy utility to allow you to use a generator function as a Context Manager, without going to the trouble of creating a custom class to implement the protocol. It is used to decorate a function, which must yield exactly one value to be bound to the as clause of a with statement. Actions to perform when the block is entered are put before the yield, actions to perform when the block is exited are put after the yield. It follows the basic pattern shown in Listing 7:

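(1) will be called when the with statement is entered. It’s the equivalent of the __enter__() method.

(2) will be called when the block is exited normally. It’s the equivalent of the __exit__() method. If the block raises, the exception is re-raised at the yield, so (2) must sit in a finally clause around the yield if it is to run in that case too.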

import contextlib

@contextlib.contextmanager
def simpleContext():
  doPreActionsHere()          # (1)
  yield managed_object
  doPostActionsHere()         # (2)
			
Listing 7
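A runnable variation on this pattern is sketched below. The tracked function and the events list are hypothetical names, and the yield is wrapped in try/finally so that the exit action also runs when the block raises.

```python
import contextlib

events = []


@contextlib.contextmanager
def tracked(name):
    events.append("enter " + name)      # runs on entry
    try:
        yield name                      # value bound by 'as'
    finally:
        events.append("exit " + name)   # runs on exit, even after an error


with tracked("ok") as n:
    events.append("body " + n)

try:
    with tracked("bad"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

Both blocks record an exit entry, and the exception from the second block still propagates to the caller because the generator does not catch it.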

This allows us to define a couple of factory functions for our inputs, as shown in Listing 8.

import contextlib
import sys

def openFilename():
  return open( options.filename )

@contextlib.contextmanager
def openStdIn():
  yield sys.stdin

opener = options.filename and openFilename \
  or openStdIn
with opener() as f:
  pass   # Use f
			
Listing 8

Since opening a ‘real’ file returns an object that is already a Context Manager, the function for that isn’t decorated. Likewise, since we do not want to perform any action on the sys.stdin object on exit, that function has no behaviour after the yield.
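The same idea can be tried without touching the real standard input. In this sketch an io.StringIO object stands in for sys.stdin, and leave_open and open_input are hypothetical names rather than anything from the Standard Library.

```python
import contextlib
import io


@contextlib.contextmanager
def leave_open(stream):
    # Nothing before or after the yield: the stream is handed over as-is
    # and deliberately not closed on exit.
    yield stream


def open_input(filename, fallback):
    """Return a context manager for filename, or for fallback if empty."""
    return open(filename) if filename else leave_open(fallback)


fake_stdin = io.StringIO("line one\n")  # stands in for sys.stdin here
with open_input(None, fake_stdin) as f:
    first = f.readline()

still_open = not fake_stdin.closed
```

The fallback stream is still open once the block has finished, which is exactly the courtesy we want to extend to the real sys.stdin.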

It should be clear that the Context Manager protocol is more general purpose than just for performing some clean-up action when leaving a scope. Exception safety is the primary purpose of Context Managers, but the __enter__() and __exit__() methods can contain any arbitrary behaviour, just as the decorated function can perform any actions before and after the yield statement. Examples include tracking function entry and exit, and logging contexts such as those Chris Oldwood shows in C# [Oldwood].
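Tracking function entry and exit might look something like the following sketch. It relies on the fact that, from Python 3.2, objects created by contextlib.contextmanager can also be applied as function decorators. The names traced, log and load_orders are hypothetical.

```python
import contextlib

log = []


@contextlib.contextmanager
def traced(label):
    log.append("-> " + label)
    try:
        yield
    finally:
        log.append("<- " + label)


@traced("load_orders")        # the whole function body runs in the context
def load_orders():
    log.append("working")
    return 42


result = load_orders()
```

The entry and exit records bracket the work done by the function, and its return value passes through unchanged.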

Many and varied

As previously mentioned, it’s sometimes necessary to manage multiple resources within a single block. Python 3.1 and later support this by allowing multiple Context Manager objects to be declared in a single with statement, but this quickly becomes cluttered and unmanageable. You can, as we demonstrated, create your own Context Manager type, but that too can be less than ideal. Once again, Python has an answer: version 3.3 added another contextlib utility, the ExitStack.

It manages multiple Context Manager objects, and allows you to declare them in a tidy (and indentation-saving) manner. See Listing 9.

with contextlib.ExitStack() as stack:
  f = stack.enter_context( open( \
    options.filename ) )
  db = stack.enter_context( sqlite3.connect( \
    options.dbname ) )
			
Listing 9

When the block is exited, the managed objects have their __exit__() methods called in the reverse of the order in which they were added.
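The ordering can be observed with a small sketch, in which a hypothetical resource function records its entry and exit in a list:

```python
import contextlib

order = []


@contextlib.contextmanager
def resource(name):
    order.append("enter " + name)
    try:
        yield name
    finally:
        order.append("exit " + name)


with contextlib.ExitStack() as stack:
    for name in ("a", "b", "c"):
        stack.enter_context(resource(name))
```

The resources are entered as a, b, c and exited as c, b, a, just as they would be with nested with statements.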

The ExitStack can manage a runtime-defined collection of context managers, such as this example taken directly from the Python 3.4 documentation [Python]:

  with ExitStack() as stack:
    files = [ stack.enter_context( open( fname ) )
              for fname in filenames ]
    # All opened files will automatically be closed
    # at the end of the with statement, even if
    # attempts to open files later in the list raise
    # an exception
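To try this out without any existing data, the following sketch first creates a few scratch files in a temporary directory. The file names and contents are invented for the purpose.

```python
import os
import tempfile
from contextlib import ExitStack

with tempfile.TemporaryDirectory() as folder:
    # Write three small files to open together.
    filenames = []
    for i in range(3):
        path = os.path.join(folder, "orders%d.csv" % i)
        with open(path, "w") as out:
            out.write("row %d\n" % i)
        filenames.append(path)

    with ExitStack() as stack:
        files = [stack.enter_context(open(fname)) for fname in filenames]
        first_lines = [f.readline() for f in files]

    # The stack has unwound, so every file it managed is now closed.
    all_closed = all(f.closed for f in files)
```

The number of files is only known at runtime, yet a single with statement is enough to close them all.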

Conclusion

Python’s Context Managers are a convenient and easy-to-use way of managing Resource Lifetimes, but their utility goes beyond that, due to the flexible way they are provided. The basic idea is not a new one – even in Python, where it was first introduced in version 2.5 – but some of these facilities are only available in later versions of the language. The examples given here were tested using Python 3.4.

Exception safety facilities like the Python Context Manager are common to many languages that feature the use of exceptions to indicate errors, because this introduces the need for some local clean-up in the presence of what is (in effect) a non-local jump in the code. They are, however, useful for things beyond this need, and Python provides several useful utilities to help manage the complexity this brings.

References

[Python] Python 3.4 Documentation. https://docs.python.org/3.4/library/contextlib.html

[Oldwood] Oldwood, Chris. Causality. http://chrisoldwood.com/articles/causality.html
