Objects In Databases

Overview

As Object Oriented (OO) practitioners we deal with objects. The perennial question of how to make our objects persistent has been addressed in many ways. More often than not persistence is achieved using traditional means - we stream binary data out to a file, preceded by some identifier that identifies the type/class of the object being persisted.

This is fine for data structures that are used as entities, but where we are interested only in subsets of large volumes of data this becomes an unwieldy mechanism (especially if the subsets need to be determined dynamically). It is at this level that database concepts come to the fore. We would like to apply Relational Database (RDBMS) concepts to our objects: cursors (for iterating), SQL (for filtering, ordering and amending data) etc.

Impedance Mismatch

Historically RDBMSs store data in tables that have a fixed number of columns. We could use one column per attribute and have a separate row for each object instance, but this makes it difficult to flexibly store object hierarchies that make use of inheritance and polymorphism.

For example, imagine you have a base class Fruit, from which you derive Apple and Orange. There are a number of possible ways of using tables to represent this:

Store all instances of objects derived from Fruit in a Fruit Table. In this case you may have a sparse table - not all columns (which represent attributes) will apply to all subclasses of fruit.
Store instances of each concrete class in a separate table. In this case you have two tables - an Apple table and an Orange table. However it is now difficult to iterate over all Fruit, because you would need two cursors.
Store attributes introduced by each class in a separate table. Now you have three tables - Fruit, Apple and Orange. However you now need to perform a join operation to reconstruct Apples or Oranges (to combine the data with attributes from the Fruit table).

No matter how you address these problems you have to convert between an in-memory representation of objects and persistent rows in database table(s). This is what is called Impedance Mismatch (IM to some!).
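The three table layouts described above can be sketched with Python's standard sqlite3 module. This is only an illustration: the table names and the columns (weight, variety, peel_thickness) are invented for the example and do not come from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. One table for the whole hierarchy: a type tag plus sparse columns.
cur.execute("""CREATE TABLE fruit_all (
    id INTEGER PRIMARY KEY, kind TEXT, weight REAL,
    variety TEXT,           -- applies to Apple only
    peel_thickness REAL     -- applies to Orange only
)""")
cur.execute("INSERT INTO fruit_all VALUES (1, 'Apple', 150.0, 'Cox', NULL)")
cur.execute("INSERT INTO fruit_all VALUES (2, 'Orange', 200.0, NULL, 4.5)")

# 2. One table per concrete class: no NULLs, but no single Fruit cursor.
cur.execute("CREATE TABLE apple_only (id INTEGER PRIMARY KEY, weight REAL, variety TEXT)")
cur.execute("CREATE TABLE orange_only (id INTEGER PRIMARY KEY, weight REAL, peel_thickness REAL)")
cur.execute("INSERT INTO apple_only VALUES (1, 150.0, 'Cox')")
cur.execute("INSERT INTO orange_only VALUES (2, 200.0, 4.5)")

# 3. One table per class: inherited attributes live in the Fruit table.
cur.execute("CREATE TABLE fruit (id INTEGER PRIMARY KEY, weight REAL)")
cur.execute("CREATE TABLE apple (id INTEGER PRIMARY KEY REFERENCES fruit(id), variety TEXT)")
cur.execute("CREATE TABLE orange (id INTEGER PRIMARY KEY REFERENCES fruit(id), peel_thickness REAL)")
cur.executemany("INSERT INTO fruit VALUES (?, ?)", [(1, 150.0), (2, 200.0)])
cur.execute("INSERT INTO apple VALUES (1, 'Cox')")
cur.execute("INSERT INTO orange VALUES (2, 4.5)")

# Iterating over all Fruit: trivial in layout 1 ...
weights_single = cur.execute(
    "SELECT weight FROM fruit_all ORDER BY id").fetchall()

# ... but layout 2 has to stitch two result sets together.
weights_union = cur.execute(
    "SELECT id, weight FROM apple_only "
    "UNION ALL SELECT id, weight FROM orange_only ORDER BY id").fetchall()

# Reconstructing a complete Apple in layout 3 needs a join.
apples_joined = cur.execute(
    "SELECT f.id, f.weight, a.variety "
    "FROM fruit f JOIN apple a ON a.id = f.id").fetchall()
```

In each case the application still has to turn the returned rows back into Apple and Orange objects, which is the conversion cost the text refers to.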

Apart from these storage issues there are some other important issues:

Object identity. Relational databases typically use Primary Keys to uniquely distinguish rows within a table (these keys are also used to implement links between objects). Objects do not necessarily have unique attributes - we might establish identity in a C++ program by comparing the address of the object, for example.
Aggregation. Objects often contain other objects or collections of objects. How would we map this to relational tables?
Querying. When dealing with an object we often want to know about its relationships to other objects - how would we do this with SQL?
Schema evolution. We may need to change the object model as part of ongoing maintenance or enhancements - how would we do this?
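The object identity issue can be made concrete with a small sketch. It assumes a hypothetical Apple class and a hypothetical in-process counter standing in for whatever key generator a real mapping layer would use: two objects with identical attributes are still distinct objects in memory, so a surrogate key has to be introduced before they can be told apart as rows.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical surrogate-key generator; a real system would ask the database.
_next_oid = itertools.count(1)


@dataclass(eq=False)  # eq=False: == falls back to in-memory identity
class Apple:
    variety: str
    oid: int = field(default_factory=lambda: next(_next_oid))


a = Apple("Cox")
b = Apple("Cox")

same_attributes = a.variety == b.variety  # nothing in the data distinguishes them
same_object = a is b                      # yet they are two separate objects
distinct_keys = a.oid != b.oid            # the generated key restores identity
```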

Solutions

There appear to be roughly four (overlapping) solution areas:

Provide a custom mapping layer between the OO application and the RDBMS.
Use a 3rd party Object Relational (OR) mapping tool.
Use an Object Oriented Database (OODBMS).
Use one of the next generation Object/Relational Databases (ORDBMS) a.k.a. Universal Databases.

Custom Mapping

It is possible to create your own mapping layer. In this way you will be able to configure/customise each class mapping to best fit your application's usage model, but you will expend a great deal of effort. Typically a project that undertakes its own mapping layer can expect to expend around 40% of all development and maintenance effort on the mapping layer.

Further, many of the features that commercial products offer would be missing, such as an object-aware query language.

A good text to consult if you are considering this route is [ HEIN97 ].
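To give a feel for what a hand-written mapping layer involves, here is a minimal sketch for a single class, using Python's standard sqlite3 module. The AppleMapper class, its save and load methods and the table layout are all hypothetical names chosen for the example; a real layer would need one such mapper per persistent class, plus transactions, relationships and error handling.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Apple:
    variety: str
    weight: float
    oid: Optional[int] = None  # filled in once the object has been stored


class AppleMapper:
    """Hypothetical hand-written mapper between Apple objects and rows."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS apple "
            "(oid INTEGER PRIMARY KEY, variety TEXT, weight REAL)")

    def save(self, apple: Apple) -> int:
        if apple.oid is None:
            # New object: insert a row and remember the generated key.
            cur = self.conn.execute(
                "INSERT INTO apple (variety, weight) VALUES (?, ?)",
                (apple.variety, apple.weight))
            apple.oid = cur.lastrowid
        else:
            # Known object: overwrite its existing row.
            self.conn.execute(
                "UPDATE apple SET variety = ?, weight = ? WHERE oid = ?",
                (apple.variety, apple.weight, apple.oid))
        return apple.oid

    def load(self, oid: int) -> Optional[Apple]:
        row = self.conn.execute(
            "SELECT variety, weight, oid FROM apple WHERE oid = ?",
            (oid,)).fetchone()
        return Apple(*row) if row else None


conn = sqlite3.connect(":memory:")
mapper = AppleMapper(conn)
oid = mapper.save(Apple("Cox", 150.0))
loaded = mapper.load(oid)
```

Even this toy version has to repeat the attribute list in four places (the class, the table definition and both SQL statements), which hints at where the maintenance effort goes.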

OR Mapping Tools

There are a number of tools available commercially that attempt to automate the mapping process above. Through source code 'annotations' (such as macros) the developer can identify which classes are persistent-capable, and which pointers refer to persistent objects. Some OR mapping tools generate a unique identifier for each persistent object, but others require the application to provide one (this is necessary to maintain object identity).

Classes can be persistent either dependently or independently - that is to say some objects may be persistent in their own right, while other objects will be persistent only as aggregates/composites of an independently persistent object.

Some common features are:

The persistent pointer constructs are often implemented as smart-pointer technology, thus saving the cost of database access if the object is not referenced.
Most OR Mapping tools also provide some persistent collections and iterators.
A mapping generation tool parses the source code (or IDL) to generate the mapping layer. For some tools it is possible to customise the mapping process, specifying which table model to use for specific classes.
The runtimes typically provide some caching functionality.
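The smart-pointer and caching features listed above can be sketched as follows. The Store and PersistentRef classes are hypothetical stand-ins, not the API of any real tool: the reference holds only an identifier, the database is touched on first dereference, and a cache in the runtime absorbs repeated access.

```python
class Store:
    """Stand-in for the database; counts how many fetches really happen."""

    def __init__(self, rows):
        self.rows = rows      # oid -> attribute dict, playing the role of a table
        self.cache = {}       # runtime cache of objects already loaded
        self.fetches = 0

    def fetch(self, oid):
        if oid not in self.cache:
            self.fetches += 1                     # simulated database round trip
            self.cache[oid] = dict(self.rows[oid])
        return self.cache[oid]


class PersistentRef:
    """Holds only an identifier until the target is first dereferenced."""

    def __init__(self, store, oid):
        self._store = store
        self._oid = oid

    def get(self):
        return self._store.fetch(self._oid)


store = Store({1: {"variety": "Cox"}, 2: {"variety": "Bramley"}})
refs = [PersistentRef(store, 1), PersistentRef(store, 2)]

fetches_before = store.fetches      # creating references costs no database access
first = refs[0].get()["variety"]    # first dereference loads the object
again = refs[0].get()["variety"]    # second dereference is served from the cache
fetches_after = store.fetches
```

The second reference is never dereferenced, so its object is never loaded at all.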

The mapping layer may interact with the RDBMS through native drivers or some generic interface, such as ODBC. The native drivers should give better performance, but there is the maintenance risk of whether the OR Mapping provider will continue to track releases of your chosen RDBMS.

The sophistication of the OR Mapping tools varies markedly. Some require little extra coding, others require lots; some have visual tools for customising the mapping, others have source file based solutions; some have great problems handling polymorphism intelligently, others deal with it in their stride.

OR Mapping tools produced by companies that also market OODBMSs may provide a unified API. This allows you to develop to the API and decide at a later date whether to deploy on an RDBMS, an OODBMS, or both. (The OR Mapping API available will most likely be a subset of the full OODBMS API, since some functionality is simply not implementable on top of an RDBMS).

OODBMS

Solving the Impedance Mismatch problem led to the development of dedicated object-oriented databases (OODBMS). In most of these systems the developer essentially does little more than specify which classes can be persistent. (In fact the developer also has to replace various constructs with equivalent constructs provided by the database library - such as pointers to, or collections of, persistent objects).

Each database vendor developed its own methods of specifying persistence, along with widely varying feature sets. Since OODBMSs are typically used as solutions to specific problems there was initially no standardisation of object models or APIs among the vendors. This hindered the adoption of OODBMSs so much that the Object Database Management Group (ODMG), an independent standards organisation, was set up in the early 1990s. The members were mainly the vendors themselves so the ODMG standards are more OMG-like than ISO-like - that is they specify a lot of functionality, much of which is optional. This allows most OODBMS vendors to claim some measure of ODMG compliance.

Almost all OODBMSs are based on a Client/Server architecture (even when both Client and Server reside on the same physical machine). The user application generally accesses an object by starting from a 'known' object and then navigating a network of relationships from that object - 'known' objects are specified/named by the application code.

Object identity is generally preserved by an Object Identifier (OID). This is a unique identifier that is generated by the OODBMS when the object is created, and is used to represent links (pointers) between persistent objects. Some OIDs are simply unique numbers, but page serving OODBMSs (see below) often encode object location within the OID. This makes it difficult to move the object once it has been created.
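The difference between the two kinds of OID can be illustrated with a toy model. Everything here is an assumption made for the sketch: the (page, slot) layout, the PhysicalOid name and the location_of table are invented, and real products encode location in their own ways.

```python
from typing import NamedTuple


class PhysicalOid(NamedTuple):
    """An OID that encodes where the object lives (hypothetical layout)."""
    page: int
    slot: int


# Toy storage: page number -> {slot number -> object}
pages = {0: {0: "apple#1"}, 1: {}}

# A logical OID is just a number; a lookup table maps it to a location.
location_of = {42: PhysicalOid(0, 0)}


def deref_physical(oid: PhysicalOid) -> str:
    return pages[oid.page][oid.slot]


def deref_logical(number: int) -> str:
    return deref_physical(location_of[number])


def move(old: PhysicalOid, new: PhysicalOid, logical_number: int) -> None:
    """Relocate an object and update the logical lookup table."""
    pages[new.page][new.slot] = pages[old.page].pop(old.slot)
    location_of[logical_number] = new


stale = PhysicalOid(0, 0)          # a reference another object might be holding
move(stale, PhysicalOid(1, 3), 42)

logical_still_valid = deref_logical(42) == "apple#1"
try:
    deref_physical(stale)
    physical_still_valid = True
except KeyError:
    physical_still_valid = False   # every holder of the old OID is now broken
```

The logical scheme pays for its flexibility with an extra lookup on every dereference, while the location-encoding scheme avoids the lookup but leaves stale references behind when an object moves.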

Although all OODBMSs have architectural differences (idiosyncrasies) there are two basic philosophies - Page Server and Object Server.

Page Server OODBMS are often based upon more conventional storage technologies. The OODBMS Server retrieves data in pages, so the clustering of objects on a page becomes important. Also, locking is typically performed at the page level, which can limit concurrent data access. Because the servers deal in pages they are typically 'thin' servers - they perform minimal processing on the objects contained in a page.
Object Server OODBMS are object aware. The physical location of an object is not important and locking is performed at the object level. However, the finer granularity of access and locking does incur some overhead that can impair performance. Object Servers are typically 'fatter' servers - they may be able to perform some object processing on the server.

The decision whether to choose a Page or Object Server will depend on understanding the data model of your application. If the patterns of data access are predictable and an efficient clustering mechanism can be specified then it is likely that a Page Server may perform better. If access patterns are unpredictable it will be difficult to develop a clustering strategy and an Object Server may be a better choice. The lock granularity may also be a crucial element in your choice. It is important to develop your object model first before benchmarking some candidate database solutions - only then can you tell how a particular product will perform.
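The effect of lock granularity on concurrency can be shown with a deliberately simplified model. It assumes exclusive locks only, and a hypothetical layout in which objects are numbered consecutively and packed four to a page; the function names are invented for the sketch.

```python
def page_of(oid: int, objects_per_page: int) -> int:
    """Hypothetical clustering: consecutive objects share a page."""
    return oid // objects_per_page


def conflicts(txn_a, txn_b, granularity: str, objects_per_page: int = 4) -> bool:
    """Return True if the two transactions would contend for the same lock."""
    if granularity == "object":
        units_a, units_b = set(txn_a), set(txn_b)
    elif granularity == "page":
        units_a = {page_of(o, objects_per_page) for o in txn_a}
        units_b = {page_of(o, objects_per_page) for o in txn_b}
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return bool(units_a & units_b)


txn_a = [0, 1]   # objects on page 0
txn_b = [2, 3]   # different objects, but also on page 0
txn_c = [8, 9]   # objects on page 2

object_level = conflicts(txn_a, txn_b, "object")      # no shared object
page_level = conflicts(txn_a, txn_b, "page")          # shared page 0
page_level_apart = conflicts(txn_a, txn_c, "page")    # different pages
```

Under page locking the first two transactions block each other even though they touch different objects, which is why the clustering strategy and the lock granularity have to be considered together.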

ORDBMS

This is the area that I can say least about, simply because I know least about it.

The prevalence of OO methodologies and languages has given rise to many OODBMSs and OR Mapping tools. This hasn't escaped the notice of the large RDBMS vendors and they, quite rightly, have attempted to address this new area of technology. To get a view on the discussion (albeit an old view) it's worth having a look at the Third-Generation Database System Manifesto [ STON90 ] (a rebuttal to the Object-Oriented Database System Manifesto [ ATKI89 ] - and also [ MAIE91 ]).

Various OR technologies have been incorporated into household/enterprise RDBMS. I am sure they have achieved some measure of success, however there is as yet no standard, so a solution targeted at one RDBMS will not be portable to others. There is a revamp of SQL-92 on the way, called SQL3, and it will have an object flavour. Various tradeoffs have been made to make it compatible with traditional RDBMSs and its design is at odds with that put forward by the ODMG. It must be stressed though that SQL3 is still some way from becoming an accepted standard.

However, due to the huge market penetration and acceptance of RDBMS products many analysts predict that ORDBMSs (derived from existing leading RDBMSs) will increasingly sideline OODBMSs [ MCCL97 ].

Sources Of Information

The Object Data Management Group: ( http://www.odmg.org/ ) is an independent standards organisation. It recently broadened its charter to include all aspects of object data persistence.
Doug Barry & Associates: ( http://www.odbmsfacts.com/ ) publish the Object Database Fact Book. It comes in three volumes, each covering one of OODBMS, ORDBMS and OR Mapping Tools. The book ships in electronic format only and each volume costs over $1500. The books consist almost exclusively of checklist tables. The entries are not always well described and can be confusing. The Object Database Handbook [ BARR96 ] describes some of the issues in greater detail.
Dr. Akmal B. Chaudhri: ( http://www.soi.city.ac.uk/~akmal/html.dir/home.html ) is an academic and a consultant. His home page has many useful links.
Comp.objects.databases: ( news:comp.object.database ) is a low volume, low noise USENET group.

Bibliography

[ATKI89] The Object-Oriented Database System Manifesto, Malcolm Atkinson et al., ALTAIR Tech Report No. 30-89

[BARR96] The Object Database Handbook, Douglas K. Barry, Wiley, 0-471-14718-4

[CATT97] The Object Database Standard: ODMG 2.0, ed. R.G.G. Cattell & Douglas K. Barry, Morgan Kaufmann, 1-55860-463-4

[CHAU98] Object Databases in Practice, Akmal B. Chaudhri, HP/Prentice-Hall, 0-13-899725-X

[COOP97] Object Databases: An ODMG Approach, Richard Cooper, International Thomson Computer Press, 1-85032-294-5

[EMBLE98] Object Database Development (Concepts and Principles), David W. Embley, Addison-Wesley, 0-201-25829-3

[HEIN97] Building Scalable Database Applications, Peter M. Heinckiens, Addison-Wesley, 0-201-31013-9

[JORD97] C++ Object Databases (Programming with the ODMG Standard), David Jordan, Addison-Wesley, 0-201-63488-0

[MAIE91] Comments on the "Third-Generation Database System Manifesto", David Maier, Oregon Graduate Institute

[MCCL97] Object Database vs. Object-Relational Databases, Steve McClure, IDC Bulletin 14821E

[STON90] Third-Generation Database System Manifesto, Michael Stonebraker et al., SIGMOD Record 19:3