When designing something, we make a careful study of the situation/business (domain analysis). Sometimes the business is so broad that it's best split up into smaller areas. I prefer to split the business focus into sub-domains (e.g. forecasting, risk-analysis) and the technology into sub-systems (e.g. databases, GUIs).
Divide and Rule.
One way to broadly partition a system is to divide it into its application domain(s) (which deals with elements relevant to the application / business domains) and its solution domain (programming language and design patterns used) [See MPD and GENPROG for more details on Domains].
Ultimately the system will be implemented as software elements: source modules, functions, classes, templates, DLLs and so on. Even when using a technique like divide and conquer, things can get confusingly interwoven very quickly.
One simple solution is to note everything on Class-Responsibility-Collaborator (CRC) cards.
Class: (singular nouns make the best names) | Collaborators |
---|---|
Responsibility: | |
Paraphrasing a C++ Report (Sept '91) article on CRCs:
The goal of the CRC process is a collection of cards that:
Are an informally documented set of abstractions for classes that could be applied to the problem being studied.
Improve the participants' understanding of, and confidence in, how an object-oriented architecture might be applied to problems in the domain being studied / analysed.
The CRC process.
-
Identify candidate classes by identifying singular nouns & putting them on CRC cards.
-
Add some responsibilities (some arise from the nature of the class, some from the system's input/output or pre/post condition requirements).
-
Simulate the behaviour of the system using your CRC cards in walkthroughs. New classes will be added. Old classes will be dropped/modified. The walkthroughs continue until the CRC cards stabilise. If they don't stabilise, the designers have some misconception(s) about the problem domain or requirements.
-
Do more walkthroughs but consider the system's exceptions. Take care not to design beyond the scope of the system.
-
Classes with no responsibilities should be removed.
-
Classes with only one responsibility may be best removed (& that responsibility placed in another class).
-
A full card is a candidate for breaking into two or more classes that divide responsibilities.
-
Inheritance and CRC cards.
-
Sometimes inheritance will be obvious - set the cards out as a hierarchy.
-
When variation within the hierarchy is not significant for a scenario, all the derived-class cards can be hidden under the base class card.
-
It might emerge only after the designers recognise similarities in the responsibilities of a set of classes.
-
See the Liskov Substitutability Principle further on in this article.
With CRCs, you end up with a collection of cards in which you've split your system into classes, given meaningful names to the classes, decided what individual classes do and how the classes interact. These CRCs can be presented and updated at key meetings between stakeholders (key system architects, designers, customers).
CRCs are very flexible - but taken literally they will bias / restrict us to an Object Oriented solution. A more balanced approach is to adapt CRCs to identify other software elements (e.g. standalone functions - algorithms) as well. When do you use a class (member function) and when do you use a free-standing function? See "Intuitive Multi-Paradigm Design: Do I use a member function or a non-member function" [Bruntlett, Overload 38] for further discussion on this.
If CRCs aren't sufficient, supplement them with other, more formal, models for:
-
Structure (static model).
-
Behaviour (dynamic model).
Finding classes.
I ask the question "if the domain had a language, what would its built-in (fundamental) types be?" This identifies my fundamental / value types. They tend to be written to:
-
complement the existing built-in types for a clearer solution or
-
implement built-in types fundamental / native to the application domain(s).
For further discussion see [ SUB ], the CHECKS (Whole-Value) and STARS pattern languages [ PLOP ] and QUANTITY [ ANAPAT ].
I also ask "if the domain were a play, what would the characters be?" This identifies my entities - they each have a set of goals (agenda) and they instigate actions to meet those ends, collaborating with other characters. Entities have significant identity and are typically represented using an instance of a Whole-Part class.
A user-defined type is one of:
-
A class.
-
An enumeration of a set of numeric constants.
A class is written to implement:
-
A representation of something found in an area of interest (domain).
-
A mechanism for manipulating other types.
Class as mechanism.
A mechanism class is used to manipulate objects (and, in some cases, types) as part of the solution. Case studies of common O.O. mechanisms (Abstract Factory, Proxy etc.) have been documented as design patterns, expanding the solution domain, implementing one or more predetermined conventional interfaces.
Class as representation.
A class represents something. Whether that thing is a value or an entity depends on the viewer / manipulator's perspective. It may be implemented in terms of other classes (Whole-Part). Its qualities of analysis, design and implementation are judged according to the qualities of abstract/concrete, granularity (sufficiency) and collaborations (DIP, LSP, OCP).
Type properties : Identity, State, Behaviour.
An object (instance of a type) lies somewhere along three axes of identity, state and behaviour [ OOADWA ]. Each different kind of UDT can be placed on these axes. This may help point out similarities & differences (commonalities & variabilities) between them.
In value based programming, a value has no significant identity (it has transparent identity) - for instance, neither 15 kilowatt hours nor the colour green has significant identity. You're only interested in its state and behaviour.
STL containers are an example of value based programming. For instance, (by default) vector<>::resize() fills any new elements with copies of a default-constructed object, and the copy constructor is invoked when a vector's elements have to be relocated in memory after it has grown (C++11 and later may move the elements instead of copying them).
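A minimal sketch of what value based programming means for a container, assuming a hypothetical value type Colour (not part of any library) and a C++11 or later compiler. The vector holds its own copies, so changing one copy leaves the others alone:

```cpp
#include <cassert>
#include <vector>

// Hypothetical value type: only its state matters, not which object holds it.
struct Colour {
    int red, green, blue;
};

inline bool operator==(const Colour& a, const Colour& b) {
    return a.red == b.red && a.green == b.green && a.blue == b.blue;
}

inline bool containers_hold_copies() {
    const Colour green = { 0, 255, 0 };
    const Colour black = { 0, 0, 0 };
    std::vector<Colour> palette(2, green);   // two copies of green
    palette.resize(3);                       // the new element is Colour(), i.e. {0, 0, 0}
    assert(palette.size() == 3);
    std::vector<Colour> backup = palette;    // element-by-element copy
    backup[0].red = 255;                     // changes the copy only
    return palette[0] == green && palette[2] == black && !(backup[0] == green);
}
```

Nothing here cares which Colour object is which; only the values are compared.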
"Value types do not tend to live in class hierarchies, although they may well live in shallow type hierarchies (classified by conversions, operator overloading, mutability and templates)" [KH - Kevlin Henney].
Strong typing in a system enforces a type's domain-specific rules, avoiding absurd situations where a sunset time is added to an energy figure in kWh. Overly strong type-checking, however, traps the developer in a straitjacket.
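As a sketch of that kind of domain-specific rule, assume two hypothetical value types, EnergyKWh and TimeOfDay (both invented for this illustration). Each wraps a plain number, so energy can only be added to energy, and the absurd sum fails to compile:

```cpp
#include <cassert>

// Hypothetical value type: energy in kilowatt hours.
class EnergyKWh {
public:
    explicit EnergyKWh(double kwh) : kwh_(kwh) {}
    double value() const { return kwh_; }
    // Energy may be added to energy, and to nothing else.
    friend EnergyKWh operator+(EnergyKWh a, EnergyKWh b) {
        return EnergyKWh(a.kwh_ + b.kwh_);
    }
private:
    double kwh_;
};

// Hypothetical value type: a time of day, in minutes after midnight.
class TimeOfDay {
public:
    explicit TimeOfDay(int minutes) : minutes_(minutes) {
        assert(minutes >= 0 && minutes < 24 * 60);
    }
    int minutes() const { return minutes_; }
private:
    int minutes_;
};

inline EnergyKWh total_usage() {
    EnergyKWh morning(15.0);
    EnergyKWh evening(7.5);
    TimeOfDay sunset(19 * 60 + 30);
    (void)sunset;
    // EnergyKWh nonsense = morning + sunset;   // would not compile
    return morning + evening;
}
```

The explicit constructors are the deliberate part of the design: a bare double or int does not silently become an energy or a time.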
"A distinction between value and exception types is that although they seem similar at first sight, the consequences are different. For instance, identity is transparent for both, and behaviour is unsophisticated (i.e. just at the wrapper level). However, values are stateful whereas exceptions may or may not be stateful, and exceptions tend to live in class hierarchies. That is a classification view, and in terms of consequences we see this in the fact that although exception classes in C++ require copy ctors, they do not require assignment operators (in fact, these are a bad idea)
There are other consequences, but this gives a reasonable starting point in understanding how something is used affects how it is built, and so that by understanding the nature of the object (identity, etc) and naming this, we can tie together language features into a more meaningful description than a shopping list. Voilà, patterns." [KH].
"In addition … there's purpose, relationships, creation/lifetime and movement … it's possible to over-classify beyond the point of usefulness or stability - you don't really want objects hopping from type to type at the slightest shift of design." [KH].
Parameters : Value or Reference semantics?
In a function call, many objects can be passed by value (value semantics) with little impact on the meaning of a program. It might use more memory and it might take more time to run. If the object isn't changed and it is larger than a pointer, passing it by const reference usually uses less processor time and memory than passing it by value.
With reference (indirection) semantics, objects are passed by reference (indirectly, by pointer or a reference) instead of by value. If a function must modify its parameters (i.e. have output parameters), either a reference or pointer parameter can be used.
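The alternatives above can be sketched as follows. Reading and all the function names are hypothetical, chosen only for this illustration, and nullptr assumes C++11 or later:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical value type, considerably larger than a pointer.
struct Reading {
    double kwh[48];   // half-hourly energy figures for one day
};

// Input only, small: pass by value.
inline double half(double x) { return x / 2.0; }

// Input only, larger than a pointer: pass by const reference (no copy is made).
inline double total(const Reading& r) {
    double sum = 0.0;
    for (std::size_t i = 0; i < 48; ++i) sum += r.kwh[i];
    return sum;
}

// Output parameter as a pointer: the '&' at the call site flags the data flow,
// but the function has to decide what a null pointer means.
inline void clear(Reading* r) {
    if (r == nullptr) return;   // treated here as "no output wanted"
    for (std::size_t i = 0; i < 48; ++i) r->kwh[i] = 0.0;
}

// Output parameter as a reference: the caller must supply an object,
// but the call site looks the same as pass by value.
inline void fill(Reading& r, double value) {
    for (std::size_t i = 0; i < 48; ++i) r.kwh[i] = value;
}

inline double demo_parameters() {
    Reading day;
    fill(day, 0.5);             // modifies day; nothing at the call site says so
    const double before = total(day);
    clear(&day);                // modifies day; the '&' hints at it
    clear(nullptr);             // legal, does nothing
    assert(total(day) == 0.0);
    return before;
}
```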
Personally, I prefer using a pointer because the syntax is different and highlights the potential data-flow to someone reading the code. Others dislike this approach because a pointer can be NULL, which implies that the output parameter is optional, whereas a reference parameter should never be null, which indicates that the parameter is mandatory.
"There is another reason to favour references over pointers which has little to do with the null affordance: values represent immediate as opposed to indirect content. Pointers emphasise identity whereas by ref emphasises content. Also, operator overloading is meaningful for value types but not reference types, and this guideline is reinforced by the awkwardness of attempting to use overloaded operators through a pointer." [KH]
Type relations & interactions.
At Kevlin Henney's JaCC lecture on substitutability [ SUB ], he revealed that "a type hierarchy defines a classification relationship between types based on extension and substitutability", and that "vertical delegation moves execution responsibility up and down a Class hierarchy". Moving on to the subject of object interactions: "An Object Hierarchy expresses a structure over which horizontal delegation acts, with significant objects cascading responsibility."
It also brought to my attention the notion of a type heterarchy - "where types can be defined as readily convertible between each other… an Object Heterarchy defines a set of object relationships where there is no definite root". Heterarchies can be non-lossy - e.g. degrees and radians - or lossy - e.g. conversions between the built-in numeric types.
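A small sketch of a two-type heterarchy, assuming hypothetical Degrees and Radians classes (invented for this illustration). Each converts to the other, neither is the root, and the conversion is non-lossy apart from floating-point rounding:

```cpp
#include <cassert>
#include <cmath>

class Radians;   // forward declaration: neither type is the "root"

class Degrees {
public:
    explicit Degrees(double v) : v_(v) {}
    Degrees(const Radians& r);           // conversion from the sibling type
    double value() const { return v_; }
private:
    double v_;
};

class Radians {
public:
    explicit Radians(double v) : v_(v) {}
    Radians(const Degrees& d);           // conversion from the sibling type
    double value() const { return v_; }
private:
    double v_;
};

const double kPi = 3.14159265358979323846;

inline Degrees::Degrees(const Radians& r) : v_(r.value() * 180.0 / kPi) {}
inline Radians::Radians(const Degrees& d) : v_(d.value() * kPi / 180.0) {}

// Written in terms of Radians, but a Degrees argument is converted on the way in.
inline double sine(Radians angle) { return std::sin(angle.value()); }

inline double sine_of_right_angle() {
    Degrees right_angle(90.0);
    assert(std::fabs(Radians(right_angle).value() - kPi / 2.0) < 1e-12);
    return sine(right_angle);
}
```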
Types and interface.
So far I've been dealing with abstraction as a measure of a type's independence from details. Just as everything is beginning to settle down, I'd like to introduce another definition:
Abstraction can also be broken down as gathering things together (aggregation) and putting a barrier around them (encapsulation) through which interactions independent of implementation details take place (interface).
When judging if a UDT is abstract or concrete, we are evaluating its interface.
When judging the sufficiency (granularity) of a UDT, we are evaluating its aggregation and encapsulation.
A class implements one or more interfaces. Classes that implement the same type have something in common. Those classes can be used interchangeably wherever their common type/interface is used.
Qualities
Abstraction
Abstraction is a way to understand a complex thing - concentrate on its significant aspects and ignore its less significant details.
"Abstraction is selective ignorance" - Andrew Koenig.
The degree of a user-defined type's abstraction can be placed somewhere between the two ends of a Goldilocks scale - "When is a type concrete and when is it abstract?"
A type is abstract if it is suitably independent of implementation details. The end user can describe its use in terms of the problem being solved (i.e. with little or no regard to implementation details).
Abstraction | Comment |
---|---|
Too hot | Too abstract to be usable. System has gaseous form & is hard to grasp. |
Just right | Just right. |
Too cold | Too detail-specific to be usable. "Strict modeling of the real world leads to a system that reflects today's realities but not necessarily tomorrow's" [ GoF ]. |
Abstract data type.
An abstract data type hides a certain level of implementation details from the user. For useful abstract data types, the user wants to ignore these details. The user relies on the type's interface so that it can ignore these details.
Good abstractions blur details that would stop us from seeing the overall picture. If you blur the details too much, it becomes a fog, preventing us from seeing anything, prompting Jim Coplien to posit, "Abstraction is evil".
If the problem being solved dictates that the only appropriate methods for an abstract data type are a comprehensive list of Set/Get methods, maybe it's time to step back and ask, "Am I being too abstract here? Should I use a simple aggregate type?"
If you consider that a class is written to solve a particular problem, it is possible to classify member variables as one of:
-
Application-specific (something the end user will be aware of and probably interested in).
-
Solution-specific (something that can be removed in a different version of the class without the user having to do anything different).
An abstract data type may have a mixture of both application-specific members and solution-specific members. A simple aggregate type should only have public member variables.
Use of a simple aggregate type can be a symptom of insufficient abstraction due to inadequate analysis or insufficient information. Whatever the reason, we're being too concrete and should be implementing an abstract data type.
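To make the contrast concrete, here is a sketch with two hypothetical types, DateFields and Date (invented for this illustration). Date is not a real calendar: it pretends every month has 31 days so that it can keep a single ordinal number as its solution-specific member.

```cpp
#include <cassert>

// Simple aggregate type: public member variables only, no behaviour.
struct DateFields {
    int year;
    int month;
    int day;
};

// Abstract data type: the user works with application-specific operations;
// the representation could be replaced without the user doing anything different.
class Date {
public:
    Date(int year, int month, int day)
        : ordinal_(year * 372 + (month - 1) * 31 + (day - 1)) {
        assert(year >= 0 && month >= 1 && month <= 12 && day >= 1 && day <= 31);
    }
    int year()  const { return ordinal_ / 372; }
    int month() const { return (ordinal_ % 372) / 31 + 1; }
    int day()   const { return ordinal_ % 31 + 1; }
    bool operator<(const Date& other) const { return ordinal_ < other.ordinal_; }
private:
    int ordinal_;   // solution-specific: one number, not three fields
};
```

A user of Date never sees ordinal_, so swapping it for three separate fields later would not break any calling code. A user of DateFields depends on every member.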
"Modelarity is the degree of correspondence between problem and solution" (Kevlin Henney)
Concrete type.
A concrete type exposes a certain level of implementation details to the user. For useful concrete data types, the user expects to know these details and is relying on those details.
Good concrete types are something physically representing details of the problem you are solving. Getting something written and running is the focus - refactoring takes a back seat here. This isn't all bad - the experience in writing a concrete type for a problem is a good way to pick up the experience needed to write an abstract data type for the problem later - and that will replace the concrete type.
The lack of initial effort for concrete types can be quite beguiling and you may find yourself wading through concrete data types during development - just remember concrete eventually sets, impeding progress somewhat. In the long term, concrete crumbles - does this make concrete evil? No, just remember you build floors, not seven-league boots, out of concrete.
Principles.
-
Granularity (sufficiency)
-
Liskov Substitutability Principle
-
Open-Closed Principle
Type granularity (sufficiency).
An abstraction viewed in isolation may appear to be correct. We need to consider its granularity, substitutability and dependencies within the system(s) in which it will be used. Granularity is one way to judge the quality of a classification using the principles of loose coupling (low interconnectedness) and high cohesion (intraconnectedness) [ Design ]. Is the classification too coarse (doing too much, too large to be comprehended, so it should be split up)? Or is it too fine (doing too little, difficult to understand as there are too many bits to comprehend, so some of the bits should be merged with others)?
Granularity | Comment |
---|---|
Too coarse | Assumes too much responsibility |
Just right | Weakly coupled. Highly cohesive. |
Too fine | Too little. Shirks responsibility. |
Type substitutability - the LSP.
Analysis may identify groups (families) of similar types used interchangeably in similar circumstances. For example a system may have a couple of database classes used interchangeably, depending on what kind of database server is being accessed.
To paraphrase the Liskov Substitutability Principle [ LSP ] - if a program can use objects of type S instead of objects of type T without changing the behaviour of the program, then type S is a subtype of type T. This applies to all sorts of polymorphism - compile time [ SUB ] as well as runtime.
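The database example might be sketched like this. Database, SqlServerDatabase and FlatFileDatabase are hypothetical names, the "queries" are stand-ins, and override assumes C++11 or later. The first greeting function relies on runtime substitutability, the second on compile-time substitutability:

```cpp
#include <cassert>
#include <string>

// Hypothetical common type for a family of database classes.
class Database {
public:
    virtual ~Database() {}
    virtual std::string fetch_name(int id) const = 0;
};

class SqlServerDatabase : public Database {
public:
    std::string fetch_name(int id) const override {
        return "customer-" + std::to_string(id);   // stand-in for a real query
    }
};

class FlatFileDatabase : public Database {
public:
    std::string fetch_name(int id) const override {
        return "customer-" + std::to_string(id);   // stand-in for a file lookup
    }
};

// Runtime polymorphism: any subtype of Database can be used here.
inline std::string greet(const Database& db, int id) {
    assert(id >= 0);   // hypothetical precondition
    return "Hello, " + db.fetch_name(id);
}

// Compile-time polymorphism: any type with a suitable fetch_name() can be used.
template <typename Db>
std::string greet_static(const Db& db, int id) {
    return "Hello, " + db.fetch_name(id);
}
```

The program's behaviour is the same whichever database class is supplied, which is the substitutability being described.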
The Open-Closed Principle was introduced by Bertrand Meyer in his book "Object-Oriented Software Construction" (see [ OCP ] for more details). Bertrand Meyer states:
"Software entities (Classes, Modules, Functions, etc.) should be open for extension, but closed for modification."
Type dependencies - the DIP.
The dependency inversion principle (DIP) is a set of guidelines used to judge dependencies [ DIP ]:
High level modules should not depend on low-level modules. Both should depend upon abstractions.
Abstractions should not depend on details. Details should depend on abstractions.
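A minimal sketch of these two guidelines, using hypothetical names (ReportSink, ForecastReport, MemorySink) and assuming C++11 or later. The high-level report knows only the abstraction; the low-level sink implements it:

```cpp
#include <cassert>
#include <string>
#include <vector>

// The abstraction both sides depend on.
class ReportSink {
public:
    virtual ~ReportSink() {}
    virtual void write(const std::string& line) = 0;
};

// High-level module: depends only on the abstraction.
class ForecastReport {
public:
    explicit ForecastReport(ReportSink& sink) : sink_(sink) {}
    void publish(int customers) {
        assert(customers >= 0);   // hypothetical precondition
        sink_.write("forecast customers: " + std::to_string(customers));
    }
private:
    ReportSink& sink_;
};

// Low-level detail: depends on (implements) the abstraction.
class MemorySink : public ReportSink {
public:
    void write(const std::string& line) override { lines.push_back(line); }
    std::vector<std::string> lines;
};
```

Adding another kind of sink (a file or a printer, say) extends the system without modifying ForecastReport, so the same sketch also illustrates being open for extension but closed for modification.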
Context, Principles, Good practice.
Good practice / rules of thumb are specific to a context based on factors including language, runtime environment, application domain.
The language used (for example C vs C++) determines the mechanisms available to implement abstractions. The additional language support of C++ allows much more ambitious abstractions to be implemented than in C. This has a cost, though - abstractions implemented in C don't have to cater for the complications (overloading, exceptions) of C++.
"The appropriateness of an abstraction is often, somewhat cyclically, dependent on the way that it is to be realised" (Kevlin Henney).
Similar comparisons can be made with Java or the different generations of C++.
Changing the runtime environment (single tasking to multi-tasking, standalone to network, single CPU to SMP, single-threading to multi-threading) means certain idioms no longer work.
Application domain and implementation organisation affect the life cycle of the project - long term projects will want longer lasting abstractions. According to Conway's Law [ PLOP ], there is a relationship between an organisation and the systems it produces - the structure of the organisation is reflected in the architecture of the systems it produces and vice versa.
Good practice often fails to survive radical paradigm shifts in the same way that civilisations fail to handle Outside Context Problems ( Excession , Iain M. Banks). Moving to a radically new context, previous good practice may be inappropriate - context is a force that affects abstractions. Principles, in their abstract form, are relative to the context. Principles, when applied to a context become part of that context - change the context and you have to rethink how the principle applies to the new context.
Archetypes.
Labelling a UDT an 'abstract data type' or 'concrete data type' is about as helpful as saying that a doughnut is a "fresh doughnut" or a "stale doughnut". It would be nice to know what kind of doughnut you are dealing with (jam, confectioner's custard, bubble gum etc). Similarly it would be nice to be able to specify what kind of UDT you are talking about (aggregate, alias, meta, Whole-Part) and whether or not it is intended for use as an entity, a value or both.
There are many ways to categorise types; here is a list I've built up while writing this article: aggregate, alias, creation, entity, exception, meta, resource / service, usage, fundamental / value, whole-part.
Evolution, entropy, seasons.
Systems have seasons during their life-cycle. A system in summer has abstract data types that have evolved from concrete data types. A system in winter has abstract data types that have decayed to concrete data types.
Brittle interface = Brittle type.
Something (e.g. a concrete type, abstract type, interface etc) is brittle if it exposes overly implementation-specific details to the user. This causes it to "break" when reasonable change occurs.
Concrete types vs Alias types.
Using the previous view of brittleness, a brittle concrete type exposes too many implementation details to the user. For example, in a particular application, a class may expect to handle "NoOfBooks" as an integer of some kind, currently a short int.
Directly specifying "short int" for any NoOfBooks in the interface is brittle. All it takes is a modest change in application requirements (dealing with a lot more books) to trigger an awful lot of error-prone editing. This has bitten me before so I try to avoid it. Currently, any solution to this conundrum is a compromise.
Concrete data type (brittle interface).
void DoIt (short int BkCount);
Disadvantages:
-
Brittle in the face of change. Some may disagree that change will happen.
Advantages:
-
Clarity. Some may disagree about this.
If a built-in type does exactly what we want in an application, why use an alias? Initially an alias is a little awkward - it is more thinking for the developer to do. However, as soon as something changes (for instance we decide to use an int instead of a short int), the alias type is easily updated to accommodate the change.
Aliased data type (supple, up to a point).
The simplest solution is to wrap it up as "typedef short int NoOfBooks". This is an improvement - NoOfBooks can now be changed by updating a single typedef and recompiling everything. That is supple enough for many uses.
A typedef is not a real UDT, it is only an alias for a type - this leads to problems with function overloading. Consider this contrived example:
typedef short int NoOfBooks;
typedef short int NoOfScreens;
void DoIt (NoOfBooks BkCount);
void DoIt (NoOfScreens ScrCount);
Problem. As far as function overloading is concerned, the two functions are identical: both typedefs name the same type (short int), so the second declaration merely redeclares the first.
If overloaded functions are absolutely necessary, there is a solution but it's not pretty.
Something like:
struct NoOfBooks { unsigned int Value; };
Disadvantages:
-
May produce less optimal code.
-
Unusual syntax e.g. Found.Value++ instead of Found++.
-
You can tie yourself up in knots if you overuse suppleness.
-
You have to think up a meaningful name. If you cannot think of one, perhaps (1) you are being too abstract or (2) you have not understood the domain.
Advantages:
-
Supple
-
Function overloading can be used.
-
Your fundamental types can be placed in hierarchies. Polymorphism and RTTI can be used. This sort of power demands careful use.
If an alias type isn't flexible enough, replace it with an abstract data type.
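The struct-based alias can be sketched as below. The NoOfScreens struct is added by analogy with the typedef example, and the std::string return values are an assumption made purely so that the chosen overload is visible (the original DoIt returns void):

```cpp
#include <cassert>
#include <string>

// Each struct is a distinct type, unlike a typedef.
struct NoOfBooks   { unsigned int Value; };
struct NoOfScreens { unsigned int Value; };

// Two genuinely different overloads.
inline std::string DoIt(NoOfBooks BkCount) {
    return "books: " + std::to_string(BkCount.Value);
}
inline std::string DoIt(NoOfScreens ScrCount) {
    return "screens: " + std::to_string(ScrCount.Value);
}

inline std::string count_one_more_book() {
    NoOfBooks Found = { 3 };
    Found.Value++;              // the unusual syntax: not Found++
    assert(Found.Value == 4);
    return DoIt(Found);         // selects the NoOfBooks overload
}
```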
Class: Alias | Collaborators |
---|---|
Responsibility: | |
Aggregate types.
Collections of objects exist for various reasons.
Library aggregate types - STL containers.
Although it is possible to use STL containers directly (e.g. vector<MyType>), this is brittle and in most cases a typedef should be used, for reasons already discussed.
Brittle concrete use of containers:
typedef int CustomerCount;
void Profile::ApplyCustomerNumberForecast(
    const vector<CustomerCount> &rvCustomerCount,
    const vector<TDate> &rvCustomerForecastDate,
    const size_t NoOfDates );
Aliased use of containers:
typedef vector<CustomerCount> CustomerCountTable;
typedef vector<TDate> DateTable;
void Profile::ApplyCustomerNumberForecast(
    const CustomerCountTable &rCustomerCountTable,
    const DateTable &rCustomerForecastDateTable,
    const size_t NoOfDates );
If an alias type is insufficient, consider using a Whole-Part type instead.
Simple aggregate types - struct/class.
The C++ struct/class mechanism is usually used to implement abstract data types. It can also be used as a very simple container to hold abstract data types - by using it as a simple C-style struct with all the data members public with no member functions.
I sometimes regard simple aggregate types as bureaucratic data types - they only exist to push forms of data from one place to another, applying arbitrary rules. Whatever the reasons for the type, I tend to reconsider my analysis and wonder if I have allocated behaviour / responsibility properly.
If an aggregate type is really justified, it is worth considering Coad & Nicola's "behaviour across a collection" principle [from their book Object Oriented Programming]: "let a collection do just the work that applies across a collection. Push work down into each part."
Encapsulated Aggregation - Whole-Part
Lots of UDTs are "Whole-parts". Here is the definition:
"The Whole-Part design pattern helps with the aggregation of components that together form a semantic unit. An aggregate component, the Whole, encapsulates its constituent components, the Parts, organizes their collaboration, and provides a common interface to its functionality. Direct access to the Parts is not possible" [ POSA ].
Class: Whole | Collaborators |
---|---|
Responsibility: | |

Class: Part | Collaborators |
---|---|
Responsibility: | |
The Whole-Part pattern in POSA is well worth reading. It includes an implementation guide (p 231), which may clear a mental block or two. It describes three variants - unfortunately some of their names clash with STL terminology.
STL friendly term | Original term |
---|---|
Assembly-Parts | Assembly-Parts |
Bag-Contents | Container-Contents |
Collection / Container - Part | Collection-Members |
Assembly-Parts is equivalent to an encapsulated database record (although databases don't have to be involved).
Collection-Members is equivalent to encapsulating an STL style container - there are a variable number of members, all of the same type. The Collection provides an interface through which all or some of the members can be manipulated. For example, I've written nhh::Profile, which encapsulates a year's worth of energy usage figures - individual members aren't accessed - a Profile is loaded from the database with Profile::LoadFromDatabase(), manipulated using mathematical member functions and then saved to the database.
Bag-Contents is identical to Collection-Members, with one important exception - the type of each Member can be different. When different types are in the "Bag", I prefer to call it a Blend-Part.
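A sketch of the Collection-Members variant, loosely modelled on the profile described above. UsageProfile is a hypothetical stand-in rather than the real nhh::Profile, and database loading and saving are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Whole: a profile of energy usage figures.
class UsageProfile {
public:
    explicit UsageProfile(const std::vector<double>& figures) : figures_(figures) {}

    // Work that applies across the whole collection.
    void scale(double factor) {
        assert(factor >= 0.0);   // hypothetical rule: usage cannot go negative
        for (std::size_t i = 0; i < figures_.size(); ++i) figures_[i] *= factor;
    }
    double total() const {
        double sum = 0.0;
        for (std::size_t i = 0; i < figures_.size(); ++i) sum += figures_[i];
        return sum;
    }
private:
    std::vector<double> figures_;   // the Parts: no direct access from outside
};
```

Because the vector is private, it could later be replaced by another container without touching any code that uses UsageProfile.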
The Whole-Part definition refers to some of the GoF Design Patterns.
The Whole can act as a Mediator (273) between parts, so that different types of Part can be handled, yielding a more flexible Whole-Part.
The Whole can act as a Façade (185), providing a simpler interface to the Parts. Unlike a normal Whole-Part, access to each Part is possible, allowing more sophisticated users to take advantage of the richer functionality underneath the surface.
Composite (163) is an extremely flexible Whole-Part. Essentially, there is no distinction between Whole and Part objects. Each potential Part implements the collection handling interface of Whole as well as the interface of Part. As a result, each Part can be dynamically made up of other Parts. As is usual for this kind of thing, the typical example is graphics. Composite can be used so that routines can handle individual shapes or composite shapes, with no extra demands made on the code calling the shape's member functions.
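A sketch of Composite using the customary shapes, assuming C++11 or later. Shape, Rectangle and Group are hypothetical names, and for brevity the collection handling function add() lives only in Group rather than in every Shape:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
};

class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : w_(w), h_(h) {}
    double area() const override { return w_ * h_; }
private:
    double w_, h_;
};

// The composite is itself a Shape, and is made up of other Shapes.
class Group : public Shape {
public:
    void add(std::unique_ptr<Shape> part) {
        assert(part != nullptr);
        parts_.push_back(std::move(part));
    }
    double area() const override {
        double sum = 0.0;   // sum of the parts' areas, ignoring any overlap
        for (const auto& p : parts_) sum += p->area();
        return sum;
    }
private:
    std::vector<std::unique_ptr<Shape>> parts_;
};

// Calling code makes no distinction between a single shape and a group.
inline double painted_area(const Shape& s) { return s.area(); }

inline double example_area() {
    std::unique_ptr<Group> inner(new Group);
    inner->add(std::unique_ptr<Shape>(new Rectangle(2.0, 3.0)));
    Group outer;
    outer.add(std::move(inner));                                 // a Group used as a Part
    outer.add(std::unique_ptr<Shape>(new Rectangle(1.0, 4.0)));
    return painted_area(outer);
}
```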
Meta types - distant relations.
C++ has built-in facilities for meta operations on types (typeid, dynamic_cast<>, derived class pointer polymorphism). These facilities and the existing class mechanism can be built on to provide meta facilities more focused on your problem domain.
Some development environments already provide user defined meta types as building blocks for frameworks - CObject (MFC), TObject, TMetaClass (VCL).
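As a reminder of the built-in facilities mentioned at the start of this section, here is a short sketch. Account and SavingsAccount are hypothetical classes, and nullptr assumes C++11 or later:

```cpp
#include <cassert>
#include <typeinfo>

class Account {
public:
    virtual ~Account() {}   // a virtual function makes the class polymorphic
};

class SavingsAccount : public Account {};

// dynamic_cast asks: is this object really a SavingsAccount?
inline bool is_savings(const Account& a) {
    return dynamic_cast<const SavingsAccount*>(&a) != nullptr;
}

// typeid compares the dynamic types of two objects.
inline bool same_type(const Account& a, const Account& b) {
    return typeid(a) == typeid(b);
}

inline bool example_meta() {
    SavingsAccount savings;
    const Account& seen_as_base = savings;   // static type Account, dynamic type SavingsAccount
    assert(typeid(seen_as_base) == typeid(SavingsAccount));
    return is_savings(seen_as_base);
}
```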
Design Patterns.
The C++ class keyword is used to implement both User Defined Types (UDT) and O.O. Design Patterns. How do you tell them apart? Well, they overlap so much (and patterns are so broad) that any separation would be a bit of a diversion and a little artificial… For more comprehensive details, read "Pattern-Oriented Software Architecture" and supplement it with "Design Patterns".
Industrial revolutions.
A programming language provides mechanisms for implementing and using types. Some Design Patterns (as in GoF, POSA) are mechanisms that combine the existing mechanisms to allow types to be used more indirectly (i.e. later binding). Prior to the industrial revolution in Europe, production lay in the hands of guilds and apprentices. Mass production relied on standard, interchangeable parts, processed by machines (mechanisms). Design Patterns were a software revolution that identified the kind of machinery needed. However, each pattern (machine) had to be hand crafted by experts. Who will mass-produce the machines (design patterns)?
The next stage of that revolution is taking place. Generic and Generative programming are a way of mass-producing the machines (Design Patterns). STL has made GoF's iterators commonplace and provided a standard interface. More developments are underway: boost.org is a forum for extending the library, and already some potential contributors are harnessing C++'s powerful constructs to make other Design Patterns commonplace - e.g. Loki by Andrei Alexandrescu provides generic implementations of the Visitor and Factory design patterns [ DESWC ].
Mass-production led to widespread standardisation and greater demand for inter-operability. The existing C++ mechanisms are very useful but a fair amount of meta-object code has to be implemented by hand. To tackle this, C++ Builder provides a new keyword, __closure, which is a form of generalised function pointer that also has an object pointer associated with it. Loki provides generalised functors, which provide similar functionality to __closure without relying on non-standard extensions
"at the expense of a slightly less elegant syntax and a small penalty in efficiency. However, Loki's approach is more general because it accommodates pointers to functions and functors, in addition to native pointers to member functions." - Andrei Alexandrescu.
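The idea of a generalised functor can be sketched with std::function and std::bind from the C++11 standard library. This is a swapped-in standard facility, not Loki's functor class and not C++ Builder's __closure. Counter, twice and Callback are hypothetical names:

```cpp
#include <cassert>
#include <functional>

class Counter {
public:
    Counter() : count_(0) {}
    int add(int n) { count_ += n; return count_; }
private:
    int count_;
};

inline int twice(int n) { return 2 * n; }

// One callable type holds either a free function or an object plus member function.
typedef std::function<int(int)> Callback;

inline int example_callbacks() {
    Counter counter;
    Callback free_fn = &twice;
    Callback bound   = std::bind(&Counter::add, &counter, std::placeholders::_1);
    const int after_five = bound(5);   // calls counter.add(5)
    assert(after_five == 5);
    return free_fn(bound(1));          // counter becomes 6, then twice(6)
}
```

The caller of a Callback cannot tell, and need not care, whether an object is hiding behind it.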
Acknowledgements.
This is more of a report than an original article - it has benefited greatly from the ideas & knowledge of Kevlin Henney, Phil Bass, Mark Radford and Ewan Milne passed on in JaCC lectures, e-mail conversations and during proof-reading of this article.
Further Reading
[SUB] Substitutability. Kevlin Henney. JaCC September 1999, and Overload 39.
[MPD] "Multi-Paradigm Design for C++" James O. Coplien.
[LSP] "The Liskov Substitution Principle" Robert C. Martin, C++ Report March 1996. Available on the back issues CD or from www.objectmentor.com/publications/lsp.pdf
[DIP] "The Dependency Inversion Principle" Robert C. Martin, C++ Report June 1996. Available on the back issues CD or can be downloaded from www.objectmentor.com/publications/dip.pdf
[OCP] "The Open-Closed Principle" Robert C. Martin, C++ Report. www.objectmentor.com/publications/ocp.pdf
Further reading (C++, Design)
[CPL3] C++ Programming Language (3e) Bjarne Stroustrup.
[Design] "Design : Concepts and Practices" Kevlin Henney JaCC September 1999.
[OOADWA] Object-Oriented Analysis and Design with Applications (2e) Grady Booch.
[SciEng] Scientific & Engineering C++ (Barton & Nackman)
Further reading (Patterns)
Patterns document good practice. Quite a few focus on type interactions, object interactions and allocation of behaviour.
[POSA] Pattern-Oriented Software Architecture: A system of patterns (Buschmann, Meunier, Rohnert, Sommerlad, Stal).
[GoF] Design Patterns (Gamma, Helm, Johnson, Vlissides).
[PHl] Pattern Hatching (Vlissides)
[PLOP] Pattern Languages of Program Design (Vol 1-4, Lots O'People).
[ANAPAT] Analysis Patterns. A second edition is being worked on - see www.MartinFowler.com.
Further reading (Generic programming)
[GENPROG] Generative Programming : Methods, Tools, and Applications (Czarnecki & Eisenecker)
[DESWC] Modern C++ Design (Andrei Alexandrescu, Addison-Wesley 2001). Describes and builds the Loki library.