Abstract
Coupling and cohesion are non-functional properties of a system's built structure. They have measurable effects on the ease of comprehension, compilation, testing, and maintenance of any system. In C++, decoupling begins at a physical level with code structure. Surrounding the syntax, any system in development has a structure of source and header files that lives in a file system. This physical structure is often neglected to the detriment of any system: It is sometimes impressive (and depressing) to see a clean logical design crushed by the brute force of a poorly considered physical structure.
The patterns in this paper document idiomatic practices for organizing C++ source and header files. They are connected into a language.
Source Dependency and Complexity Management
Dependencies should be managed throughout the runtime, construction time, and design time of a system. Coupling and cohesion define, respectively, inter-connectedness and intra-connectedness of components and their interfaces, where component refers generally to physical elements such as files or logical elements such as classes. It is these quantities that must be managed if an architecture is to be stable and resilient in the face of change: supporting natural growth and evolution, as well as out of the box fitness for purpose and buildability.
On the whole a designer should strive to minimize dependencies between elements of a system. This should not be at the cost of making elements non-cohesive. They should be as loosely coupled as is meaningful, and this will lead to a more supple component structure. In turn this should lead to a more maintainable and stable system. When something is stable it may be depended upon without concern.
Practices for the physical organization of the source code support or lead to decoupling practices that are focused on managing class dependencies and object relationships [ Barton-1994 , Carroll-1995 , Geisler1999 , Henney1999 , Henney2000 , Murray1993 , Riehle2000 , Sutter2000 ]. The focus in this paper is on physical organization more than on the logical concepts represented within a program.
Organization of Source Code
Large-scale structure in C++ programs is dominated by use of the preprocessor for file inclusion. However, the relationship between physical source, such as source and header files [ 1 ] , and logical source, such as declarations and definitions, is not always a happy one. The same can be said of Bjarne Stroustrup's view of this C legacy in C++: "I'd like to see Cpp abolished" [ Stroustrup1994 ]. Evidently there would be little love lost here if this were to happen.
The preprocessor is a blunt tool for expressing module relationships, based as it is on semantically-unconstrained textual inclusion:
There is no language-recognized module structure for a system, so physical partitioning is notionally optional and based strongly on habit and convention - always a topic for heated debate. As perceived by the programmer, this leads to unnecessary incidental complexity when working with source - whether it is writing it, compiling it, or versioning it.
If unmanaged, build times can rise exponentially with the number of source and header files. In smaller projects this offers plenty of opportunities for coffee breaks and meetings, but on larger projects it introduces the bind of overnight builds.
An unwanted side effect of preprocessor usage is that legitimate code may be terrorized and harassed by wild macros.
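To make the point about semantically unconstrained text concrete, the following minimal sketch contrasts a function-like macro with an ordinary function. The names TWICE, twice, via_macro, and via_function are hypothetical and exist only for this illustration.

```cpp
// Hypothetical macro, as might arrive unnoticed through an included header.
// It is plain text substitution: no scope, no types, no precedence rules.
#define TWICE(value) value + value

// The same operation as an ordinary function, which respects the language.
inline int twice(int value) { return value + value; }

// Expands textually to 3 * 2 + 2, so the multiplication binds first.
inline int via_macro() { return 3 * TWICE(2); }

// Evaluates twice(2) to 4 before multiplying.
inline int via_function() { return 3 * twice(2); }

#undef TWICE  // stop the macro from leaking any further
```

The two functions look equivalent at the call site but return different values, which is the kind of surprise that a macro leaking from one header into unrelated code can cause.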
These are common forces and consequences in development. Unmanaged physical dependencies thus make systems harder to understand and harder to build, putting the brakes on rapid, incremental development and turnaround. The pattern language presented in this paper documents and links a set of practices that balance these forces.
The Patterns
Table 1 lists the patterns described here in thumbnail form, providing a summary of the problem and solution for each. Other key patterns used in this paper are described toward the end in thumbnail form in Table 2.
Name | Problem | Solution |
---|---|---|
Cohesive Header | What partitioning criteria should be used in allocating abstractions to header files in a system? | Organize a header around a single concept, such as a class or set of services. The header should declare or include everything it needs to be self-contained, but no more. |
Source/Header Correspondence | What should be the relationship between header files that declare the abstractions in a system and the source files that define them? | For each header define a corresponding source file with the definitions to the declared abstractions. The source file includes the header and has the same base name as the header. |
Idempotent Header | What steps should be taken to deal with header files that include common header files, which may lead to multiple inclusion problems? | A header file should have the same effect, no matter how many times it is included. Use a Header Guard to ensure that its contents are expanded out only the first time a header is included in a translation unit. |
Header Guard | What mechanism can be used to prevent multiple inclusion of a header file leading to multiple expansion of its contents, and hence multiple definition problems? | Test for the definition of a macro - based on the capitalized version of the header's base name - only including the content of the header if it is undefined. Define the macro only if the content is included. |
Minimal Header | How can a Cohesive Header include no more information than is strictly necessary when its classes and functions mention types defined in other header files? | Include only the headers that define non-class types and the headers for classes whose definition is genuinely required, e.g. base classes and data members by value. For each other class that is used, e.g. as arguments or as pointers, provide a Forward Declaration. |
Forward Declaration | How can a class be referred to in a header without causing its full definition to be included for compilation? | Declare only the class name in the header file, placing it after any included files. In the defining source file include the appropriate header file for all forward-declared types. |
Forward Declaring Header | How can Forward Declarations for complex types, such as template definitions with defaulted parameters, be declared in a Minimal Header without hardwiring and duplicating the declaration information over the system? | For such complex types, provide an additional header file that comprises only Forward Declarations. Clients now include the Forward Declaring Header rather than providing their own Forward Declarations. |
Table 1. Thumbnails for patterns documented in this paper, listed in order.
Figure 1. Patterns and their successors for organizing C++ source and header files.
The patterns may be organized into a pattern language, as shown in Figure 1. Cohesive Header defines the natural entry point into the language, the remaining arrows showing the subsequent flow through the language.
The patterns are presented using a brief, low ceremony pattern form: pattern name followed by intent; a description of the problem, identifying the forces; a detailed description of the solution, identifying the consequences.
Cohesive Header
Organize header files around a single concept to simplify their comprehension and management.
Problem
What partitioning criteria should be used in allocating abstractions to header files in a system?
It is possible for a programmer to develop a small project in a single file. However, although this apparently simplifies tool usage - compilation, linkage, versioning, etc. - it does not scale well: Comprehensibility falls rapidly with size, editing becomes increasingly awkward and Cobol-like, and opportunities for developer teams, parallel development, separate compilation, fine-grained versioning, reuse, unit testing, etc. are all but eliminated. Such a seductively easy route leads quickly to a dead end.
Separate source files in part resolve these issues, but introduce the need for different source files to share common function declarations and type definitions. The temptation is to revert to a monolithic header file that includes all declarations and definitions needed across a project. As with the single source file approach, such a "project.h" requires a particular presence of mind to ensure the internal organization of the file does not degenerate into cryptic spaghetti [ 2 ], a secret dish known only to those initiated in its history and development.
Dividing the declarations and definitions of a project across many header files increases the opportunity for separate development, and many of the benefits that it brings. However, partitioning alone is insufficient to bring in all of the benefits, and may also be used inadvertently as a tool to reduce comprehensibility: The partitioning may be coincidental or inconsistent, making the apparent structure of the system difficult for a programmer to navigate, modify, and extend. The resulting header files may be large and still difficult to comprehend, offering apparently unrelated concepts in a single place, i.e. "kitchensink.h" (because they contain "everything but the kitchen sink").
Breaking declarations and definitions across multiple header files will lead to implied dependencies between header files, where one declaration depends on another. This can be awkward to manage in source files as the programmer must be aware of and track such subtle relationships, ensuring that they #include the right files in the right order.
Solution
Organize a header around a single concept, such as a class or set of services. The header should declare or include everything it needs to be self-contained, but no more: It should also be a Minimal Header. A Source/Header Correspondence ensures that the implementation of features offered in a Cohesive Header is easy to find. An Idempotent Header ensures that multiple inclusion of a header is transparent in non-hierarchical inclusion graphs (i.e. directed acyclic graphs).
A Cohesive Header represents a physical interface that corresponds to the logical structure. It represents a unit of common release and change. For instance, a Cohesive Header may be organized around a single class and its dependent functions. It would include any associated operators and algorithms, helper classes, and namespace wrapping, as well as its own inclusion of any headers whose contents it depended on. Thus to gain access to the full use of a class a programmer need only determine which Cohesive Header corresponds to that concept and include it.
Cohesive Header supports and is supported by the Interface Principle [ Sutter2000 ], where "for a class X, all functions, including free functions, that both 'mention' X and 'are supplied with' X are logically part of X, because they form part of the interface of X".
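As a minimal sketch of what such a header might look like, consider a header organized around a single date concept. The file name date.hpp, the namespace calendar, the class date, and the guard macro are all hypothetical. The comparison operators are written inline only to keep the sketch within one block; in practice they could equally be defined in the corresponding source file.

```cpp
// date.hpp: a hypothetical Cohesive Header organized around one concept.
#ifndef CALENDAR_DATE_INCLUDED
#define CALENDAR_DATE_INCLUDED

#include <iosfwd>  // std::ostream is only mentioned below, so declarations suffice

namespace calendar
{
    class date
    {
    public:
        date(int year, int month, int day)
          : year_(year), month_(month), day_(day) {}
        int year() const  { return year_; }
        int month() const { return month_; }
        int day() const   { return day_; }
    private:
        int year_, month_, day_;
    };

    // Free functions that mention date and are supplied with it are part of
    // its interface, so they live in the same header.
    inline bool operator==(const date &lhs, const date &rhs)
    {
        return lhs.year() == rhs.year()
            && lhs.month() == rhs.month()
            && lhs.day() == rhs.day();
    }
    inline bool operator!=(const date &lhs, const date &rhs)
    {
        return !(lhs == rhs);
    }
    inline bool operator<(const date &lhs, const date &rhs)
    {
        if (lhs.year() != rhs.year()) return lhs.year() < rhs.year();
        if (lhs.month() != rhs.month()) return lhs.month() < rhs.month();
        return lhs.day() < rhs.day();
    }

    // Declared here, defined in the corresponding source file.
    std::ostream &operator<<(std::ostream &out, const date &value);
}

#endif
```

A client that needs dates includes this one header and gains the class together with its operators, without having to know about any other file.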
A Cohesive Header should be named after its principal concept, i.e. a class or general purpose, followed by a suffix, such as .hpp or .hh. The suffix is generally dependent on the project and the platform culture. Although the standard header files omit suffixes, e.g. <string> and <list>, such files cannot be associated with applications or found conveniently using wildcard matching. Many source code editors use syntax highlighting based on identifying the language through the suffix, and the absence of a suffix introduces an inconvenience. The .h suffix is generally not recommended because of its more standard association with C header files, so that C++ headers cannot be readily distinguished from C headers by either programmers or tools.
Where namespaces are used to represent subsystems or packages, as opposed to simply wrapping up a single class and its dependent functions, the physical directory structure should correspond to the logical source structure. This again makes it easier to find and comprehend the header in question, and should also be made apparent in the inclusion by client code, i.e. #include <subsystem/concept.hpp> rather than simply #include <concept.hpp> . Such partitioning was relatively common before the introduction of C++ namespaces, and is reinforced by their addition as well as the adoption of the same model as the basis of Java's package mechanism.
Partitioning a system's logical interfaces across Cohesive Headers will result in more header files than other partitioning criteria, which means that there are more files for the programmer to be aware of and to manage. However, each of these files will be smaller, more cohesive, and - therefore - easier to comprehend and manage, e.g. change history will clearly reflect the stability of and rate of change of a concept. If the compiler's own file management is limited, many small header files on a large project can have a knock-on effect on the build time as the compiler will be performing more file opening, reading, and closing operations.
As with many forms of source partitioning, Cohesive Header is a convention that requires judgment - i.e. whether or not the contents of a given header are genuinely cohesive. If adopted, it is a convention that must be enforced by the developers themselves, as it lies beyond the control of the compiler.
Source/Header Correspondence
Each header file should be matched by a defining source file to simplify comprehension of the system's file structure and management of dependencies.
Problem
What should be the relationship between header files that declare the abstractions in a system and the source files that define them? Having partitioned the physical view of the logical elements of a system into Cohesive Headers, header file clients can now easily locate functionality for inclusion. But what about programmers working on the actual defining source code that realizes the features in a Cohesive Header? It should be simple for them to locate definition code for reading, modification, or extension based on the header, as well as know which header to modify if a change requires that they be kept in synch.
Ada and Modula-2 have a clear correspondence between the exported view of a package or module and its implementation, but C++'s preprocessor does not enforce any model, focusing only on the composition of a translation unit, by inclusion and macro expansion, for further compilation. It would be equally legitimate for a system to have its function and data definitions grouped into a single source file as it would be for each individual function or data item to be assigned its own source file. The former will create a slab of code that defies software engineering principles and the latter will create a finely fragmented system structure that is overwhelming for both programmers and compilers, leading to astonishingly long build times.
Solution
For each Cohesive Header define a corresponding source file with the definitions to the declared abstractions. The source file includes the header and has the same base name as the header.
Source/Header Correspondence effectively splits an abstraction into two: Its declarative or export part, and its defining or implementation part. This longstanding practice follows the structure enforced by languages such as Ada and Modula-2. The two files should have the same base name and differ only in the appropriately corresponding suffixes, i.e. .hpp and .cpp or .hh and .cc. The source file is also a unit of information hiding within which the use of anonymous namespaces or the deprecated use of static can keep additional implementation abstractions private. Source/Header Correspondence makes the source structure self-evident and easy to work with.
However, with inline functions and the include-all model for template compilation, C++ does not follow a perfect specification/implementation separation. Source/Header Correspondence may be further generalized to include files for inlines and headers for templates, so that a header file provides function and data declarations and class definitions only, and includes source for template definitions and inline function definitions as separate files. Such a separation of concerns simplifies source management.
A source file includes its own corresponding header file, as well as any other system and project files required for its definitions. The recommended order for inclusion in the source file is to first #include the corresponding header, next any headers developed for the project, and finally any other system or third party headers. Within each of these groupings any consistent ordering, such as alphabetical, will suffice. This reinforces a Cohesive Header by ensuring that headers are genuinely self-contained, not accidentally relying on features they do not themselves include.
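The correspondence and the inclusion order can be sketched as follows. The class tally and the file names tally.hpp and tally.cpp are hypothetical, and the two files are shown in a single block, separated by comments, so that the sketch compiles on its own; for that reason the inclusion of the corresponding header appears only as a comment.

```cpp
// ---- tally.hpp (hypothetical header) ----
#ifndef TALLY_INCLUDED
#define TALLY_INCLUDED

// Accumulates the magnitudes of the values added to it.
class tally
{
public:
    tally();
    void add(int value);
    int total() const;
    int count() const;
private:
    int total_;
    int count_;
};

#endif

// ---- tally.cpp (source file with the same base name) ----
// 1. The corresponding header comes first, which checks that it is
//    self-contained. It is commented out only because this sketch lives in
//    a single block.
// #include "tally.hpp"
// 2. Other project headers would follow here.
// 3. System and third party headers come last.
#include <cstdlib>

namespace  // helper private to tally.cpp, invisible to other source files
{
    int magnitude(int value) { return std::abs(value); }
}

tally::tally() : total_(0), count_(0) {}
void tally::add(int value) { total_ += magnitude(value); ++count_; }
int tally::total() const { return total_; }
int tally::count() const { return count_; }
```

A programmer reading tally.hpp knows immediately that the definitions are in tally.cpp, and the anonymous namespace keeps the helper out of the exported interface.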
Idempotent Header
Ensure that including a header file has the same effect wherever and whenever it is included.
Problem
What steps should be taken to deal with header files that include common header files, which may lead to multiple inclusion problems?
It is common for header files in projects to require the same abstraction, and therefore as Cohesive Headers to include them. Any other source or header file that includes these will then be including a common header file at least twice. This can potentially cause problems: In a single translation unit a class or function, including inline functions, may be defined only once; multiple inclusion leads to multiple definitions, and therefore failed compilation. For instance:
    // timer.hpp
    #include "callback.hpp"
    ....

    // viewer.hpp
    #include "callback.hpp"
    ....

    // main.cpp
    #include "timer.hpp"
    #include "viewer.hpp"
    ....
Such directed acyclic graphs of dependencies are inevitable in any well-factored project, so it is often impossible as well as impractical to mandate that repeated inclusion "shall not happen". Alternatively, if header files include nothing and source files are required to resolve all of the dependencies, the problem can be resolved technically. However, the header files are then no longer Cohesive Headers, and a source file programmer is required to know and document explicitly not only what the header file dependencies of each include file are, but also the required ordering. This makes header file usage cumbersome and unpleasant; there is more to learn, and more to miss. In particular, the programmer must be aware of the private section of each class and the dependencies of each inline function or template. This is acceptable for the developer of the header file, but not for its user. It simply adds complexity and means that there is more that can break in a build.
Solution
A header file should have the same effect, no matter how many times it is included. Use a Header Guard to ensure that its contents are expanded out only the first time a header is included in a translation unit.
An Idempotent Header may be included multiple times in producing the same translation unit without harm. This allows Cohesive Header partitioning and simplifies the use of header files from the point of view of their users, ensuring that they are self-contained and work out of the box.
The portable and conventional way to implement an Idempotent Header is to use a Header Guard, but some compilers support a #pragma that ensures that the body of a header is expanded out by the preprocessor only the first time it is encountered. In principle, there are portable cases where an Idempotent Header need not use a Header Guard: Declarations can be repeated without the ill effect caused by repeated definitions. However, for consistency and surprise-reduction a Header Guard is always recommended as the support for an Idempotent Header.
Header Guard
Use conditional compilation to prevent problems arising from multiple inclusion.
Problem
What mechanism can be used to prevent multiple inclusion of a header file leading to multiple expansion of its contents, and hence multiple definition problems?
Idempotent Headers simplify the use of headers, but the provider of the header file must ensure that such headers genuinely are idempotent, avoiding the problems arising from multiple inclusion of definitions. As this support is not automatic within the language, programmers are required to provide it themselves, and the conventions used should be consistent across a project.
Solution
Test for the definition of a macro - based on the capitalized version of the header's base name - only including the content of the header if it is undefined. Define the macro only if the content is included. A conditional compilation guard therefore surrounds the body of the header.
The basic schema for a Header Guard is
    #ifndef HEADERBASENAME_INCLUDED
    #define HEADERBASENAME_INCLUDED
    .... // body of header
    #endif
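The effect of the guard can be seen by simulating what the preprocessor produces when the same header reaches a translation unit by two routes. In the sketch below the text of a hypothetical callback.hpp is pasted twice, as inclusion would do; the class counting_callback is likewise an assumption made for illustration.

```cpp
// First inclusion of a hypothetical callback.hpp, e.g. via timer.hpp.
#ifndef CALLBACK_INCLUDED
#define CALLBACK_INCLUDED

class callback
{
public:
    virtual ~callback() {}
    virtual void notify() = 0;
};

#endif

// Second inclusion of the same text, e.g. via viewer.hpp. CALLBACK_INCLUDED is
// now defined, so the preprocessor skips the body and no redefinition occurs.
#ifndef CALLBACK_INCLUDED
#define CALLBACK_INCLUDED

class callback
{
public:
    virtual ~callback() {}
    virtual void notify() = 0;
};

#endif

// Client code sees exactly one definition of callback.
class counting_callback : public callback
{
public:
    counting_callback() : calls_(0) {}
    virtual void notify() { ++calls_; }
    int calls() const { return calls_; }
private:
    int calls_;
};
```

Removing the two guard lines from either copy would turn the second class definition into a redefinition error.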
There are many different conventions for naming the guard macro, and any convention adopted should meet the following criteria:
Macros should follow convention by being all in uppercase, i.e. HEADERBASENAME_INCLUDED is OK, but not HeaderBaseName_Included .
Leading underscores followed by an uppercase letter are reserved for use by the compiler and should be avoided, i.e. _HEADERBASENAME_INCLUDED should not be used. The same is true for any use of double underscores, i.e. HEADERBASENAME__INCLUDED should also be avoided.
Any directory name required for inclusion should also be encoded in the name to avoid clashes with other files with the same base name, i.e. SUBSYSTEM_HEADERBASENAME_INCLUDED .
Any suffix used should consider the effect of moving header files across platforms where the header suffix might be changed in line with any compiler-required changes to the suffix of the source file, i.e. HEADERBASENAME_H and HEADERBASENAME_HPP suggest they are tied to a particular header suffix whereas HEADERBASENAME_INCLUDED and HEADERBASENAME_HEADER do not. The suffix can be omitted altogether, but note that the macro then looks like any other macro as opposed to a header guard, i.e. HEADERBASENAME.
The convention should be used consistently across a project.
This last point is important as it conforms to the principle of least astonishment, especially when external guards are used for larger files: Although repeated inclusion with a Header Guard does no harm, the file is still read by the compiler, which can add significant time for large headers and, on some systems, can exhaust the number of file handles available to the compiler. In such cases, a build optimization is possible by using redundant external guards around an include [ Lakos1996 ]:
    #ifndef HEADERBASENAME_INCLUDED
    #include "headerbasename.hpp"
    #endif
All header files should, regardless, define Header Guards. The programmer must remember to do this consistently, and this can be helped by using boilerplates or editor scripting.
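The difference between the internal guard and a redundant external guard can be sketched in a single block. The header shape.hpp and the function total_sides are hypothetical. The first inclusion is shown inline; the external guard that follows is written exactly as it would appear in an including file.

```cpp
// ---- shape.hpp (hypothetical), shown inline as its first inclusion ----
#ifndef SHAPE_INCLUDED
#define SHAPE_INCLUDED

struct shape
{
    int sides;
};

#endif

// ---- a later header that also depends on shape.hpp ----
// SHAPE_INCLUDED is already defined, so the preprocessor skips the #include
// directive itself. The file is never searched for or opened, which is why
// this block compiles even though no shape.hpp exists alongside it.
#ifndef SHAPE_INCLUDED
#include "shape.hpp"
#endif

inline int total_sides(const shape &first, const shape &second)
{
    return first.sides + second.sides;
}
```

The external guard only works if it tests the same macro that the header defines, which is one reason the naming convention has to be applied consistently across a project.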
Minimal Header
A header should include no more than it needs for legal compilation.
Problem
How can a Cohesive Header include no more information than is strictly necessary when its classes and functions mention types defined in other header files?
A programmer, assuming that a header is not only self-sufficient but also sufficient, may assume that all the headers included by the header are genuine dependencies of the header's contents. However, it is often the case that certain headers are included by force of habit or laziness to include certain utility files elsewhere, e.g. #include <iostream> or #include <string> . Such headers are often not required by either the header or many of the source files. This makes the dependency graph for a system confused, so that listing #include directives does not give an accurate reflection of the necessary and logical dependencies in a system. It also unnecessarily increases the build time for a given header.
In other cases, although all of the headers included are Cohesive Headers and their concepts are used in the including header, e.g. use of a class name, inclusion of a full header is not always necessary to allow the code to compile. In these cases the inclusion is mostly redundant and creates a compile-time overhead.
Solution
Include only the headers that define non-class types and the headers for classes whose definition is genuinely required, e.g. base classes and data members by value. For each other class that is used, e.g. as a function argument or a pointer member, provide a Forward Declaration .
A Minimal Header supports Cohesive Header principles by ensuring that what is defined by a header is necessary and sufficient, that it is standalone. It is the source file programmer's responsibility to include any other header files they see fit, but it is not the job of the header file programmer to cater for a "just in case" need or to predict other possible inclusion requirements of the header file client.
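The distinction between what must be included and what need only be named can be sketched as follows. The classes shape, canvas, and labelled_square and the file names are hypothetical; the base class is shown inline rather than included so that the sketch compiles as one block.

```cpp
// ---- labelled_square.hpp (hypothetical Minimal Header) ----
#ifndef LABELLED_SQUARE_INCLUDED
#define LABELLED_SQUARE_INCLUDED

#include <string>  // needed: std::string is a data member held by value

// #include "shape.hpp" would be needed because shape is a base class.
// Its content is shown inline to keep the sketch in one block.
class shape
{
public:
    virtual ~shape() {}
    virtual double area() const = 0;
};

class canvas;  // not included: only mentioned as a reference argument

class labelled_square : public shape
{
public:
    labelled_square(const std::string &label, double side)
      : label_(label), side_(side) {}
    virtual double area() const { return side_ * side_; }
    const std::string &label() const { return label_; }
    void draw_on(canvas &target) const;  // defined in labelled_square.cpp
private:
    std::string label_;
    double side_;
};

#endif
```

Clients that never draw anything can use labelled_square without the definition of canvas ever being compiled on their behalf.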
In projects that have come to rely on headers that include more than they need, making the change will involve an initial refactoring that breaks a lot of compilation. However, the required changes will be easy to identify and make, and following that change the system build time will be shorter and header files more self-descriptive.
Header files with inlines and template definitions will typically require many more inclusions because they require implementation detail that would otherwise be in a source file. For inline functions, it is best to be sure that they are genuinely required and that they do have a measurable positive effect on the runtime performance or footprint of the system, otherwise they are simply an excuse for laziness at the expense of dependency management and build times. For templates, techniques such as hoisting [ Carroll-1995 ] can be used in reducing the inclusions required to allow compilation of a template.
Forward Declaration
Declare a class name rather than include its full definition in a client header file.
Problem
How can a class be referred to in a header without causing its full definition to be included for compilation?
Not all types that are included in a header are fully used in that header, e.g. a type name may be used to declare a prototype or as a pointer data member. For a class included for use as a base class or whose instances are manipulated in inline or template functions, the full definition is clearly required. For the other cases, where only a mention of the name is required, inclusion of the full definition seems wasteful.
Solution
It is possible to forward declare a class or struct name for use in function prototypes, class static and extern data declarations, pointer or reference data members, and some template parameters. However, this does not apply to other type names, e.g. enum or typedef names. Thus, for classes (and structs) declare only the name in the header file, placing it after any included files. In the defining source file include the appropriate header file for all forward-declared types that are used. A Forward Declaring Header simplifies forward declaration for complex type names.
For classes defined in a namespace, the namespace can be reopened for the forward declaration:
namespace subsystem { class concept; }
Forward Declaring Headers can make this less of a chore.
Forward Declaration supports Minimal Headers based on the principle of sufficiency, leading to improvements in build times and reduction of recompilation when headers are changed. The onus is now placed on the file including the header file to include the full definition, assuming it is needed (not always the case).
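A minimal sketch of the idiom follows, with three hypothetical files (viewer.hpp, document.hpp, and viewer.cpp) shown in one block and separated by comments. The classes viewer and document are assumptions made for illustration.

```cpp
// ---- viewer.hpp (hypothetical) ----
class document;  // Forward Declaration: the name is all this header needs

class viewer
{
public:
    explicit viewer(const document &subject);
    int pages_shown() const;
private:
    const document *subject_;  // a pointer to an incomplete type is fine
};

// ---- document.hpp (hypothetical) ----
class document
{
public:
    explicit document(int pages) : pages_(pages) {}
    int pages() const { return pages_; }
private:
    int pages_;
};

// ---- viewer.cpp ----
// #include "viewer.hpp"
// #include "document.hpp"  // the full definition is needed only here
viewer::viewer(const document &subject) : subject_(&subject) {}
int viewer::pages_shown() const { return subject_->pages(); }
```

A change to the private section of document now forces recompilation of viewer.cpp but not of every file that merely includes viewer.hpp.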
Forward Declaration's role as a dependency management idiom is the basis of the Opaque Type idiom, which in turn, and along with Handle/Body, forms the basis of the Cheshire Cat idiom. Another technique for eliminating dependencies arising from private implementation detail is at the logical rather than the source level: Defining an Interface Class uses inheritance to eliminate any private section, and thus any dependencies from it.
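One possible shape for a Cheshire Cat is sketched below, under the assumption of a hypothetical stopwatch class split across stopwatch.hpp and stopwatch.cpp. Copying is simply disabled to keep the sketch short; a fuller version would define copy semantics for the body.

```cpp
// ---- stopwatch.hpp (hypothetical) ----
class stopwatch
{
public:
    stopwatch();
    ~stopwatch();
    void tick();
    int elapsed() const;
private:
    stopwatch(const stopwatch &);             // declared but not defined:
    stopwatch &operator=(const stopwatch &);  // copying is not supported here
    class representation;                     // nested, private Opaque Type
    representation *body_;
};

// ---- stopwatch.cpp ----
// #include "stopwatch.hpp"
// The representation is fully defined only in the source file, so clients
// never depend on its data members.
class stopwatch::representation
{
public:
    representation() : ticks(0) {}
    int ticks;
};

stopwatch::stopwatch() : body_(new representation) {}
stopwatch::~stopwatch() { delete body_; }
void stopwatch::tick() { ++body_->ticks; }
int stopwatch::elapsed() const { return body_->ticks; }
```

The header exposes a conventional class interface, yet the representation can change freely without any client of stopwatch.hpp being recompiled.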
Forward Declaring Header
Group complex forward declarations into a separate header file to simplify client header files.
Problem
How can Forward Declarations for complex types, such as template definitions with defaulted parameters, be declared in a Minimal Header without hardwiring and duplicating the declaration information over the system?
For instance, if a programmer wished to declare a prototype that took an ostream from the standard library as one of its arguments without including any of the sizeable I/O stream headers, a Forward Declaration would suggest itself as the solution:
namespace std { class ostream; }
However, this will not work because ostream is a typedef name that hides a more complex type:
    namespace std
    {
        ....
        template<
            typename char_type,
            typename traits = char_traits<char_type> >
        class basic_ostream;
        typedef basic_ostream<char> ostream;
        ....
    }
Declarations of this complexity make Forward Declarations by the including programmer impractical and error-prone, as well as susceptible to minor changes, e.g. if a default template parameter were to change.
Solution
For such complex types, provide an additional header file that comprises only Forward Declarations. Clients now include the Forward Declaring Header rather than providing their own Forward Declarations.
A Forward Declaring Header requires the anticipation of the programmer providing the principal header. The Forward Declaring Header includes the relevant declarations, dealing with namespace and template parameter issues, for the user to include directly. The principal header file can itself include the Forward Declaring Header, helping to ensure that its Forward Declarations are in synch with the actual definitions, and simplifying the management of what is now an increased number of files.
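The arrangement might be sketched as follows for a project's own template. Every name here (the namespace numeric, the templates basic_cell and zero_fill, the typedef cell, the function value_of, and the four file names) is hypothetical, and the files are shown in one block so that the sketch compiles on its own.

```cpp
// ---- cell_fwd.hpp (hypothetical Forward Declaring Header) ----
#ifndef NUMERIC_CELL_FWD_INCLUDED
#define NUMERIC_CELL_FWD_INCLUDED

namespace numeric
{
    template<typename element_type> struct zero_fill;
    template<
        typename element_type,
        typename fill = zero_fill<element_type> >
    class basic_cell;
    typedef basic_cell<double> cell;
}

#endif

// ---- report.hpp (client header): the names are all it needs ----
// #include "cell_fwd.hpp"
double value_of(const numeric::cell &source);

// ---- cell.hpp (principal header) ----
// #include "cell_fwd.hpp"  // keeps declarations and definitions in synch
namespace numeric
{
    template<typename element_type>
    struct zero_fill
    {
        static element_type value() { return element_type(); }
    };

    // The default argument was given in cell_fwd.hpp and must not be repeated.
    template<typename element_type, typename fill>
    class basic_cell
    {
    public:
        basic_cell() : content_(fill::value()) {}
        element_type get() const { return content_; }
        void set(element_type value) { content_ = value; }
    private:
        element_type content_;
    };
}

// ---- report.cpp ----
// #include "report.hpp"
// #include "cell.hpp"
double value_of(const numeric::cell &source) { return source.get(); }
```

Because the default template argument lives in exactly one place, a change to it cannot leave stray hand-written declarations out of date elsewhere in the system.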
The role of the header file as a Forward Declaring Header should be made clear by use of an appropriate convention, such as appending fwd to the base name. For example, #include <iosfwd> . If a consistent and clear convention is followed, users will be more aware of these header files, and will be more likely to use them to reduce their dependencies and build times.
Other Patterns
Table 2 presents other key patterns that are used in this paper. The references given indicate where a pattern has been formally documented as such or, alternatively, where it has been documented as a proven, recognizable practice, possibly by a different name.
Name | Problem | Solution |
---|---|---|
Cheshire Cat [ Geisler1999 , Lakos1996 , Murray1993 , Sutter2000 ] | How can the representation of an object be fully and physically decoupled from its usage, so that users of the object are presented with a conventional class interface? | Define the representation of the object as an Opaque Type. The Opaque Type is declared as nested and private within the class; it is only fully defined in the same source file as the definitions of the class's member functions. |
Handle/Body [ Coplien1992 , Gamma-1995 ] | How can the representation of an object be decoupled from its usage? | Place the abstraction and representation into separate objects and hierarchies, so that the abstraction is accessed via a handle object and its representation is a separate, hidden, body object. |
Interface Class [ Barton-1994 , Carroll-1995 , Henney1999 , Lakos1996 , Riehle2000 ] | How can we represent the protocol for class usage without also expressing any implementation? | Define a separate abstract class, containing only pure virtual functions, to express the common capabilities of derived classes. |
Opaque Type [ Henney2000 , Lakos1996 ] | How can the representation of an object be fully and physically decoupled from its usage, so that a change in representation does not require a recompilation of clients? | In a header file, provide a Forward Declaration for the type along with a set of functions that operate on pointers to that type. The type is only fully defined in the same source file as the functions that operate on it. |
Table 2. Thumbnails for patterns used but not documented in this paper, listed alphabetically.
References
[Barton-1994] John J Barton and Lee R Nackman, Scientific and Engineering C++: An Introduction with Advanced Techniques and Examples , Addison-Wesley, 1994.
[Carroll-1995] Martin D Carroll and Margaret A Ellis, Designing and Coding Reusable C++ , Addison-Wesley, 1995.
[Coplien1992] James O Coplien, Advanced C++: Programming Styles and Idioms , Addison-Wesley, 1992.
[Gamma-1995] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software , Addison-Wesley, 1995.
[Geisler1999] Andreas Geisler, "Hidden State", C++ Report 11(10), November/December 1999.
[Henney1999] Kevlin Henney, "Clone Alone", Overload 33 , August 1999.
[Henney2000] Kevlin Henney, " C++ Patterns: Principles, Idioms and Techniques ", presented at OOP 2000 , January 2000.
[ISO1998] International Standard: Programming Language - C++ , ISO/IEC 14882:1998(E), 1998.
[Lakos1996] John Lakos, Large-Scale C++ Software Design , Addison-Wesley, 1996.
[Murray1993] Robert B Murray, C++ Strategies and Tactics , Addison-Wesley, 1993.
[Riehle2000] Dirk Riehle, "Working with Classes and Interfaces: Five Fundamental Patterns", C++ Report 12(3), March 2000.
[Stroustrup1994] Bjarne Stroustrup, The Design and Evolution of C++ , Addison-Wesley, 1994.
[Sutter2000] Herb Sutter, Exceptional C++ , Addison-Wesley, 2000.
[ 1 ] Unless otherwise stated, a source file in this paper is considered to be the file containing non-inline, non-template definition code, i.e. .cpp or .cc files on many systems. These are considered distinct from header files which they #include, and which #include each other, i.e. .hpp, .hh, or .h files.
[ 2 ] Don't get me wrong: I love spaghetti. It's just that it does not provide us with a useful metaphor for good software structure; the layering of lasagna is far more suited to this.