The syntactic form of the member initialiser list restricts the logic that it contains. Cassio Neri presents some techniques to overcome these constraints.
In C++, during a constructor call, before execution reaches the constructor’s body, all
subobjects
– base classes and non-static data members – of the class are initialised. (In C++11, this rule has an exception which we shall exploit later.) The
member initialiser list
(MIL) lets the programmer customise this initialisation. A subobject is initialised from a parenthesised
1
list of expressions that follows its identifier in the MIL. The MIL of
bar
’s constructor is emphasised in Listing 1.
class base {
  ...
public:
  base(double b);
};

class foo {
  ...
public:
  foo(double f1, double f2);
};

class bar : public base {
  const double x_, y_;
  foo& r_;
  foo f_;
  double d_;
  ...
public:
  bar(double d, foo& r1, foo& r2);
};

bar::bar(double d, foo& r1, foo& r2)
  : base(d * d),
    x_(cos(d * d)),
    y_(sin(d * d)),
    r_(d > 0.0 ? r1 : r2),
    f_(exp(d), -exp(d)) {
  d_ = d;
}
Listing 1
Most often the MIL simply forwards the constructor’s arguments to the subobject initialisers. In contrast, the MIL of
bar
’s constructor first performs computations with the arguments and then passes the results through. The operations here are still simple enough to fit in full expressions, but had they been more complex (e.g. with branches and loops) the syntactic form of the MIL would have been an obstacle.
This article presents some techniques that allow more complex logic in the MIL. It doesn’t advocate complexity in the MIL; it only shows some ways to achieve it if you have to.
Before looking at these methods, we consider the possibility of avoiding the MIL altogether.
Avoiding the MIL
Notice that
d_
isn’t initialised in the MIL. In this case, the compiler implicitly
initialises
2
d_
and then we
assign
it the value of
d
in the constructor’s body. Could we do the same for the other subobjects? Not always. Assume that
foo
doesn’t have an accessible default constructor. Then, the compiler can’t implicitly initialise
f_
and yields an error. We simply don’t have a choice and
must
initialise
f_
in the MIL. In addition to subobjects of types without an accessible default constructor, reference members (e.g.
r_
) and
const
members of non-class type (e.g.
x_
and
y_
)
must
be explicitly initialised; otherwise the compiler complains. Although not enforced by the language, we can add to this list subobjects of
immutable
types – types with no non-
const
methods apart from constructors and a destructor.
It’s possible for some subobjects to be default initialised first and then changed in the constructor’s body. Nevertheless, this two-step set-up might be wasteful. Indeed, this is the most commonly stated reason to prefer initialisation in the MIL to assignment in the constructor’s body [Meyers05, §4]. For fundamental types, however, there’s no penalty because default initialisation does nothing and costs nothing.
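To make the cost of the two-step set-up concrete, here is a minimal sketch. The types tracker, two_step and one_step are hypothetical and exist only to count how a member gets its value; they are not part of the running example.

```cpp
#include <cassert>

// Hypothetical helper type: counts how it gets set up.
struct tracker {
    static int default_ctors, value_ctors, assignments;
    double v;
    tracker() : v(0.0) { ++default_ctors; }
    explicit tracker(double d) : v(d) { ++value_ctors; }
    tracker& operator=(double d) { v = d; ++assignments; return *this; }
};
int tracker::default_ctors = 0;
int tracker::value_ctors = 0;
int tracker::assignments = 0;

// Two steps: t_ is default initialised, then assigned in the body.
struct two_step {
    tracker t_;
    explicit two_step(double d) { t_ = d; }
};

// One step: t_ is initialised directly in the MIL.
struct one_step {
    tracker t_;
    explicit one_step(double d) : t_(d) {}
};

// Usage sketch: compare the two set-ups.
void compare_setups() {
    two_step a(1.0);   // default constructor followed by an assignment
    assert(tracker::default_ctors == 1 && tracker::assignments == 1);
    one_step b(2.0);   // a single constructor call
    assert(tracker::value_ctors == 1 && tracker::default_ctors == 1);
    assert(a.t_.v == 1.0 && b.t_.v == 2.0);
}
```

For a type whose default constructor and assignment operator do real work, the first form pays for both operations, whereas the second pays for one constructor call only.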
Initialiser functions
The first idea for complex initialisation is very simple and consists of writing an initialiser function that delivers the final result to direct initialise a subobject. Listing 2 shows this technique applied to our example.
double init_x(double d) {
  const double b = d * d;
  const double x = cos(b);
  return x;
}

bar::bar(double d, foo& r1, foo& r2)
  : ...
    x_(init_x(d)),
    ...
Listing 2
We emphasise that, in our toy example,
x_
can be directly initialised in the MIL (as seen in Listing 1). Listing 2 is merely a sample for more complex cases.
Most frequently the initialiser function creates a local object of the same type as the subobject that it initialises and returns it by value. Then the subobject is copy- or move-initialised from this value. Therefore, the subobject’s type must be constructible (in particular, it can’t be an abstract class) and also copy- or move-constructible.
Calling the copy- or move-constructor might have a cost. Nevertheless, mainstream compilers implement the return value optimisation [ RVO ] which, under certain circumstances, elides this call. Unfortunately, this doesn’t eliminate the need for the subobject’s type to be copy- or move-constructible.
In another variation, there are initialisers for various arguments that the subobjects’ constructors take. For instance, an initialiser function for base might compute
d * d
and return this value which is then passed to
base
’s constructor. In this way, the argument types, rather than the subobjects, must be constructible and copy- or move-constructible.
It’s worth mentioning that when the subobject is a reference member, the initialiser function must return a reference to a non-local object, otherwise the member will dangle. For instance, an initialiser function for
r_
could be as follows.
foo& init_r(double d, foo& r1, foo& r2) {
  // r1 and r2 are non-local
  return d > 0.0 ? r1 : r2;
}
A positive aspect of having an initialiser function is that it can be used (and it most likely will be) by many constructors. When there’s no need to reuse the initialiser, C++11 offers the tempting possibility of writing the initialiser function as a lambda expression as shown below. Notice, however, that readability suffers.
x_([&]() -> double {
     const double b = d * d; // d is captured
     const double x = cos(b);
     return x;
   }(/* parentheses for calling the lambda */))
Where should the initialiser function be? Assuming that its sole purpose is initialising a class member (so it’s not going to be used anywhere else), then placing it in the global or in a named
namespace
is pollution. Making the initialiser a member of the class might come to mind but this isn’t ideal because it decreases encapsulation [
Meyers00
]. Additionally, this requires the initialiser’s declaration to be in the class header file forcing on clients an artificial dependency on the initialiser function. The best place for it is inside the class source file (which we’re assuming is
not
its header file). Making the initialiser invisible outside the file (by declaring it either static or in an unnamed
namespace
) improves encapsulation and decreases linking time.
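The placement advice can be sketched as follows, under some assumptions. The class is a trimmed-down, hypothetical bar with a single member, and init_x is a made-up initialiser that uses a branch and a loop (a truncated Taylor series standing in for the call to cos) purely to show logic that wouldn’t fit in one MIL expression. Header and source file are shown together, separated by comments.

```cpp
#include <cassert>
#include <cmath>

// bar.hpp (trimmed, hypothetical): no trace of the initialiser here.
class bar {
    const double x_;
public:
    explicit bar(double d);
    double x() const { return x_; }
};

// bar.cpp
namespace { // internal linkage: invisible outside this file

// Hypothetical initialiser with a branch and a loop:
// a truncated Taylor series for cos(d * d).
double init_x(double d) {
    const double b = d * d;
    if (b == 0.0)
        return 1.0;
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 20; ++n) {
        term *= -b * b / ((2.0 * n - 1.0) * (2.0 * n));
        sum += term;
    }
    return sum;
}

} // unnamed namespace

bar::bar(double d) : x_(init_x(d)) {}

// Usage sketch: the series agrees with the library cosine.
void init_x_example() {
    assert(bar(0.0).x() == 1.0);
    assert(std::fabs(bar(1.0).x() - std::cos(1.0)) < 1e-12);
}
```

Clients that include the header never see init_x, so it can be renamed or rewritten without forcing them to recompile.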
Using an initialiser function is the best technique presented in this article as far as encapsulation, clarity and safety are concerned. However, one feature that this solution lacks is the ability to reuse results obtained by one initialiser in another. For instance, the value of
d * d
must be calculated by the initialiser functions of
base
,
x_
and
y_
. In this example, this issue isn’t a big deal but it could be if the result was obtained through a very costly operation.
Classes can have a member whose only purpose is storing a result to be used by different initialiser functions (e.g.
bar
could have a member
b_
to store
d * d
). This is obviously wasteful since, as in the rest of this section, we want partial results to have a short lifetime. The next sections present methods to achieve this goal.
Bundling members
We can bundle some related members into a nested
struct
and create an initialiser function for the
struct
rather than for individual members. Listing 3 shows relevant changes to bar needed to initialise the two
const
members in one go.
class bar : public base {
  struct point {
    double x, y;
  };
  const point p_;
  static point init_p(double d);
  ...
};

bar::point bar::init_p(double d) {
  const double b = d * d;
  const bar::point p = {cos(b), sin(b)};
  return p;
}

bar::bar(double d, foo& r1, foo& r2)
  : ...
    p_(init_p(d)),
    ...
Listing 3
As in the previous section, the type returned by the initialiser function must be copy- or move-constructible, and so must the
struct
members.
The initialiser function needs access to the nested
struct
. Ideally, this type will be
private
and the initialiser will be a
static private
member. The initialiser could be a
friend
but, being an implementation detail, hiding it inside the class is advisable. (Unfortunately, it can’t be hidden as much as in the previous section.) Alternatively, the initialiser function can be non-member and non-
friend
provided that the
struct
is made
public
but this decreases encapsulation even further.
We can’t include base classes in the
struct
and each of them needs a different initialiser function. However, as in our example, the initialiser function of a base class could profit from results obtained by other initialiser functions. The next section shows how to achieve this goal.
Using an argument for temporary storage
In rare cases we can change the value of an argument to something that is more reusable. Listing 4 is an attempt for our example and consists of changing
d
to
d * d
just before initialising
base
. Unfortunately, this doesn’t work here since initialisations of
r_
,
f_
and
d_
need the original value of
d
but they get the new one instead.
bar::bar(double d, foo& r1, foo& r2)
  : base(d = d * d),         // d has a new value
    x_(cos(d)), y_(sin(d)),  // OK : uses new value
    r_(d > 0.0 ? r1 : r2),   // BUG: uses new value
    f_(exp(d), -exp(d)) {    // BUG: uses new value
  d_ = d;                    // BUG: uses new value
}
Listing 4
A fix for the issue above is to use a dummy argument for temporary storage and to give it a default value to avoid bothering clients. This technique is put into practice in Listing 5.
class bar : public base {
  ...
public:
  bar(double d, foo& r1, foo& r2, double b = 0.0);
};

bar::bar(double d, foo& r1, foo& r2, double b)
  : base(b = d * d),         // b has a new value
    x_(cos(b)), y_(sin(b)),  // OK : uses b = d * d
    r_(d > 0.0 ? r1 : r2),   // OK : uses d
    f_(exp(d), -exp(d)) {    // OK : uses d
  d_ = d;                    // OK : uses d
}
Listing 5
This works because the dummy argument persists for a short period but long enough to be reused by different initialisers. More precisely, its lifetime starts before the first initialisation of a subobject (
base
in our example) and ends after the constructor exits.
A problem (alas, there will be others) with this approach is that the constructor’s extended signature might conflict with another one. If it doesn’t today, it might tomorrow. As an improvement, we create a new type for the storage. For better encapsulation this type is nested in the
private
section of the class as Listing 6 illustrates.
class bar : public base {
  struct storage {
    double b;
  };
  ...
public:
  bar(double d, foo& r1, foo& r2, storage tmp = storage());
};

bar::bar(double d, foo& r1, foo& r2, storage tmp)
  : base(tmp.b = d * d),
    x_(cos(tmp.b)),
    y_(sin(tmp.b)),
    ...
Listing 6
The simplicity of our example is misleading because the assignment
tmp.b = d * d
can be nicely put in the MIL whereas in more realistic scenarios
tmp
might need a more complex set up. It can be done, for instance, in
base
’s initialiser function by making it take a storage argument by reference as Listing 7 shows.
double bar::init_base(double d, storage& tmp) {
  tmp.b = d * d;
  return tmp.b;
}

double bar::init_x(const storage& tmp) {
  const double x = cos(tmp.b);
  return x;
}

bar::bar(double d, foo& r1, foo& r2, storage tmp)
  : base(init_base(d, tmp)),
    x_(init_x(tmp)),
    ...
Listing 7
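The fragments in Listings 6 and 7 can be assembled into a small self-contained sketch. Several things here are assumptions made for illustration: bar is trimmed to a base and two const members, base gains an accessor, and the hypothetical function costly_square stands in for an expensive computation and counts how often it runs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a costly computation; counts its calls.
int costly_calls = 0;
double costly_square(double d) { ++costly_calls; return d * d; }

class base {
    double b_;
public:
    explicit base(double b) : b_(b) {}
    double value() const { return b_; }
};

class bar : public base {
    struct storage { double b; };   // private: clients cannot name it
    const double x_, y_;

    static double init_base(double d, storage& tmp) {
        tmp.b = costly_square(d);   // computed once, kept in tmp
        return tmp.b;
    }
    static double init_x(const storage& tmp) { return std::cos(tmp.b); }
    static double init_y(const storage& tmp) { return std::sin(tmp.b); }

public:
    // Clients call bar(d); the default argument supplies the storage.
    explicit bar(double d, storage tmp = storage())
        : base(init_base(d, tmp)),  // base is initialised first
          x_(init_x(tmp)),
          y_(init_y(tmp)) {}

    double x() const { return x_; }
    double y() const { return y_; }
};

// Usage sketch: one costly call serves three initialisers.
void storage_example() {
    bar b(2.0);                     // the storage argument is defaulted
    assert(costly_calls == 1);
    assert(b.value() == 4.0);
    assert(std::fabs(b.x() - std::cos(4.0)) < 1e-12);
    assert(std::fabs(b.y() - std::sin(4.0)) < 1e-12);
}
```

Although storage is private, clients can still rely on the default argument because access is checked where the default argument is declared, not where the constructor is called.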
Notice that
tmp
is passing through the two-step set up process that we have previously advised against. Could we forward
d
to
storage
’s constructor to avoid the default initialisation? For this,
bar
’s constructor requires a declaration similar to
bar(double d, foo& r1, foo& r2, storage tmp = storage(d));
Unfortunately, this isn’t legal: a default argument can’t refer to the function’s other parameters. Indeed, it’s fairly well known that in a function call the order of argument evaluation is unspecified. If the code above were allowed, then we could not be sure that the evaluation of
tmp
occurs after that of
d
. Recall that if
storage
consists of fundamental types only, then the default initialisation costs nothing. If it contains a member of non-fundamental type, then the technique presented in the next section applies to prevent default initialisation of a member. The method is general and equally applies to
bar
itself.
A very important warning is in order before leaving this section. Unfortunately, the method presented here is unsafe! The main issue is that the technique is very dependent on the order of initialisation of subobjects. In our example,
base
is the first subobject to be initialised. For this reason,
init_base
had the responsibility of setting up
tmp
before it could be used by
init_x
. The order of initialisation of subobjects is very sensitive to changes in the class. To mitigate this issue you can create a reusable empty class, say,
first_base
, that as its name indicates, must be the first base of a class to which we want to apply the technique presented here. Furthermore, this class’ initialiser function will have the responsibility of setting up the temporary storage as shown in Listing 8.
class first_base {
protected:
  explicit first_base(int) {
    // does nothing
  }
};

class bar : first_base, public base {
  ...
};

int bar::init_first_base(double d, storage& tmp) {
  tmp.b = d * d;
  return 0;
}

double bar::init_base(const storage& tmp) {
  return tmp.b;
}

bar::bar(double d, foo& r1, foo& r2, storage tmp)
  : first_base(init_first_base(d, tmp)),
    base(init_base(tmp)),
    ...
Listing 8
The use of
first_base
makes the code safer and clearer, and
almost
solves the problem. Even when
first_base
is the first in the list of base classes, there’s still a chance that it’s not going to be the first subobject to be initialised. This occurs when the derived class has a direct or indirect virtual base class because virtual bases are initialised first. Experience shows that only a minority of inheritances are virtual and, therefore, this issue is unlikely to happen. However, it’s always good to play safe. So, to be 100% sure, it suffices to virtually inherit from
first_base
(always keeping it as the first base in the list). The price that a class has to pay for this extra safety is carrying an extra pointer.
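The effect of virtual bases on the order of initialisation can be observed with a small experiment. This is only a sketch: the classes vbase, unsafe_bar and safe_bar and the global string order are hypothetical names introduced here, and the constructors do nothing but log a letter.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of the order in which subobjects are initialised.
std::string order;

struct first_base { explicit first_base(int) { order += "F"; } };
struct vbase      { vbase()                  { order += "V"; } };
struct base : virtual vbase { base()         { order += "B"; } };

// first_base is listed first, but the indirect virtual base wins.
struct unsafe_bar : first_base, base {
    unsafe_bar() : first_base(0), base() { order += "u"; }
};

// Inheriting virtually from first_base restores the intended order.
struct safe_bar : virtual first_base, base {
    safe_bar() : first_base(0), base() { order += "s"; }
};

// Usage sketch: log the order for both classes.
void order_example() {
    order.clear();
    { unsafe_bar u; }
    assert(order == "VFBu");   // vbase ran before first_base
    order.clear();
    { safe_bar s; }
    assert(order == "FVBs");   // first_base now runs first
}
```

In safe_bar both first_base and vbase are virtual bases, so their relative order follows the depth-first, left-to-right traversal of the base list. This is why first_base must also stay first in that list.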
Delaying initialisation
We arrive at the final technique of this article. The basic idea is delaying the initialisation of a subobject until the constructor’s body where more complex code can sit.
Compilers have a duty of trying to ensure that every object of class type is properly initialised before being used. Their way to perform this task is calling the default constructor whenever the programmer doesn’t explicitly call one. However, C++11 offers a loophole that we can exploit to prevent the compiler calling the default constructor.
The underlying pattern that supports delayed initialisation is the tagged union [ TU ], also known by various other names (e.g. discriminated union, variant type ). A tagged union can hold objects of different types but at any time keeps track of the type currently held. Frequently, default initialisation of a tagged union means either no initialisation at all or default initialisation of a particular type (which again might mean no initialisation at all).
In general, tagged unions are implemented in C and C++ through unions. Unfortunately, the constraints that C++03 imposes on types that can be members of unions are quite strict, and implementing tagged unions demands a lot of effort [Alexandrescu02]. C++11 relaxes the constraints on union members and gives more power to programmers. However, this comes at a cost: the programmer is now responsible for ensuring proper initialisation of union members. The technique that we shall see now relies on C++11. Later we shall see what can be done in C++03.
Class
foo
has no accessible default constructor and we are forced to initialise
f_
in the MIL to prevent a compiler error. We want to postpone the initialisation of
f_
to the constructor’s body where we can compute, store and reuse
exp(d)
. This can be achieved by putting
f_
inside an unnamed
union
as shown in Listing 9.
class bar : public base {
  union { // unnamed union type
    foo f_;
  };
  ...
};

bar::bar(double d, foo& r1, foo& r2)
  : ... /* no f_ in the MIL */ {
  const double e = exp(d);
  new (&f_) foo(e, -e);
}

bar::~bar() {
  (&f_)->~foo();
}
Listing 9
Since the
union
is unnamed all its members (only
f_
in this case) are seen as if they were members of
bar
but the compiler forgoes their initialisations. A member of the
union
can be initialised in the constructor’s body through a placement
new
. In Listing 9 this builds an object of type
foo
at the address
&f_
or, in other words, the
this
pointer inside
foo
’s constructor will be set to
&f_
. Simple, beautiful and efficient – but this isn’t the end of the story.
The compiler neither initialises a member of a
union
nor destroys it. Ensuring proper destruction is again the programmer’s responsibility. Previously – listings 1–8 – the destruction of
f_
was called when its containing
bar
object was destroyed. To imitate this behaviour, the new
bar
’s destructor calls
~foo()
on the object pointed to by
&f_
.
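A trimmed, self-contained variant of Listing 9 lets us check that the object is built and destroyed exactly once. Several details are assumptions of this sketch: foo is a hypothetical class that counts its live instances, bar is reduced to the single member f_ with an accessor, and copying is disabled to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <new>

// Hypothetical foo: no default constructor, counts live objects.
class foo {
    double a_, b_;
public:
    static int alive;
    foo(double f1, double f2) : a_(f1), b_(f2) { ++alive; }
    ~foo() { --alive; }
    double first() const { return a_; }
    double second() const { return b_; }
};
int foo::alive = 0;

class bar {
    union { foo f_; };                  // the compiler leaves f_ alone
public:
    explicit bar(double d) {            // no f_ in the MIL
        const double e = std::exp(d);   // computed once, reused
        new (&f_) foo(e, -e);
    }
    ~bar() { f_.~foo(); }
    // Copying would need the same manual care; disabled in this sketch.
    bar(const bar&) = delete;
    bar& operator=(const bar&) = delete;
    const foo& f() const { return f_; }
};

// Usage sketch: f_ lives exactly as long as its bar.
void union_example() {
    assert(foo::alive == 0);
    {
        bar b(0.0);
        assert(foo::alive == 1);            // built in the body
        assert(b.f().first() == 1.0);       // exp(0) == 1
        assert(b.f().second() == -1.0);
    }
    assert(foo::alive == 0);                // destroyed by ~bar()
}
```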
We have just written a destructor, and the rule of three says that we probably need to write a copy-constructor and an assignment operator as well. This is the case here. In addition, there are extra dangers that we must consider. For instance, a new constructor might be added to
bar
and the writer might forget to initialise
f_
. If a bar object is built by this constructor, then at destruction time (probably earlier)
f_
will be used although it was never initialised, and the behaviour is undefined. To avoid this and other issues, we use a
bool
flag to signal whether
f_
has been initialised or not. When an attempt to use an uninitialised
f_
is made, the code might inform you by, say, throwing an exception. However,
bar
’s destructor can be more forgiving and ignore
f_
if it’s uninitialised. (Recall that a destructor shouldn’t throw anyway.)
Instead of forcing
bar
to manage
f_
’s usage and lifetime, it’s better to encapsulate this task in a generic template class called, say,
delayed_init
. Listing 10 shows a rough draft of an implementation. A more complete version is available in [
Neri
] but
don’t use it
(I repeat,
don’t use it
) because Boost.Optional [
Optional
] is a better alternative. Indeed, it’s a mature library that has been heavily tested over the last few years and also works with C++03.
delayed_init
is presented for didactic purposes only. As mentioned above,
union
rules in C++03 are strict and make the implementation of
boost::optional
more complex and difficult to understand. In contrast,
delayed_init
assumes C++11 rules and has simpler code. See
delayed_init
as a draft of what
boost::optional
could be if written in C++11. Meanwhile, Fernando Cacciola (the author of Boost.Optional) and Andrzej Krzemienski are working on a proposal [
Proposal
] for
optional
to be added to the C++ Standard Library. This idea has already been praised by a few members of the committee.
template <typename T>
class delayed_init {
  bool is_init_ = false;
  union { T obj_; };
public:
  delayed_init() { }
  ~delayed_init() {
    if (is_init_)
      (&obj_)->~T();
  }
  template <typename... Args>
  void init(Args&&... args) {
    new (&obj_) T(std::forward<Args>(args)...);
    is_init_ = true;
  }
  T* operator->() {
    return is_init_ ? &obj_ : nullptr;
  }
  T& operator*() {
    if (is_init_)
      return obj_;
    throw std::logic_error("attempt to use "
      "uninitialised object");
  }
  ...
};
Listing 10
Let’s see what
delayed_init
looks like. Its member
is_init_
is initialised to false using the new
brace-or-equal initialisation
feature of C++11. Therefore, we don’t need to do it in the MIL. This leaves the default constructor empty and you might wonder why we bother writing this constructor, since the compiler would automatically implement one exactly like ours. Actually, it won’t because
delayed_init
has an unnamed
union
member (which is the whole point of this template class).
When the time comes to initialise the inner object, it suffices to call
init()
. This method is a
variadic template
function – another welcome and celebrated C++11 novelty – that takes an arbitrary number of arguments (indicated by the ellipsis
...
) of arbitrary types by
universal reference
[
Meyers12
] (indicated by
Args&&
where
Args
is deduced). These arguments are simply handed over to
T
’s constructor via
std::forward
. (Take another look at this pattern since it’s expected to become more and more frequent.)
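The forwarding pattern can be tried in isolation. In this sketch, build is a hypothetical free function with the same parameter shape as init(), and probe is a made-up type that records which of its constructors was selected.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that records which constructor was used.
struct probe {
    std::string how;
    probe(const std::string&) : how("copy") {}
    probe(std::string&&)      : how("move") {}
    probe(int, double)        : how("two args") {}
};

// Same shape as init(): any number of arguments, each forwarded
// with its original value category.
template <typename T, typename... Args>
T build(Args&&... args) {
    return T(std::forward<Args>(args)...);
}

// Usage sketch: lvalues, rvalues and several arguments.
void forwarding_example() {
    std::string s("abc");
    assert(build<probe>(s).how == "copy");                   // lvalue
    assert(build<probe>(std::string("abc")).how == "move");  // rvalue
    assert(build<probe>(1, 2.0).how == "two args");
}
```

Without std::forward, every argument would reach the constructor as an lvalue and the second call would select the copying overload as well.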
Also note the presence of
operator->()
. Essentially, the class
delayed_init<T>
is a wrapper to a type
T
. We wish it could be used as a
T
by implementing
T
’s
public
interface and simply forwarding calls to
obj_
. This is impossible since
T
is unknown. A close alternative is returning a pointer to
obj_
because
T*
replicates
T
’s interface with slightly different syntax and semantics. Actually, pointer semantics fits very naturally here. Indeed, it’s common for a class to hold a pointer to an object rather than the object itself. In this way, the class can delay the object’s initialisation to a later moment where all data required for the construction is gathered. At this time the object is created on the heap and its address is stored by the pointer. Through
delayed_init
, we are basically replacing the heap with internal storage and, like in a smart pointer, managing the object’s lifetime. Finally, the
operator*()
is also implemented. It provides access to
obj_
and throws if
obj_
hasn’t been initialised.
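To see the wrapper at work, the following sketch applies it to a reduced bar. It repeats a compact variant of the draft so that the block is self-contained. The deleted copy operations, the hypothetical two-member foo and the member function sum() are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <new>
#include <stdexcept>
#include <utility>

// Compact variant of the delayed_init draft. Copying is disabled here,
// which is an assumption of this sketch.
template <typename T>
class delayed_init {
    bool is_init_ = false;
    union { T obj_; };
public:
    delayed_init() {}
    ~delayed_init() { if (is_init_) obj_.~T(); }
    delayed_init(const delayed_init&) = delete;
    delayed_init& operator=(const delayed_init&) = delete;

    template <typename... Args>
    void init(Args&&... args) {
        new (&obj_) T(std::forward<Args>(args)...);
        is_init_ = true;
    }
    T* operator->() { return is_init_ ? &obj_ : nullptr; }
    T& operator*() {
        if (is_init_) return obj_;
        throw std::logic_error("attempt to use uninitialised object");
    }
};

// Hypothetical foo without a default constructor.
struct foo {
    double a, b;
    foo(double f1, double f2) : a(f1), b(f2) {}
};

class bar {
    delayed_init<foo> f_;
public:
    explicit bar(double d) {            // f_ is absent from the MIL
        const double e = std::exp(d);   // computed once, used twice
        f_.init(e, -e);
    }
    double sum() { return f_->a + f_->b; }
};

// Usage sketch: normal use, then use before initialisation.
void delayed_init_example() {
    bar b(0.0);
    assert(b.sum() == 0.0);             // exp(0) + -exp(0)

    delayed_init<foo> empty;            // never initialised
    bool threw = false;
    try { *empty; } catch (const std::logic_error&) { threw = true; }
    assert(threw);
}
```

Note that this sketch doesn’t guard against init() being called twice; a production-quality class, such as Boost.Optional, has to deal with that case too.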
Conclusion
Initialisation in the MIL rather than assignment in the constructor has been advocated for a long time. However, in some circumstances, there’s a genuine need for not-so-simple initialisations, which conflicts with the limited syntax of the MIL. This article has presented four techniques to overcome this situation. They vary in applicability, clarity and safety. Along the way, it has also presented some of the new C++11 features.
Acknowledgements
Cassio Neri thanks Fernando Cacciola and Lorenz Schneider for their suggestions and careful reading of this article. He also thanks the Overload team for valuable remarks and feedback.
References
[Alexandrescu02] Andrei Alexandrescu, Generic: Discriminated Unions (I), (II) & (III), Dr.Dobb’s, June 2002. http://tinyurl.com/8srld2z http://tinyurl.com/9tofeq4 http://tinyurl.com/8ku347d
[Meyers00] Scott Meyers, How Non-Member Functions Improve Encapsulation, Dr.Dobb’s , February 2000. http://tinyurl.com/8er3ybp
[Meyers05] Scott Meyers, Effective C++ , Addison-Wesley 2005.
[Meyers12] Scott Meyers, Universal References in C++11, Overload 111, October 2012. http://tinyurl.com/9akcqjl
[Neri] Cassio Neri, delayed_init implementation. https://github.com/cassioneri/delayed_init
[Optional] Fernando Cacciola, Boost.Optional. http://tinyurl.com/8ctk6rf
[Proposal] Fernando Cacciola and Andrzej Krzemienski, A proposal to add a utility class to represent optional objects (Revision 2), September 2012. http://tinyurl.com/bvyfjq7
[RVO] Return Value Optimization, Wikipedia. http://tinyurl.com/kpmvdw
[TU] Tagged Union, Wikipedia. http://tinyurl.com/42p5tuz
1. C++11 also allows the use of braces but their semantics are different and outside the scope of this article. Therefore, we shall consider only parenthesised initialisations and their C++03 semantics.
2. It’s unfortunate but, according to the C++ Standard’s definitions, sometimes (as in this particular case) initialisation means doing nothing and the value of the object is indeterminate.