Complex Logic in the Member Initialiser List

The syntactic form of the member initialiser list restricts the logic that it contains. Cassio Neri presents some techniques to overcome these constraints.

In C++, during a constructor call, all subobjects of the class (base classes and non-static data members) are initialised before execution enters the constructor’s body. (In C++11, this rule has an exception which we shall exploit later.) The member initialiser list (MIL) lets the programmer customise this initialisation. A subobject is initialised from a parenthesised¹ list of expressions that follows its identifier in the MIL. Listing 1 shows bar’s constructor; its MIL is the part between the colon and the opening brace of the body.

class base {
  ...
public:
  base(double b);
};

class foo {
  ...
public:
  foo(double f1, double f2);
};

class bar : public base {
  const  double x_, y_;
  foo&   r_;
  foo    f_;
  double d_;
  ...
public:
  bar(double d, foo& r1, foo& r2);
};

bar::bar(double d, foo& r1, foo& r2)
: base(d * d), x_(cos(d * d)), y_(sin(d * d)),
  r_(d > 0.0 ? r1 : r2), f_(exp(d), -exp(d))
{
  d_ = d;
}

Listing 1

Most often the MIL simply forwards the constructor’s arguments to the subobject initialisers. In contrast, the MIL of bar’s constructor first performs computations with the arguments and then passes the results through. The operations here are still simple enough to fit in full expressions, but had they been more complex (e.g. with branches and loops) the syntactic form of the MIL would have been an obstacle.

This article presents some techniques that allow more complex logic in the MIL. It doesn’t advocate complexity in the MIL; it only shows some ways to achieve it if you have to.

Before looking at these methods, we consider the possibility of avoiding the MIL altogether.

Avoiding the MIL

Notice that d_ isn’t initialised in the MIL. In this case, the compiler implicitly initialises² d_ and then we assign d to it in the constructor’s body. Could we do the same for the other subobjects? Not always. Assume that foo doesn’t have an accessible default constructor. Then the compiler can’t implicitly initialise f_ and issues an error. We simply don’t have a choice and must initialise f_ in the MIL. In addition to subobjects of types without an accessible default constructor, reference members (e.g. r_) and const members of non-class type (e.g. x_ and y_) must be explicitly initialised, otherwise the compiler complains. Although not enforced by the language, we can add to this list subobjects of immutable types, i.e. types with no non-const methods apart from constructors and a destructor.

Some subobjects can be default initialised first and then changed in the constructor’s body. Nevertheless, this two-step set-up process might be wasteful. Indeed, this is the most commonly stated reason to prefer initialisation in the MIL to assignment in the constructor [Meyers05, §4]. For fundamental types, however, there’s no penalty because their default initialisation does nothing and costs nothing.
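To make the cost of the two-step process concrete, here is a minimal sketch built around a hypothetical instrumented type, tracer, that counts its default constructions, value constructions and assignments. The class names assigns_in_body and initialises_in_mil are illustrative only. Setting the member in the body costs a default construction plus an assignment; initialising it in the MIL costs a single constructor call.

```cpp
#include <cassert>
#include <string>

// Hypothetical instrumented type: counts how often each operation runs.
struct tracer {
  static int default_ctors, value_ctors, assignments;

  std::string s_;

  tracer() { ++default_ctors; }
  explicit tracer(const std::string& s) : s_(s) { ++value_ctors; }
  tracer& operator=(const std::string& s) {
    s_ = s;
    ++assignments;
    return *this;
  }
};
int tracer::default_ctors = 0;
int tracer::value_ctors   = 0;
int tracer::assignments   = 0;

// Two-step set-up: t_ is default initialised, then assigned in the body.
struct assigns_in_body {
  tracer t_;
  explicit assigns_in_body(const std::string& s) { t_ = s; }
};

// Initialisation in the MIL: a single constructor call.
struct initialises_in_mil {
  tracer t_;
  explicit initialises_in_mil(const std::string& s) : t_(s) {}
};

void reset_counts() {
  tracer::default_ctors = tracer::value_ctors = tracer::assignments = 0;
}
```

Building an assigns_in_body runs one default construction and one assignment, whereas building an initialises_in_mil runs one value construction and nothing else.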

Initialiser functions

The first idea for complex initialisation is very simple: write an initialiser function that delivers the final result used to directly initialise a subobject. Listing 2 shows this technique applied to our example.

double init_x(double d) {
  const double b = d * d;
  const double x = cos(b);
  return x;
}

bar::bar(double d, foo& r1, foo& r2)
: ... x_(init_x(d)), ...

Listing 2

We emphasise that, in our toy example, x_ can be directly initialised in the MIL (as seen in Listing 1). Listing 2 merely stands in for more complex cases.

Most frequently the initialiser function creates a local object of the same type as the subobject that it initialises and returns it by value. Then the subobject is copy- or move-initialised from this value. Therefore, the subobject’s type must be constructible (in particular, it can’t be an abstract class) and also copy- or move-constructible.

Calling the copy- or move-constructor might have a cost. Nevertheless, mainstream compilers implement the return value optimisation [RVO] which, under certain circumstances, elides this call. Unfortunately, this doesn’t eliminate the need for the subobject’s type to be copy- or move-constructible.

In another variation, the initialiser functions compute the arguments that the subobjects’ constructors take, rather than the subobjects themselves. For instance, an initialiser function for base might compute d * d and return this value, which is then passed to base’s constructor. In this way, the argument types, rather than the subobjects’ types, must be constructible and copy- or move-constructible.

It’s worth mentioning that when the subobject is a reference member, the initialiser function must return a reference to a non-local object, otherwise the member will dangle. For instance, an initialiser function for r_ could be as follows.

  foo& init_r(double d, foo& r1, foo& r2) {
    // r1 and r2 are non-local
    return d > 0.0 ? r1 : r2;
  }

A positive aspect of having an initialiser function is that it can be used (and it most likely will be) by many constructors. When there’s no need to reuse the initialiser, C++11 offers the tempting possibility of writing the initialiser function as a lambda expression as shown below. Notice, however, that readability suffers.

  x_([&]() -> double {
    const double b = d * d; // d is captured
    const double x = cos(b);
    return x;
  } (/* parentheses for calling the lambda */) )

Where should the initialiser function be? Assuming that its sole purpose is initialising a class member (so it’s not going to be used anywhere else), placing it in the global namespace or in a named namespace is pollution. Making the initialiser a member of the class might come to mind, but this isn’t ideal because it decreases encapsulation [Meyers00]. Additionally, this requires the initialiser’s declaration to be in the class header file, forcing on clients an artificial dependency on the initialiser function. The best place for it is inside the class source file (which we’re assuming is not its header file). Making the initialiser invisible outside the file (by declaring it either static or in an unnamed namespace) improves encapsulation and decreases linking time.
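A minimal sketch of that placement, assuming a stripped-down bar that keeps only x_ and r_, and a stand-in foo with a two-argument constructor. The comments marking bar.h and bar.cpp stand for two separate files, merged here only so that the example compiles on its own; the accessors x() and r() are hypothetical additions.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the article's foo (assumed: no default constructor).
class foo {
public:
  foo(double f1, double f2) : f1_(f1), f2_(f2) {}
  double f1_, f2_;
};

// --- bar.h (sketch): no trace of the initialisers here ---
class bar {
  const double x_;
  foo&         r_;
public:
  bar(double d, foo& r1, foo& r2);
  double x() const { return x_; }
  const foo& r() const { return r_; }
};

// --- bar.cpp (sketch): initialisers invisible outside this file ---
namespace {

double init_x(double d) {
  const double b = d * d;
  return std::cos(b);
}

// Returns a reference to a non-local object, so r_ doesn't dangle.
foo& init_r(double d, foo& r1, foo& r2) {
  return d > 0.0 ? r1 : r2;
}

} // unnamed namespace

bar::bar(double d, foo& r1, foo& r2)
: x_(init_x(d)), r_(init_r(d, r1, r2)) {
}
```

Because the initialisers live in an unnamed namespace, clients including bar.h never see them, and changing them only requires recompiling bar.cpp.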

Using an initialiser function is the best technique presented in this article as far as encapsulation, clarity and safety are concerned. However, this solution lacks the ability to reuse results obtained by one initialiser in another. For instance, the value of d * d must be calculated separately by the initialiser functions of base, x_ and y_. In this example, this isn’t a big deal, but it could be if the result were obtained through a very costly operation.

Classes can have a member whose only purpose is storing a result to be used by different initialiser functions (e.g. bar could have a member b_ to store d * d). This is obviously wasteful: as with the local variables of an initialiser function, we want partial results to have a short lifetime. The next sections present methods to achieve this goal.

Bundling members

We can bundle some related members into a nested struct and create an initialiser function for the struct rather than for individual members. Listing 3 shows relevant changes to bar needed to initialise the two const members in one go.

class bar : public base {
  struct point {
    double x, y;
  };
  const point p_;
  static point init_p(double d);
  ...
};

bar::point bar::init_p(double d) {
  const double     b = d * d;
  const bar::point p = {cos(b), sin(b)};
  return p;
}

bar::bar(double d, foo& r1, foo& r2)
:  ... p_(init_p(d)), ...

Listing 3

As in the previous section, the type returned by the initialiser function must be copy- or move-constructible, and so must the struct members.

The initialiser function needs access to the nested struct. Ideally, this type will be private and the initialiser will be a private static member. The initialiser could be a friend but, since it is an implementation detail, hiding it inside the class is advisable. (Unfortunately, it can’t be hidden as well as in the previous section.) Alternatively, the initialiser function can be a non-member non-friend, provided that the struct is made public, but this decreases encapsulation even further.
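A compilable sketch of Listing 3 follows, with the nested struct private and the initialiser a private static member, as recommended above. Everything else in bar is left out, and the accessors x() and y() are hypothetical additions so that the result can be inspected.

```cpp
#include <cassert>
#include <cmath>

class bar {
  struct point {                  // private nested struct bundling x and y
    double x, y;
  };
  const point p_;
  static point init_p(double d);  // private static initialiser

public:
  explicit bar(double d);
  double x() const { return p_.x; }
  double y() const { return p_.y; }
};

bar::point bar::init_p(double d) {
  const double b = d * d;         // computed once, shared by x and y
  const point  p = {std::cos(b), std::sin(b)};
  return p;
}

bar::bar(double d) : p_(init_p(d)) {
}
```

The partial result b lives only as long as the call to init_p, which is exactly the short lifetime we are after.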

We can’t include base classes in the struct, so each of them needs its own initialiser function. However, as in our example, the initialiser function of a base class could profit from results obtained by other initialiser functions. The next section shows how to achieve this goal.

Using an argument for temporary storage

In rare cases we can change the value of an argument to something more reusable. Listing 4 is an attempt for our example and consists of changing d to d * d just before initialising base. Unfortunately, this doesn’t work here since the initialisations of r_ and f_, and the assignment to d_, need the original value of d but get the new one instead.

bar::bar(double d, foo& r1, foo& r2)
: base(d = d * d),        // d has a new value
  x_(cos(d)), y_(sin(d)), // OK : uses new value
  r_(d > 0.0 ? r1 : r2),  // BUG: uses new value
  f_(exp(d), -exp(d)) {   // BUG: uses new value
  d_ = d;                 // BUG: uses new value
}

Listing 4

A fix for the issue above is to use a dummy argument for temporary storage and to give it a default value to avoid bothering clients. Listing 5 puts this technique into practice.

class bar : public base {
  ...
public:
  bar(double d, foo& r1, foo& r2, double b = 0.0);
};

bar::bar(double d, foo& r1, foo& r2, double b)
: base(b = d * d),        // b has a new value
  x_(cos(b)), y_(sin(b)), // OK : uses b = d * d
  r_(d > 0.0 ? r1 : r2),  // OK : uses d
  f_(exp(d), -exp(d)) {   // OK : uses d
  d_ = d;                 // OK : uses d
}

Listing 5

This works because the dummy argument persists for a short period, but long enough to be reused by different initialisers. More precisely, its lifetime starts before the first initialisation of a subobject (base in our example) and ends after the constructor exits.
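The following self-contained sketch of Listing 5 makes the reuse observable. The base stand-in (which just records its argument) and the accessors are assumptions for illustration; r_ and f_ are left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the article's base: records its argument.
class base {
public:
  explicit base(double b) : b_(b) {}
  double b_;
};

class bar : public base {
  const double x_, y_;
  double       d_;
public:
  // b is a dummy argument used as temporary storage; clients never pass it.
  explicit bar(double d, double b = 0.0);
  double x() const { return x_; }
  double y() const { return y_; }
  double stored_d() const { return d_; }
};

bar::bar(double d, double b)
: base(b = d * d),                    // base comes first: b now holds d * d
  x_(std::cos(b)), y_(std::sin(b)) {  // reuse b, no recomputation
  d_ = d;                             // d still holds the original value
}
```

Clients simply write bar o(0.5); the default argument supplies the storage.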

A problem (alas, there will be others) with this approach is that the constructor’s extended signature might conflict with another one. If it doesn’t today, it might tomorrow. As an improvement, we create a new type for the storage. For better encapsulation this type is nested in the private section of the class as Listing 6 illustrates.

class bar : public base {
  struct storage {
    double b;
  };
  ...
public:
  bar(double d, foo& r1, foo& r2,
      storage tmp = storage());
};

bar::bar(double d, foo& r1, foo& r2, storage tmp)
: base(tmp.b = d * d),
  x_(cos(tmp.b)), y_(sin(tmp.b)), ...

Listing 6

The simplicity of our example is misleading because the assignment tmp.b = d * d fits nicely in the MIL, whereas in more realistic scenarios tmp might need a more complex set-up. This can be done, for instance, in base’s initialiser function by making it take a storage argument by reference, as Listing 7 shows.

double bar::init_base(double d, storage& tmp) {
  tmp.b = d * d;
  return tmp.b;
}

double bar::init_x(const storage& tmp) {
  const double x = cos(tmp.b);
  return x;
}

bar::bar(double d, foo& r1, foo& r2, storage tmp)
: base(init_base(d, tmp)), x_(init_x(tmp)), ...

Listing 7

Notice that tmp goes through the two-step set-up process that we previously advised against. Could we forward d to storage’s constructor to avoid the default initialisation? For this, bar’s constructor would require a declaration similar to

  bar(double d, foo& r1, foo& r2,
     storage tmp = storage(d));

Unfortunately, this isn’t legal: a default argument can’t refer to the function’s other parameters. Indeed, it’s fairly well known that in a function call the order of argument evaluation is unspecified. If the code above were allowed, we could not be sure that the evaluation of tmp occurs after that of d. Recall that if storage consists of fundamental types only, then the default initialisation costs nothing. If it contains a member of non-fundamental type, then the technique presented in the next section can prevent the default initialisation of that member. The method is general and applies equally to bar itself.

A very important warning is in order before leaving this section. Unfortunately, the method presented here is unsafe! The main issue is that the technique depends heavily on the order of initialisation of subobjects. In our example, base is the first subobject to be initialised. For this reason, init_base had the responsibility of setting up tmp before it could be used by init_x. The order of initialisation of subobjects is very sensitive to changes in the class (e.g. adding a new base class in front of base). To mitigate this issue, you can create a reusable empty class, say first_base, which, as its name indicates, must be the first base of any class to which we want to apply this technique. Furthermore, first_base’s initialiser function has the responsibility of setting up the temporary storage, as shown in Listing 8.

class first_base {
protected:
  explicit first_base(int) { // does nothing
  }
};

class bar : first_base, public base {
  ...
};

int bar::init_first_base(double d, storage& tmp) {
  tmp.b = d * d;
  return 0;
}

double bar::init_base(const storage& tmp) {
  return tmp.b;
}

bar::bar(double d, foo& r1, foo& r2, storage tmp)
: first_base(init_first_base(d, tmp)),
  base(init_base(tmp)), ...

Listing 8

The use of first_base makes the code safer and clearer, and almost solves the problem. Even when first_base is the first in the list of base classes, there’s still a chance that it won’t be the first subobject to be initialised. This occurs when the derived class has a direct or indirect virtual base class, because virtual bases are initialised first. Experience shows that only a minority of inheritances are virtual and, therefore, this issue is unlikely to happen. However, it’s always good to play safe. So, to be 100% sure, it suffices to inherit virtually from first_base (always keeping it as the first base in the list). The price that a class has to pay for this extra safety is typically carrying an extra pointer.
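A compilable sketch of Listing 8, assuming a stripped-down bar with only base and x_ besides first_base, and a stand-in base that stores its argument. Since this bar has no virtual bases, first_base is inherited non-virtually, exactly as in Listing 8.

```cpp
#include <cassert>
#include <cmath>

// Reusable empty class; listed first so that its initialiser runs first.
class first_base {
protected:
  explicit first_base(int) {  // does nothing
  }
};

// Hypothetical stand-in for the article's base: records its argument.
class base {
public:
  explicit base(double b) : b_(b) {}
  double b_;
};

// If bar had (direct or indirect) virtual bases, first_base would need to be
// inherited virtually, still first in the list, to keep it first in line.
class bar : first_base, public base {
  struct storage {
    double b;
  };
  const double x_;

  static int    init_first_base(double d, storage& tmp);
  static double init_base(const storage& tmp);
  static double init_x(const storage& tmp);

public:
  explicit bar(double d, storage tmp = storage());
  double x() const { return x_; }
};

int bar::init_first_base(double d, storage& tmp) {
  tmp.b = d * d;  // the only place where tmp is set up
  return 0;
}

double bar::init_base(const storage& tmp) { return tmp.b; }

double bar::init_x(const storage& tmp) { return std::cos(tmp.b); }

bar::bar(double d, storage tmp)
: first_base(init_first_base(d, tmp)),
  base(init_base(tmp)), x_(init_x(tmp)) {
}
```

Reordering base and x_, or inserting new non-virtual bases after first_base, no longer affects when tmp is set up.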

Delaying initialisation

We arrive at the final technique of this article. The basic idea is delaying the initialisation of a subobject until the constructor’s body where more complex code can sit.

Compilers try hard to ensure that every object of class type is properly initialised before being used. They do so by calling the default constructor whenever the programmer doesn’t explicitly call another one. However, C++11 offers a loophole that we can exploit to prevent the compiler from calling the default constructor.

The underlying pattern that supports delayed initialisation is the tagged union [TU], also known by various other names (e.g. discriminated union, variant type). A tagged union can hold objects of different types but at any time keeps track of the type currently held. Frequently, default initialisation of a tagged union means either no initialisation at all or default initialisation of a particular type (which again might mean no initialisation at all).

In general, tagged unions are implemented in C/C++ through unions. Unfortunately, the constraints that C++03 imposes on the types of union members are quite strict (e.g. they can’t have non-trivial constructors or destructors), and implementing tagged unions demands a lot of effort [Alexandrescu02]. C++11 relaxes the constraints on union members and gives more power to programmers. However, this comes at a cost: now the programmer is responsible for ensuring proper initialisation and destruction of union members. The technique that we shall see now relies on C++11. Later we shall see what can be done in C++03.

Class foo has no accessible default constructor, so we are forced to initialise f_ in the MIL to prevent a compiler error. We want to postpone the initialisation of f_ to the constructor’s body, where we can compute, store and reuse exp(d). This can be achieved by putting f_ inside an unnamed union, as shown in Listing 9.

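#include <new> // placement new

class bar : public base {
  union { // unnamed union type
    foo f_;
  };
  ...
public:
  bar(double d, foo& r1, foo& r2);
  ~bar();
};

bar::bar(double d, foo& r1, foo& r2)
: ... /* no f_ in the MIL */ {
  const double e = exp(d);
  new (&f_) foo(e, -e); // build f_ in place
}

bar::~bar() {
  (&f_)->~foo();        // destroy f_ explicitly
}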

Listing 9

Since the union is unnamed, all its members (only f_ in this case) are seen as if they were members of bar, but the compiler forgoes their initialisation. A member of the union can be initialised in the constructor’s body through a placement new. In Listing 9 this builds an object of type foo at the address &f_ or, in other words, the this pointer inside foo’s constructor will be set to &f_. Simple, beautiful and efficient, but this isn’t the end of the story.

The compiler neither initialises nor destroys a member of a union. Ensuring proper destruction is again the programmer’s responsibility. Previously (Listings 1 to 8), f_ was destroyed automatically when its containing bar object was destroyed. To imitate this behaviour, bar’s new destructor calls ~foo() on the object pointed to by &f_.

We have just written a destructor, and the rule of three says that we probably need to write a copy constructor and an assignment operator as well. This is the case here. In addition, there are extra dangers that we must consider. For instance, a new constructor might be added to bar and its writer might forget to initialise f_. If a bar object is built by this constructor, then at destruction time (probably earlier) f_ will be used and the code has undefined behaviour. To avoid this and other issues, we use a bool flag to signal whether f_ has been initialised or not. When an attempt to use an uninitialised f_ is made, the code might inform you by, say, throwing an exception. However, bar’s destructor can be more forgiving and ignore f_ if it’s uninitialised. (Recall that a destructor shouldn’t throw anyway.)
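Putting Listing 9 together with the rule of three gives the following minimal sketch. Apart from the unnamed union, the placement new and the explicit destructor call, everything here is an assumption made for illustration: a stripped-down bar without base, r_ or the other members, a stand-in foo that counts live objects so that destruction can be observed, and copy operations that are simply deleted rather than implemented. It also omits the bool flag.

```cpp
#include <cassert>
#include <cmath>
#include <new>  // placement new

// Stand-in for foo: no default constructor; counts live objects.
class foo {
public:
  static int live;
  foo(double f1, double f2) : f1_(f1), f2_(f2) { ++live; }
  ~foo() { --live; }
  double f1_, f2_;
};
int foo::live = 0;

class bar {
  union {      // unnamed union: the compiler neither initialises
    foo f_;    // nor destroys f_
  };
public:
  explicit bar(double d);
  ~bar();
  bar(const bar&) = delete;             // rule of three: rather than
  bar& operator=(const bar&) = delete;  // implementing copying, forbid it
  const foo& f() const { return f_; }
};

bar::bar(double d) {              // no f_ in the MIL
  const double e = std::exp(d);   // computed once, used twice
  new (&f_) foo(e, -e);
}

bar::~bar() {
  f_.~foo();                      // we built f_, so we must destroy it
}
```

Exactly one foo is alive while a bar exists, and none after it has been destroyed.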

Instead of forcing bar to manage f_’s usage and lifetime, it’s better to encapsulate this task in a generic class template called, say, delayed_init. Listing 10 shows a rough draft of an implementation. A more complete version is available in [Neri], but don’t use it (I repeat, don’t use it) because Boost.Optional [Optional] is a better alternative. Indeed, it’s a mature library that has been heavily tested over the last few years and also works with C++03. delayed_init is presented for didactic purposes only. As mentioned above, union rules in C++03 are strict and make the implementation of boost::optional more complex and difficult to understand. In contrast, delayed_init assumes C++11 rules and has simpler code. See delayed_init as a draft of what boost::optional could be if written in C++11. Incidentally, Fernando Cacciola (the author of Boost.Optional) and Andrzej Krzemienski are working on a proposal [Proposal] for optional to be added to the C++ Standard Library. This idea has already been praised by a few members of the committee.

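#include <new>       // placement new
#include <stdexcept> // std::logic_error
#include <utility>   // std::forward

template <typename T>
class delayed_init {
  bool is_init_ = false;
  union {
    T obj_;
  };

public:
  delayed_init() {
  }
  ~delayed_init() {
    if (is_init_)
      (&obj_)->~T();
  }

  template <typename... Args>
  void init(Args&&... args) {
    if (is_init_) {     // destroy any previously built object
      (&obj_)->~T();
      is_init_ = false; // stays false if T's constructor throws
    }
    new (&obj_) T(std::forward<Args>(args)...);
    is_init_ = true;
  }
  T* operator->() {
    return is_init_ ? &obj_ : nullptr;
  }
  T& operator*() {      // non-const, since it returns a non-const T&
    if (is_init_)
      return obj_;
    throw std::logic_error("attempt to use "
      "uninitialised object");
  }
  ...
};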

Listing 10

Let’s see what delayed_init looks like. Its member is_init_ is initialised to false using C++11’s new brace-or-equal initialisation feature, so we don’t need to do it in the MIL. This leaves the default constructor empty, and you might wonder why bother writing it, since the compiler would otherwise implicitly provide one exactly like ours. Actually, it wouldn’t in general: because delayed_init has an unnamed union member (which is the whole point of this class template), the implicitly declared default constructor is defined as deleted when, for instance, T has a non-trivial default constructor.

When the time comes to initialise the inner object, it suffices to call init(). This method is a variadic function template (another welcome and celebrated C++11 novelty) that takes an arbitrary number of arguments (indicated by the ellipsis ...) of arbitrary types by universal reference [Meyers12] (indicated by Args&& where Args is deduced). These arguments are simply handed over to T’s constructor via std::forward. (Take another look at this pattern since it’s expected to become more and more frequent.)

Also note the presence of operator->(). Essentially, the class delayed_init<T> is a wrapper around a type T. We would like it to be usable as a T by implementing T’s public interface and simply forwarding calls to obj_. This is impossible since T is unknown. A close alternative is returning a pointer to obj_, because T* replicates T’s interface with slightly different syntax and semantics. Actually, pointer semantics fits very naturally here. Indeed, it’s common for a class to hold a pointer to an object rather than the object itself. In this way, the class can delay the object’s initialisation until all data required for its construction has been gathered. At that point the object is created on the heap and its address is stored in the pointer. Through delayed_init, we are basically replacing the heap with internal storage and, like a smart pointer, managing the object’s lifetime. Finally, operator*() is also implemented. It provides access to obj_ and throws if obj_ hasn’t been initialised.
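To see how bar itself benefits, here is a minimal sketch of bar holding a delayed_init<foo>. It is self-contained, so it repeats a condensed version of the draft template; the deleted copy operations and the destruction of any previous object in init() are choices made for this sketch, not features taken from [Neri] or Boost.Optional. The stand-in foo and the accessors f1() and f2() are hypothetical. As stressed above, real code should use Boost.Optional instead.

```cpp
#include <cassert>
#include <cmath>
#include <new>
#include <stdexcept>
#include <utility>

// Condensed draft of delayed_init (didactic only; prefer Boost.Optional).
template <typename T>
class delayed_init {
  bool is_init_ = false;
  union {
    T obj_;
  };

public:
  delayed_init() {
  }
  ~delayed_init() {
    if (is_init_)
      obj_.~T();
  }
  delayed_init(const delayed_init&) = delete;             // copying is
  delayed_init& operator=(const delayed_init&) = delete;  // not sketched

  template <typename... Args>
  void init(Args&&... args) {
    if (is_init_) {            // destroy any previously built object
      obj_.~T();
      is_init_ = false;
    }
    new (&obj_) T(std::forward<Args>(args)...);
    is_init_ = true;
  }
  T* operator->() {
    return is_init_ ? &obj_ : nullptr;
  }
  T& operator*() {
    if (is_init_)
      return obj_;
    throw std::logic_error("attempt to use uninitialised object");
  }
};

// Stand-in for foo: no default constructor.
class foo {
public:
  foo(double f1, double f2) : f1_(f1), f2_(f2) {}
  double f1_, f2_;
};

class bar {
  delayed_init<foo> f_;        // no foo is built here
public:
  explicit bar(double d) {
    const double e = std::exp(d);
    f_.init(e, -e);            // foo built in the body, exp(d) reused
  }
  double f1() { return f_->f1_; }
  double f2() { return (*f_).f2_; }
};
```

Note that bar no longer needs a user-written destructor: delayed_init<foo> destroys the foo, if any, on its own.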

Conclusion

Initialisation in the MIL rather than assignment in the constructor has been advocated for a long time. However, in some circumstances, there’s a genuine need for not-so-simple initialisations, which conflict with the limited syntax of the MIL. This article has presented four techniques to overcome this situation. They vary in applicability, clarity and safety. Along the way, it has presented some of the new C++11 features.

Acknowledgements

Cassio Neri thanks Fernando Cacciola and Lorenz Schneider for their suggestions and careful reading of this article. He also thanks the Overload team for valuable remarks and feedback.

References

[Alexandrescu02] Andrei Alexandrescu, Generic<Programming>: Discriminated Unions (I), (II) & (III), Dr. Dobb’s, June 2002. http://tinyurl.com/8srld2z http://tinyurl.com/9tofeq4 http://tinyurl.com/8ku347d

[Meyers00] Scott Meyers, How Non-Member Functions Improve Encapsulation, Dr. Dobb’s, February 2000. http://tinyurl.com/8er3ybp

[Meyers05] Scott Meyers, Effective C++, Addison-Wesley, 2005.

[Meyers12] Scott Meyers, Universal References in C++11, Overload 111, October 2012. http://tinyurl.com/9akcqjl

[Neri] Cassio Neri, delayed_init implementation. https://github.com/cassioneri/delayed_init

[Optional] Fernando Cacciola, Boost.Optional. http://tinyurl.com/8ctk6rf

[Proposal] Fernando Cacciola and Andrzej Krzemienski, A proposal to add a utility class to represent optional objects (Revision 2), September 2012. http://tinyurl.com/bvyfjq7

[RVO] Return Value Optimization, Wikipedia. http://tinyurl.com/kpmvdw

[TU] Tagged Union, Wikipedia. http://tinyurl.com/42p5tuz

¹ C++11 also allows the use of braces, but their semantics are different and outside the scope of this article. Therefore, we shall consider only parenthesised initialisations and their C++03 semantics.
² Unfortunately, according to the C++ Standard’s definitions, sometimes (as in this particular case) initialisation means doing nothing, and the value of the object is indeterminate.