
Initialization in C++ is Bonkers

Overload Journal #139 - June 2017 + Programming Topics   Author: Simon Brand
Uninitialised variables can cause problems. Simon Brand reminds us how complicated it can get.

C++ pop quiz time: what are the values of a.a and b.b on the last line in main of this program? (Listing 1)

#include <iostream>

struct foo {
    foo() = default;
    int a;
};
struct bar {
    bar();
    int b;
};
bar::bar() = default;

int main() {
    foo a{};
    bar b{};
    std::cout << a.a << ' ' << b.b;
}
			
Listing 1

The answer is that a.a is 0 and b.b is indeterminate, so reading it is undefined behaviour. Why? Because initialization in C++ is bonkers.

Default-, value-, and zero-initialization

Before we get into the details which cause this, I’ll introduce the concepts of default-, value- and zero-initialization. Feel free to skip this section if you’re already familiar with these (Listing 2).

T global;    //zero-initialization, then
             // default-initialization
void foo() {
  T i;       //default-initialization
  T j{};     //value-initialization (C++11)
  T k = T(); //value-initialization
  T l = T{}; //value-initialization (C++11)
  T m();     //function-declaration
  new T;     //default-initialization
  new T();   //value-initialization
  new T{};   //value-initialization (C++11)
}

//t is value-initialized
struct A { T t; A() : t() {} };
//t is value-initialized (C++11)
struct B { T t; B() : t{} {} }; 
//t is default-initialized
struct C { T t; C()       {} };
			
Listing 2

The rules for these different initialization forms are fairly complex, so I’ll give a simplified outline of the C++11 rules (C++14 changed some of them: when T is an aggregate, the braced forms such as T{} perform aggregate initialization rather than value-initialization). If you want to understand all the details of these forms, check out the relevant cppreference.com articles, or see the standards quotes at the bottom of the article.

  • default-initialization – If T is a class, the default constructor is called; if it’s an array, each element is default-initialized; otherwise, no initialization is done, resulting in indeterminate values. [cppref1]
  • value-initialization – If T is a class, the object is default-initialized (after being zero-initialized if T’s default constructor is not user-provided/deleted); if it’s an array, each element is value-initialized; otherwise, the object is zero-initialized. [cppref2]
  • zero-initialization – Applied to static and thread-local variables before any other initialization. If T is scalar (arithmetic, pointer, enum), it is initialized from 0; if it’s a class type, all base classes and data members are zero-initialized; if it’s an array, each element is zero-initialized. [cppref3]

Taking the simple example of int as T, global and all of the value-initialized variables will have the value 0, and all other variables will have an indeterminate value. Reading these indeterminate values results in undefined behaviour.
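
As a small sketch of the int case, the snippet below reads only the zero- and value-initialized objects and assigns to the default-initialized one before touching it, so it stays clear of undefined behaviour. The names global_counter and value_initialized_ints_are_zero are made up for this illustration.

```cpp
#include <cassert>
#include <memory>

int global_counter;  // static storage duration: zero-initialized

// Returns true if every value-initialized int below holds 0.
// The default-initialized int is assigned before it is read.
bool value_initialized_ints_are_zero() {
    int j{};                             // value-initialization (C++11)
    int k = int();                       // value-initialization
    int l = int{};                       // value-initialization (C++11)
    std::unique_ptr<int> p(new int());   // value-initialization
    std::unique_ptr<int> q(new int{});   // value-initialization (C++11)

    int i;                               // default-initialization: indeterminate
    i = 42;                              // give it a value before any read
    assert(i == 42);

    return j == 0 && k == 0 && l == 0 && *p == 0 && *q == 0;
}
```

Calling value_initialized_ints_are_zero() from main should return true on any conforming implementation; removing the assignment to i would turn the assert into a read of an indeterminate value.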

Back to our original example

Now we have the necessary knowledge to understand what’s going on in my original example. Essentially, the behaviours of foo and bar are changed by the different location of =default on their constructors. Again, the relevant standards passages are down at the bottom of the article if you want them, but the gist is this:

Since the constructor for foo is defaulted on its first declaration, it is not technically user-provided. I’ll explain what this term means shortly; just accept this standardese for now. The constructor for bar, conversely, is only defaulted at its out-of-class definition, so it is user-provided. Put another way, if you don’t want your constructor to be user-provided, be sure to write =default on its first declaration rather than on a separate definition elsewhere. This rule makes sense when you think about it: without having access to the definition of a constructor, a translation unit can’t know if it is going to be a simple compiler-generated one, or if it’s going to send a telegram to the Moon to retrieve some data and block until it gets a response.

The default constructor being user-provided has a few consequences for the class type. For example, you can’t default-initialize a const-qualified object if it lacks a user-provided constructor, the notion being that if the object should only be set once, it better be initialised with something reasonable:

  //ill-formed, no user-provided constructor
  const int my_int;
  
  //well-formed, has a user-provided constructor
  const std::string my_string;
  
  //ill-formed, no user-provided constructor
  const foo my_foo;
  
  //well-formed, has a user-provided constructor
  const bar my_bar;
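
The ill-formed declarations above become well-formed once an initializer is supplied, because the objects are then no longer default-initialized. A minimal sketch, reusing foo from Listing 1 (the function name is hypothetical):

```cpp
#include <cassert>
#include <string>

struct foo {
    foo() = default;   // defaulted on its first declaration: not user-provided
    int a;
};

// Each const object has an initializer or a user-provided default constructor,
// so every declaration compiles and every value read is well defined.
bool const_objects_are_initialized() {
    const int my_int{};           // value-initialized to 0
    const foo my_foo{};           // empty braces: my_foo.a is zeroed
    const std::string my_string;  // std::string has a user-provided default constructor

    assert(my_string.empty());
    return my_int == 0 && my_foo.a == 0;
}
```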

Additionally, a class with a user-provided default constructor cannot be trivial (a prerequisite for being a POD) and, under the C++11 and C++14 rules, a class with any user-provided constructor cannot be an aggregate. Don’t worry if you don’t know those terms; it suffices to know that whether your constructors are user-provided or not modifies some of the restrictions on what you can do with that class and how it acts.
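
One way to see the difference without reading any uninitialized memory is to ask the compiler through the standard <type_traits> header. The sketch below assumes C++11 or later and checks only the default-constructor part of triviality; check_triviality is a name invented for this example.

```cpp
#include <cassert>
#include <type_traits>

struct foo {
    foo() = default;   // defaulted on its first declaration: not user-provided
    int a;
};

struct bar {
    bar();             // first declaration is not defaulted...
    int b;
};
bar::bar() = default;  // ...so this constructor counts as user-provided

// Compile-time view: only foo has a trivial default constructor.
static_assert(std::is_trivially_default_constructible<foo>::value,
              "foo's default constructor is trivial");
static_assert(!std::is_trivially_default_constructible<bar>::value,
              "bar's default constructor is user-provided, hence non-trivial");

// The same facts as a runtime check, e.g. for use in a unit test.
void check_triviality() {
    assert(std::is_trivially_default_constructible<foo>::value);
    assert(!std::is_trivially_default_constructible<bar>::value);
}
```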

For our first example, however, we’re interested in how user-provided constructors interact with initialization rules. The language mandates that both a and b are value-initialized, but only a is additionally zero-initialized. Zero-initialization for a gives a.a the value 0, whereas b.b is not initialized at all, giving us undefined behaviour if we attempt to read it. This is a very subtle distinction which has inadvertently changed our program from executing safely to summoning nasal demons/eating your cat/ordering pizza/your favourite undefined behaviour metaphor.

Fortunately, there’s a simple solution. At the risk of repeating advice which has been given many times before, initialize your variables.

Seriously.

Do it.

INITIALIZE YOUR GORRAM VARIABLES.

If the designer of foo and bar decides that they should be default constructible, they should initialize their contents with some sensible values. If they decide that they should not be default constructible, they should delete the default constructors to avoid issues. (See Listing 3.)

struct foo {
  foo() : a{0} {} //initialize to 0 explicitly
  int a;
};

struct bar {
  bar() = delete; //delete constructor
  //insert non-default constructor which does
  // something sensible here
  int b;
};
			
Listing 3
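
Another option, not part of the listing above, is a default member initializer (C++11), which a defaulted constructor picks up wherever the =default appears. This is a sketch of that alternative rather than a recommendation from the standard; the two helper functions are hypothetical.

```cpp
#include <cassert>

struct bar {
    bar();          // defaulted out of line below, so still user-provided
    int b = 0;      // default member initializer, used by the defaulted constructor
};
bar::bar() = default;

// Default-initialization: the constructor runs and applies b = 0.
int read_b_after_default_init() {
    bar x;
    return x.b;
}

// Value-initialization: no zero-initialization step, but b = 0 still applies.
int read_b_after_value_init() {
    bar x{};
    return x.b;
}
```

With the member initializer in place, both functions return 0, so the position of =default no longer decides whether b is safe to read.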

Internalising this way of thinking about initialization is key to writing unsurprising code. If you’ve profiled your code and found a bottleneck caused by unnecessary initialization, then sure, optimise it, but you best be certain that the extra performance is worth the possible headaches and money spent to keep the code safe.

If you still aren’t convinced that C++ initialization rules are crazy-complex, take a minute to think of all the forms of initialization you can think of. My answers are below.

Done? How many did you come up with? In perusal of the standard, I counted eighteen different forms of initialization [1]. Here they are with a short example/description:

  • default: int i;
  • value: int i{};
  • zero: static int i;
  • constant: static int i = some_constexpr_function();
  • static: zero- or constant-initialization
  • dynamic: not static initialization
  • unordered: dynamic initialization of class template static data members which are not explicitly specialized
  • ordered: dynamic initialization of other non-local variables with static storage duration
  • non-trivial: when a class or aggregate is initialized by a non-trivial constructor
  • direct: int i{42}; int j(42);
  • copy: int i = 42;
  • copy-list: int i = {42};
  • direct-list: int i{42};
  • list: either copy-list or direct-list
  • aggregate: int is[3] = {0,1,2};
  • reference: const int& i = 42; auto&& j = 42;
  • implicit: default or value
  • explicit: direct, copy, or list
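
To make a few of these forms concrete, here is a short sketch that uses several of them side by side. The point struct and the function name are invented for the example; every object is given a value before it is read.

```cpp
#include <cassert>

struct point { int x; int y; };    // hypothetical aggregate type

// Uses several of the initialization forms listed above and returns
// the sum of the resulting values.
int sum_of_initialization_forms() {
    int direct_paren(1);           // direct-initialization
    int direct_list{2};            // direct-list-initialization
    int copy = 3;                  // copy-initialization
    int copy_list = {4};           // copy-list-initialization
    int is[3] = {0, 1, 2};         // aggregate initialization (array)
    point p = {5, 6};              // aggregate initialization (class)
    const int& ref = 7;            // reference initialization, bound to a temporary
    static int zeroed;             // zero-initialization (static storage duration)

    return direct_paren + direct_list + copy + copy_list
         + is[2] + p.x + p.y + ref + zeroed;
}
```

The values add up to 1 + 2 + 3 + 4 + 2 + 5 + 6 + 7 + 0 = 30.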

Don’t try to memorise all of these rules; therein lies madness. Just be careful, and keep in mind that C++’s initialization rules are there to pounce on you when you least expect it. If you won’t listen to me, then maybe you’ll listen to the illustrious authors of the C++ Core Guidelines [cppcore], who also recommend always initializing your variables in item ES.20. And if you ever fall into the trap of thinking C++ is a sane language, remember this:

In C++, you can give your program undefined behaviour by changing the point at which you tell the compiler to generate something it was probably going to generate for you anyway.

Standards quotes

All quotes from N4140 (essentially C++14).

[dcl.fct.def.default]/5:

Explicitly-defaulted functions and implicitly-declared functions are collectively called defaulted functions, and the implementation shall provide implicit definitions for them (12.1, 12.4, 12.8), which might mean defining them as deleted. A function is user-provided if it is user-declared and not explicitly defaulted or deleted on its first declaration. A user-provided explicitly-defaulted function (i.e., explicitly defaulted after its first declaration) is defined at the point where it is explicitly defaulted; if such a function is implicitly defined as deleted, the program is ill-formed.

[dcl.init]/6-8:

To zero-initialize an object or reference of type T means:

  • if T is a scalar type (3.9), the object is initialized to the value obtained by converting the integer literal 0 (zero) to T;
  • if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;
  • if T is a (possibly cv-qualified) union type, the object’s first non-static named data member is zero-initialized and padding is initialized to zero bits;
  • if T is an array type, each element is zero-initialized;
  • if T is a reference type, no initialization is performed.

To default-initialize an object of type T means:

  • if T is a (possibly cv-qualified) class type (Clause 9), the default constructor (12.1) for T is called (and the initialization is ill-formed if T has no default constructor or overload resolution (13.3) results in an ambiguity or in a function that is deleted or inaccessible from the context of the initialization);
  • if T is an array type, each element is default-initialized;
  • otherwise, no initialization is performed.

If a program calls for the default initialization of an object of a const-qualified type T, T shall be a class type with a user-provided default constructor.

To value-initialize an object of type T means:

  • if T is a (possibly cv-qualified) class type (Clause 9) with either no default constructor (12.1) or a default constructor that is user-provided or deleted, then the object is default-initialized;
  • if T is a (possibly cv-qualified) class type without a user-provided or deleted default constructor, then the object is zero-initialized and the semantic constraints for default-initialization are checked, and if T has a non-trivial default constructor, the object is default-initialized;
  • if T is an array type, then each element is value-initialized;
  • otherwise, the object is zero-initialized.

[basic.start.init]/2:

Variables with static storage duration (3.7.1) or thread storage duration (3.7.2) shall be zero-initialized (8.5) before any other initialization takes place. […]

This article was previously published at http://blog.tartanllama.xyz/c++/2017/01/20/initialization-is-bonkers/

References

[cppcore] https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Res-always

[cppref1] default-initialization http://en.cppreference.com/w/cpp/language/default_initialization

[cppref2] value-initialization http://en.cppreference.com/w/cpp/language/value_initialization

[cppref3] zero-initialization http://en.cppreference.com/w/cpp/language/zero_initialization

[1] Feel free to debate whether some of these are different flavours of initialization forms, or attributes of initialization rather than separate concepts. I don’t really care; suffice it to say there are a lot.
