
Profiting from the Folly of Others

Overload Journal #156 - April 2020 + Programming Topics   Author: Alastair Harrison
Code referred to as a hack can raise an eyebrow. Alastair Harrison learns about accessing private members of C++ classes by investigating a header called UninitializedMemoryHacks.h

I always enjoy browsing through the source code of libraries written by other people. With so many dark corners in C++ I often come across new and interesting ideas. I’d like to share one such example from the ‘Folly’ library. Not because I think it illustrates best practice (it doesn’t!), but because I learned something about C++ in the process of deciphering it.

Folly [GitHub] is a C++ library developed at Facebook and released under the open-source Apache 2.0 licence. It contains useful algorithms, vocabulary types and utility functions. Hidden amongst the mainstream functionality are some utilities tailored towards the more unusual situations that a C++ developer may find themselves in.

The code I’d like to focus on lives in a header with the ominous title of UninitializedMemoryHacks.h. Its subtle use of loopholes and language features is fascinating, despite its obviously questionable nature.

The file contains a collection of helper functions that do reprehensible things in the name of performance. In particular, it provides a set of overloaded functions in the folly:: namespace named resizeWithoutInitialization, whose purpose is to ‘resize std::string or std::vector without constructing or initializing new elements’.

It does what?

Normally when we call resize to increase the size of a std::vector, the container first checks to see if the existing capacity is sufficient to hold the requested number of elements. Even when the existing capacity is sufficient, the implementation needs to do something with the newly added elements. They each need to be constructed or initialized to ensure that they are in a valid state. For trivial types such as int it’s actually OK to leave the values uninitialized, as long as we don’t try to read them before we’ve first written something to them. But std::vector always forces us to pay the cost of initialization, even if we were intending to overwrite all of the newly initialized elements straight after calling resize.

In contrast, when we call folly::resizeWithoutInitialization on a std::vector with sufficient capacity, it simply reaches in to the private implementation and moves the pointer representing the end of the sequence. The memory for the new elements is left uninitialized, leaving the caller responsible for that task.
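To make the cost concrete, here is a minimal sketch using only the standard library (it does not call Folly). The helper name count_nonzero_after_resize is invented for illustration. It shows that resize still writes a zero into every new int, even though the capacity was reserved in advance and no reallocation is needed.

```cpp
#include <cstddef>
#include <vector>

// Counts how many of the elements added by resize() are non-zero.
// For int the answer is always 0, because resize() value-initializes
// every new element, even when the capacity was reserved up front.
std::size_t count_nonzero_after_resize(std::size_t n) {
  std::vector<int> v;
  v.reserve(n);  // capacity is now sufficient; size is still 0
  v.resize(n);   // no reallocation, but n ints are still written as 0

  std::size_t nonzero = 0;
  for (int x : v) {
    if (x != 0) ++nonzero;
  }
  return nonzero;
}
```

If the caller is about to overwrite all n elements anyway, that zeroing pass is the wasted work that resizeWithoutInitialization is trying to avoid.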

The first time I looked at the implementation of this function, I was amazed and alarmed to see it somehow bypassing the normal C++ access restrictions to modify a private member variable of a standard library component. I say ‘somehow’ because the precise mechanism was so thoroughly obfuscated behind layers of macros, template trickery and arcane member function pointer syntax that it might as well have been magic. The baffling part was that it claimed to pull off this magic trick without invoking any undefined behaviour. I had to know how this worked!

I won’t dwell further on the specifics of how Folly meddles with the internals of the standard containers. The interesting part is how it bypasses the access control mechanisms of C++. Herb Sutter has a Guru of the Week article [GotW] discussing three nefarious techniques for accessing private members of a class, though none of them quite matches the applicability of the method in the Folly library. The first two are illegal and the third involves writing a sneaky member function specialization, which makes it relevant only to classes that contain member function templates.

What’s interesting about the technique used in Folly is that it’s able to freely access private members of any class, without any particular structural requirements. It does this with a clever combination of infrequently-used language features and a small loophole allowed by the C++ standard.

The effect

Let’s take a simple class with a private member function:

  class Widget {
   private:
    void forbidden();
  };

Our aim is to write a free function called hijack which takes a reference to a Widget as input and calls the Widget::forbidden() member function on it. Assume that the Widget class is closed for modification by us, so we can’t just change the private to public, or make hijack a friend of Widget.

Obviously we can’t call the private member function directly:

  void hijack(Widget& w) {
    w.forbidden();  // ERROR!
  }

because the compiler will stop us:

  In function 'void hijack(Widget&)':
  error: 'void Widget::forbidden()' is private
  within this context
        |     w.forbidden();
        |                 ^

Using techniques from the Folly library, we’ll build up a solution piece-by-piece. This article covers the specific case of calling private member functions, but the approach is equally applicable to accessing and mutating private member variables in a class. The underlying techniques all work in C++98, but some more modern features will be used to ease exposition.

A syntax refresher for pointers to member functions

We’ll be using pointers to member functions (PMFs) extensively, so it’s worth revisiting their syntax before we dive in further. PMFs enable a primitive form of polymorphism over methods in a class. For the sake of exposition, let’s start with a hypothetical calculator class (Listing 1).

class Calculator {
  float current_val = 0.f;

 public:
  void clear_value() { current_val = 0.f; }
  float value() const { return current_val; }

  void add(float x) { current_val += x; }
  void multiply(float x) { current_val *= x; }
};
			
Listing 1

Arguably the easiest way to work with pointers to member functions is through type aliases. The type alias is specific to a given class, but the pointer can be bound to any member function in the class that matches the signature. In the case of Calculator, both multiply and add take a single float argument and return void, so we can use the same type alias for both. It looks like this:

  using Operation = void (Calculator::*)(float);

We can then store the address of either multiply or add. But value doesn’t match the signature, so its address cannot be assigned to an Operation pointer.

  // OK
  Operation op1 = &Calculator::add;
  Operation op2 = &Calculator::multiply;
  
  // ERROR! Signature mismatch
  Operation op3 = &Calculator::value; 

We’ll need to make a new alias to match the signature of value:

  using Getter = float (Calculator::*)() const;
  
  // OK - signature now matches
  Getter get = &Calculator::value;

A pointer to a member function isn’t very useful unless we know which object instance we want to call it on. Here’s the syntax for calling members of Calculator through their pointers [1]:

  Calculator calc{};
  (calc.*op1)(123.0f); // Calls add

  (calc.*op2)(10.0f);  // Calls multiply

  
  // Prints 1230
  std::cout << (calc.*get)() << '\n';
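As a sketch of the ‘primitive polymorphism’ that PMFs allow, the following stores a sequence of operations as data and replays them on a Calculator. The run_program function is a name made up for this example, and the Calculator is a trimmed copy of Listing 1. It uses std::invoke, so it assumes a C++17 compiler.

```cpp
#include <functional>  // std::invoke (C++17)
#include <utility>
#include <vector>

class Calculator {
  float current_val = 0.f;

 public:
  float value() const { return current_val; }
  void add(float x) { current_val += x; }
  void multiply(float x) { current_val *= x; }
};

using Operation = void (Calculator::*)(float);

// Applies a list of (operation, operand) steps to a fresh Calculator
// and returns the final value.
float run_program(
    const std::vector<std::pair<Operation, float>>& steps) {
  Calculator calc;
  for (const auto& step : steps) {
    // Equivalent to (calc.*step.first)(step.second)
    std::invoke(step.first, calc, step.second);
  }
  return calc.value();
}
```

Calling run_program({{&Calculator::add, 123.0f}, {&Calculator::multiply, 10.0f}}) performs the same two calls as the snippet above.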

One of the interesting things about pointers to member functions is that they can be bound to private member functions. That’s the first piece of the Folly puzzle.

Puzzle piece 1: Pointers to private member functions can be called from any scope

Suppose the author of the Widget class had helpfully provided a means to obtain a pointer to the Widget::forbidden() member function. Once we have that pointer, we are able to call it from any scope where we have a Widget available (Listing 2).

class Widget {
 public:
  static auto forbidden_fun() {
    return &Widget::forbidden;
  }
 private:
  void forbidden();
};

void hijack(Widget& w) {
  using ForbiddenFun = void (Widget::*)();
  ForbiddenFun const forbidden_fun =
    Widget::forbidden_fun();

  // Calls a private member function on the Widget
  // instance passed in to the function.
  (w.*forbidden_fun)();
}
			
Listing 2

That’s useful to know, but most classes don’t offer to hand out pointers to their private member functions. We need to find a sneakier way to get hold of one from outside of the class scope.

There’s a curious loophole in the C++ standard around the use of explicit template instantiation which allows us to refer to private class members. That gives us the second piece of the Folly puzzle.

Puzzle piece 2: The explicit template instantiation loophole

The C++17 standard contains the following paragraph:

17.7.2 (item 12)

The usual access checking rules do not apply to names used to specify explicit instantiations. [Note: In particular, the template arguments and names used in the function declarator (including parameter types, return types and exception specifications) may be private types or objects which would normally not be accessible and the template may be a member template or member function which would not normally be accessible.]

To understand the reason behind this curiosity, we need to discuss the explicit template instantiation mechanism for a moment.

Suppose we’ve got a Company class with an internal private member function template, update_employee. Perhaps there is one particular template argument, OnSiteEmployeePolicy which is expensive to compile, but used regularly. We’d like to avoid the cost of instantiating that version of the template in lots of translation units. We can achieve this by explicitly instantiating the member template in just one translation unit and marking it as extern everywhere else. See Listing 3 (company.h) and Listing 4 (company.cpp).

class OnSiteEmployeePolicy {
  // ... contains daring and unfettered use of
  // ... hairy template meta-programming tricks.
};
class Company {
 private:
  template <typename EmployeePolicy>
  void update_employee(int employee_id) {
    // ...
  }
};
// Prevents implicit instantiation of a specific
// specialization.
extern template void
Company::update_employee<OnSiteEmployeePolicy>(int);

Listing 3

#include "company.h"

// Explicit instantiation of the template only
// needs to happen in a single translation unit.
template void
Company::update_employee<OnSiteEmployeePolicy>(int);
			
Listing 4

Brushing aside the question of how someone ever snuck such an awkward API design through code review, notice how the template instantiation mechanism needs to allow a reference to a private member of Company (namely Company::update_employee) in a context where we would not normally be able to make one (i.e. outside the scope of the Company class). That’s the reason for the exception in the C++ standard that allows for private types to appear in explicit template instantiations.

It’s also the crucial loophole that Folly takes advantage of. We can’t relax just yet, though. There’s still some work to be done.
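For readers who want to try the loophole in isolation, here is a single-file sketch of the Company example. The updates_per_call constant, nightly_sync and update_count are invented so that the effect can be observed; they are assumptions, not part of the original design. The point of interest is the explicit instantiation at namespace scope, which names a private member template without complaint from the compiler.

```cpp
struct OnSiteEmployeePolicy {
  // Hypothetical policy detail, present only to give the
  // template body something to do.
  static constexpr int updates_per_call = 1;
};

class Company {
 public:
  void nightly_sync();
  int update_count() const { return updates_; }

 private:
  template <typename EmployeePolicy>
  void update_employee(int /*employee_id*/) {
    updates_ += EmployeePolicy::updates_per_call;
  }

  int updates_ = 0;
};

// Namespace scope, outside Company: the explicit instantiation may
// still name the private member template. Note the full declarator:
// return type, qualified name, template arguments, parameter list.
template void Company::update_employee<OnSiteEmployeePolicy>(int);

// Ordinary code inside the class is, of course, also allowed access.
void Company::nightly_sync() {
  update_employee<OnSiteEmployeePolicy>(7);
}
```

Writing the same qualified name in an ordinary expression outside the class would be rejected; only the explicit instantiation gets the exemption.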

Puzzle piece 3: Passing a member-function pointer as a non-type template parameter

In C++, template arguments are usually types, but there is some support for non-type template parameters if they are of integral or pointer type [2]. Conveniently enough, it’s perfectly legal to pass a pointer-to-member-function as a template argument [3]. Listing 5 is an example of what that looks like.

class SpaceShip {
 public:
  void dock();
  // ...
};

// Member function alias that matches the
// signature of SpaceShip::dock()
using SpaceShipFun = void (SpaceShip::*)();

// spaceship_fun is a pointer-to-member-function
// value which is baked-in to the type of the
// SpaceStation template at compile time.
template <SpaceShipFun spaceship_fun>
class SpaceStation {
  // ...
};

// Instantiate a SpaceStation and pass in a
// pointer to member function statically as a
// template argument.
SpaceStation<&SpaceShip::dock> space_station{};
			
Listing 5

The intermediate SpaceShipFun alias hampers the genericity of the SpaceStation template, so we can move the type of the pointer-to-member-function into the template arguments too (Listing 6).

template <
  typename SpaceShipFun,
  SpaceShipFun spaceship_fun
>
class SpaceStation {
  // ...
};

// Now we must also pass the type of the pointer to
// member function when we instantiate the
// SpaceStation template.
SpaceStation<
  void (SpaceShip::*)(),
  &SpaceShip::dock
> space_station{};
			
Listing 6

We can take that a step further and have the compiler deduce the type of the member function for us:

  SpaceStation<
    decltype(&SpaceShip::dock),
    &SpaceShip::dock
  > space_station{};

That relieves us of some of the burden of having to pass the member function signature to the template. We’ll stick with this approach for the rest of the article as it’s what’s used in the Folly library, but it’s worth mentioning that C++17’s template <auto> feature removes the need for the first template parameter entirely [4].
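For comparison, this is roughly what the template <auto> form looks like. It is a sketch that assumes a C++17 compiler; the docked() accessor and the receive() member are additions made only so that the example does something observable.

```cpp
class SpaceShip {
 public:
  void dock() { docked_ = true; }
  bool docked() const { return docked_; }

 private:
  bool docked_ = false;
};

// C++17: `auto` lets the compiler deduce the type of the non-type
// template parameter, so no separate type parameter is needed.
template <auto spaceship_fun>
class SpaceStation {
 public:
  // Calls the member function that was baked in at compile time.
  void receive(SpaceShip& ship) const { (ship.*spaceship_fun)(); }
};
```

An instantiation is then written as SpaceStation<&SpaceShip::dock>, with no decltype in sight.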

Passing a private pointer-to-member-function as a template parameter

Let’s combine the explicit template instantiation loophole with the ability to pass member function pointers as template parameters. The HijackImpl struct receives a pointer to Widget::forbidden() as a template parameter (see Listing 7).

// The first template parameter is the type
// signature of the pointer-to-member-function.
// The second template parameter is the pointer
// itself.
template <
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
struct HijackImpl {
  static void apply(Widget& w) {
    // Calls a private method of Widget
    (w.*forbidden_fun)();
  }
};

// Explicit instantiation is allowed to refer to
// `Widget::forbidden` in a scope where it's not
// normally permissible.
template struct HijackImpl<
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;

			
Listing 7

Brilliant! We’ve instantiated a template that is able to reach in and call the forbidden() member function on any Widget that we care to pass in. So we just have to write the free function, hijack and we can go back to watching cat videos on YouTube, right?

  void hijack(Widget& w) {
    HijackImpl<
      decltype(&Widget::forbidden),
      &Widget::forbidden
    >::apply(w);
  }

The only problem is that it doesn’t work. The compiler sees through our ruse and raps us smartly on the knuckles:

 error: 'forbidden' is a private member of 'Widget'
   HijackImpl<decltype(&Widget::forbidden),
     &Widget::forbidden>::apply(w);

The use of the HijackImpl template inside the hijack function is not an explicit template instantiation. It’s just a ‘normal’ implicit instantiation. So the loophole doesn’t apply. It’s time to phone a friend for help with solving the next piece of the puzzle.

Puzzle piece 4: A friend comes to our aid

Because we’re not allowed to refer to Widget::forbidden inside our hijack function, we must solve the conundrum of accessing the value of the ForbiddenFun template parameter without making any direct reference to the HijackImpl<...> template. This apparently unreasonable requirement is easily solved with a shrewd application of the friend keyword.

Let’s take another step back from the task in hand and look at some of the different effects one can achieve when marking a free function as a friend of a class. The behaviour depends on whether the class contains only a declaration of the function (i.e. function signature only), or whether the complete definition (including the function body) appears inside the class.

‘friend’ function declarations

Most C++ developers will be familiar with the pattern of placing a free function declaration inside of a class definition and marking it as a friend. The definition of the free-function still lives outside of the class, but is now allowed to access private members of the class. (See Listing 8.)

class Gadget {
  // Friend declaration gives `frobnicate` access
  // to Gadget's private members.
  friend void frobnicate();

 private:
  void internal() {
    // ...
  }
};

// Definition as a normal free function
void frobnicate() {
  Gadget g;
  // OK because `frobnicate()` is a friend of
  // `Gadget`.
  g.internal();
}
			
Listing 8

If we could make hijack be a friend of Widget then the compiler would allow us to refer to Widget::forbidden inside the hijack function. Alas, this option is unavailable because the rules of our game don’t allow us to modify Widget. Let’s try something else.

Inline ‘friend’ function definitions

It’s also possible to define a friend function inside a class (as opposed to just declaring it there). This isn’t something seen as often in C++ code, probably because when we try to call the free function, the compiler is unable to find it! Listing 9 shows the problem, and the resulting compile error follows the listing.

class Gadget {
  // Free function declared as a friend of Gadget
  friend void frobnicate() {
    Gadget g;
    g.internal(); // Still OK
  }

 private:
   void internal();
};

void do_something() {
  // NOT OK: Compiler can't find frobnicate()
  // during name lookup
  frobnicate();
}
			
Listing 9

  error: 'frobnicate' was not declared in this scope
        |   frobnicate();
        |   ^

As before, frobnicate() is still a free function that lives in the global namespace, but it behaves quite differently under name lookup now that it is defined inside the Gadget class. A friend function defined inside a class is sometimes known as a ‘hidden friend’ [JSS19] [Saks18]. Hidden friends can only be found through Argument Dependent Lookup (ADL) and ADL only works if one of the arguments to the function is of the enclosing class type. In the above example frobnicate() takes no arguments, so argument dependent lookup won’t happen. The result is that frobnicate() can’t be called from anywhere. Not even from within frobnicate() itself!

If we add a parameter of the enclosing class type to frobnicate() then we’re able to call it via ADL (Listing 10).

class Gadget {
  friend void frobnicate(Gadget& gadget) {
    gadget.internal();
  }

 private:
   void internal();
};

void do_something(Gadget& gadget) {
  // OK: Compiler is now able to find the
  // definition of `frobnicate` inside Gadget
  // because ADL adds it to the candidate set for
  // name lookup.
  frobnicate(gadget);
}
			
Listing 10
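The most common everyday use of hidden friends is for operator overloads, where an argument of the class type is always present and ADL therefore always finds the function. A small sketch, using a made-up Meters type:

```cpp
class Meters {
 public:
  explicit Meters(double value) : value_(value) {}

  // Hidden friends: free functions defined inside the class. They are
  // found only through ADL, when an argument is a Meters.
  friend bool operator==(const Meters& a, const Meters& b) {
    return a.value_ == b.value_;
  }
  friend Meters operator+(const Meters& a, const Meters& b) {
    return Meters(a.value_ + b.value_);
  }

 private:
  double value_;
};
```

An expression such as Meters(1.5) + Meters(2.5) == Meters(4.0) compiles and evaluates to true, yet neither operator is visible to ordinary unqualified lookup at namespace scope.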

Making hidden friends visible

The hidden-friend ADL trick can be very useful; it’s an ideal tool when writing operator overloads for user-defined types. But we’ll use a slightly bigger hammer for our hijack function. There’s another way of allowing the compiler to find hidden friends, and that is to put a declaration of the function in the enclosing namespace (Listing 11).

class Gadget {
  // Definition stays inside the Gadget class
  friend void frobnicate() {
    Gadget g;
    g.internal();
  }

 private:
   void internal();
};

// An additional namespace-scope declaration makes
// the function available for normal name lookup.
void frobnicate();

void do_something() {
  // The compiler can now find the function
  frobnicate();
}
			
Listing 11

This is exactly the opposite of the usual pattern of defining a free function and then placing a friend declaration for it inside of a class. The new behaviour is almost identical except for one critical difference: when the enclosing class is a template, the free function has access to the template parameters!

Using friends to pilfer template parameters

I trust you will be at least a little unsettled to discover that the program in Listing 12 is valid.

#include <iostream>

template <int N>
class SpookyAction {
  friend int observe() {
    return N;
  }
};

int observe();

int main() {
  SpookyAction<42>{};
  std::cout << observe() << '\n';  // Prints 42
}
			
Listing 12

What’s happening is that the observe() function is not defined until the point at which the SpookyAction template is instantiated (by its use in the main function). There is a single definition of the observe() function, because there is a single instantiation of the SpookyAction template. The really useful part is that the observe() function gains access to the template parameter of the SpookyAction<42> instantiation that caused it to be defined.

Of course things go wrong very quickly if we try to instantiate any more versions of the SpookyAction template, as each one results in a redefinition of the observe() function and an angry compiler.

Provided we use it carefully, we now have the last piece of our puzzle – a way to access the template parameters of a class from a scope external to that class.
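As an aside, the redefinition problem can be sidestepped if each instantiation defines a friend with a different signature. The sketch below is one possible way to do that and is not taken from Folly: the Tag template and sum_of_observations are hypothetical names introduced for illustration.

```cpp
// Hypothetical tag type, used only to give each friend function a
// distinct signature.
template <int N>
struct Tag {};

template <int N>
class SpookyAction {
  // Each instantiation defines a *different* overload of observe(),
  // so two instantiations no longer collide.
  friend int observe(Tag<N>) { return N; }
};

// Namespace-scope declarations so that ordinary name lookup can
// find the overloads.
int observe(Tag<1>);
int observe(Tag<2>);

// Instantiating the class template supplies the definitions.
template class SpookyAction<1>;
template class SpookyAction<2>;

int sum_of_observations() {
  return observe(Tag<1>{}) + observe(Tag<2>{});
}
```

For the hijack function we only need a single instantiation per translation unit, so the simpler untagged form is enough for the moment.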

Putting the puzzle pieces together

Let’s go back to our original Widget example, now that we’ve got all of the pieces that we need to be able to reach in and call its private member function, Widget::forbidden(). In summary:

  1. We use the loophole in the explicit template instantiation rules to allow us to refer to Widget::forbidden() from outside of the Widget class.
  2. We inject the address of Widget::forbidden() into our HijackImpl class as a template parameter.
  3. We define the hijack() function directly inside of HijackImpl so that it can access the template parameter containing the address of Widget::forbidden().
  4. We mark hijack as a friend so that it becomes a free function and we provide a declaration of hijack at namespace scope so that it participates in name-lookup.
  5. We can now invoke Widget::forbidden() on any Widget instance through the member-function address that is exposed to the hijack function.

The key parts of the mechanism are shown in Listing 13.

// HijackImpl is the mechanism for injecting the
// private member function pointer into the
// hijack function.
template <
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
class HijackImpl {
  // Definition of free function inside the class
  // template to give it access to the
  // forbidden_fun template argument.
  // Marking hijack as a friend prevents it from
  // becoming a member function.
  friend void hijack(Widget& w) {
    (w.*forbidden_fun)();
  }
};
// Declaration in the enclosing namespace to make
// hijack available for name lookup.
void hijack(Widget& w);

// Explicit instantiation of HijackImpl template
// bypasses access controls in the Widget class.
template class
HijackImpl<
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;
			
Listing 13

Dealing with multiple definitions of the friend function

There’s still one more issue to overcome [5]. To avoid violating the One Definition Rule, there must be one, and only one, explicit instantiation of a template (with given template arguments) across all translation units.

Consider what happens when our HijackImpl class is put in a header and is used in multiple translation units. The explicit instantiation of the class template must live outside of that header, otherwise it will appear in every translation unit that includes the header. We need to ensure that there is just one explicit template instantiation in the whole program. What’s more, the linker is not actually required to report duplicate instantiations across multiple translation units, so it won’t even help us to avoid the problem. That’s a recipe for a big maintenance headache.

The approach employed by the Folly library is to add an extra template parameter to the HijackImpl class and use it to accept an empty ‘tag’ struct which is defined in an anonymous namespace.

The anonymous namespace ensures that the tag parameter is of a different type in every translation unit. Every translation unit will therefore get its own unique explicit instantiation of the HijackImpl class.

The final solution is short, but packs in a surprising amount of nuance. See widget.h (Listing 14), widget_hijack.h (Listing 15) and main.cc (Listing 16).

#pragma once
#include <iostream>

class Widget {
 private:
  void forbidden() {
    std::cout << "Whoops...\n";
  }
};
			
Listing 14

#pragma once
#include "widget.h"

namespace {
// This is a *different* type in every translation
// unit because of the anonymous namespace.
struct TranslationUnitTag {};
}

void hijack(Widget& w);

template <
  typename Tag,
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
class HijackImpl {
  friend void hijack(Widget& w) {
    (w.*forbidden_fun)();
  }
};

// Every translation unit gets its own unique
// explicit instantiation because of the
// guaranteed-unique tag parameter.
template class HijackImpl<
  TranslationUnitTag,
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;
			
Listing 15

#include "widget.h"
#include "widget_hijack.h"

int main() {
  Widget w;
  hijack(w); // Prints "Whoops..."
}
			
Listing 16
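The same recipe carries over to private member variables by swapping the pointer-to-member-function for a pointer-to-data-member. The following single-file sketch uses invented names throughout (Account, PeekImpl, peek_balance, overwrite_balance) and omits the translation-unit tag, which would be needed again if the code lived in a header.

```cpp
class Account {
 public:
  int balance() const { return balance_; }

 private:
  int balance_ = 0;
};

// `member` is a pointer to a data member, baked in as a template
// argument exactly as forbidden_fun was in the listings above.
template <typename MemberPtr, MemberPtr member>
class PeekImpl {
  // Returns a mutable reference to the private field.
  friend int& peek_balance(Account& account) {
    return account.*member;
  }
};

// Namespace-scope declaration so that name lookup finds the friend.
int& peek_balance(Account& account);

// Explicit instantiation: access checks are not applied to the names
// used here, so Account::balance_ may be mentioned.
template class PeekImpl<
  decltype(&Account::balance_),
  &Account::balance_
>;

// Writes to the private field from outside the class, then reads it
// back through the public accessor.
int overwrite_balance(int new_balance) {
  Account account;
  peek_balance(account) = new_balance;
  return account.balance();
}
```

Writing through the returned reference changes the private field directly, which is exactly the sort of invariant-breaking behaviour that access control normally prevents.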

Conclusion

Should you use this access-violation hack in production code? Almost certainly not. Well, not unless you enjoy the excitement and explosive unpredictability of maintaining extremely brittle code.

The C++ class member access rules are there to help authors of types to enforce invariants. If you fool the compiler into mutating private class members then you’re likely to be violating the class invariants, which risks leaving the program in an invalid state. You’re also relying on intimate knowledge of internal implementation details, for which the library author is under no stability obligations.

Should you experiment with this access-violation technique outside of production code? Absolutely! Learning how to subvert a system in a safe environment is not only fun, but it helps to foster a deeper understanding of that system. Untangling the machiavellian mechanisms in the Folly library tested my knowledge of C++, requiring me to improve my understanding of the language features I encountered along the way. It’s almost as if someone had reached in to my brain’s internal implementation and fiddled with its contents...

A note on the origins of the technique

The idea of using explicit template instantiation to bypass class access rules pre-dates the Folly library by a few years. The first mention I can find is from a 2010 blog post by Johannes Schaub [Schaub10], which describes a method using initialization of static class members. At the time, there was a discussion on the Boost mailing list about how the technique might prove to be a useful addition to the Boost serialization library.

A year later, Schaub offered the dubiously tempting promise of ‘Safer Nastiness’ in a follow-up blog post [Schaub11] in which he presented an improved version of the code. This removed the need for static class members and is much closer to what’s used today by the Folly library.

Acknowledgements

The author would like to thank Geoff Hester, Anthony Kirby and Kirsty McNaught for advice on early drafts of this article and Balog Pal for very patiently explaining some finer points of the one-definition rule.

References

[GitHub]: Facebook Folly on GitHub: https://github.com/facebook/folly

[GotW]: Herb Sutter ‘Uses and Abuses of Access Rights’: http://www.gotw.ca/gotw/076.htm

[JSS19]: ‘The Power of Hidden Friends in C++’ posted 25 June 2019: https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html

[Saks18]: Dan Saks ‘Making New Friends’ recorded at CppCon 2018, available at: https://www.youtube.com/watch?v=POa_V15je8Y

[Schaub10]: Johannes Schaub ‘Access to private members. That’s easy!’, posted 3 July 2010: http://bloglitb.blogspot.com/2010/07/access-to-private-members-thats-easy.html

[Schaub11]: Johannes Schaub ‘Access to private members: Safer nastiness’, posted 30 December 2011: http://bloglitb.blogspot.com/2011/12/access-to-private-members-safer.html

Footnotes

  1. C++17 introduced the std::invoke template, which gives a unified syntax for working with callables.
  2. C++20 will significantly relax the restrictions on non-type template parameters.
  3. I imagine it’s staggeringly useful to someone.
  4. Readers lacking both a C++17 compiler and a certain amount of moral fibre have probably already worked out how to use a macro to remove the duplication in the template arguments.
  5. If you choose to ignore the pitchfork-bearing members of the C++ standards committee currently approaching your front door with a polite request that you stop doing this sort of thing at all.

Alastair Harrison started out as a robotics researcher but accidentally became a C++ build engineer because nobody else wanted to do it. These days he appreciates the fact that his customers are all in the same building as him and that they are apparently unfazed by his eagerness to delete old code.
