Code referred to as a hack can raise an eyebrow. Alastair Harrison learns about accessing private members of C++ classes by investigating a header called UninitializedMemoryHacks.h
I always enjoy browsing through the source code of libraries written by other people. With so many dark corners in C++ I often come across new and interesting ideas. I’d like to share one such example from the ‘Folly’ library. Not because I think it illustrates best practice (it doesn’t!), but because I learned something about C++ in the process of deciphering it.
Folly [GitHub] is a C++ library developed at Facebook and released under the open-source Apache 2.0 licence. It contains useful algorithms, vocabulary types and utility functions. Hidden amongst the mainstream functionality are some utilities tailored towards the more unusual situations that a C++ developer may find themselves in.
The code I’d like to focus on lives in a header with the ominous title of UninitializedMemoryHacks.h. Its subtle use of loopholes and language features is fascinating, despite its obviously questionable nature.
The file contains a collection of helper functions that do reprehensible things in the name of performance. In particular, it provides a set of overloaded functions in the folly:: namespace named resizeWithoutInitialization, whose purpose is to ‘resize std::string or std::vector without constructing or initializing new elements’.
It does what?
Normally when we call resize to increase the size of a std::vector, the container first checks to see if the existing capacity is sufficient to hold the requested number of elements. Even when the existing capacity is sufficient, the implementation needs to do something with the newly added elements. They each need to be constructed or initialized to ensure that they are in a valid state. For trivial types such as int it’s actually OK to leave the values uninitialized, as long as we don’t try to read them before we’ve first written something to them. But std::vector always forces us to pay the cost of initialization, even if we were intending to overwrite all of the newly initialized elements straight after calling resize.
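To make that cost visible, here is a minimal sketch. The helper name resize_zero_fills is invented for illustration. It reserves capacity up front and then checks that resize still value-initializes every new int to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every element added by resize()
// compares equal to zero.
bool resize_zero_fills(std::size_t n) {
  std::vector<int> v;
  v.reserve(n);  // The capacity is already sufficient...
  v.resize(n);   // ...but resize still value-initializes each new int.
  assert(v.size() == n);
  for (int x : v) {
    if (x != 0) return false;
  }
  return true;
}
```

Calling resize_zero_fills(16) returns true: the sixteen writes of zero happen whether or not the caller intends to overwrite the elements immediately afterwards.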
In contrast, when we call folly::resizeWithoutInitialization on a std::vector with sufficient capacity, it simply reaches into the private implementation and moves the pointer representing the end of the sequence. The memory for the new elements is left uninitialized, leaving the caller responsible for that task.
The first time I looked at the implementation of this function, I was amazed and alarmed to see it somehow bypassing the normal C++ access restrictions to modify a private member variable of a standard library component. I say ‘somehow’ because the precise mechanism was so thoroughly obfuscated behind layers of macros, template trickery and arcane member function pointer syntax that it might as well have been magic. The baffling part was that it claimed to pull off this magic trick without invoking any undefined behaviour. I had to know how this worked!
I won’t dwell further on the specifics of how Folly meddles with the internals of the standard containers. The interesting part is how it bypasses the access control mechanisms of C++. Herb Sutter has a Guru of the Week article [GotW] discussing three nefarious techniques for accessing private members of a class, though none of them quite matches the applicability of the method in the Folly library. The first two are illegal and the third involves writing a sneaky member function specialization, which makes it relevant only to classes that contain member function templates.
What’s interesting about the technique used in Folly is that it’s able to freely access private members of any class, without any particular structural requirements. It does this with a clever combination of infrequently-used language features and a small loophole allowed by the C++ standard.
The effect
Let’s take a simple class with a private member function:
class Widget {
private:
  void forbidden();
};
Our aim is to write a free function called hijack which takes a reference to a Widget as input and calls the Widget::forbidden() member function on it. Assume that the Widget class is closed for modification by us, so we can’t just change the private to public, or make hijack a friend of Widget.
Obviously we can’t call the private member function directly:
void hijack(Widget& w) {
  w.forbidden();  // ERROR!
}
because the compiler will stop us:
In function 'void hijack(Widget&)':
error: 'void Widget::forbidden()' is private within this context
  |   w.forbidden();
  |               ^
Using techniques from the Folly library, we’ll build up a solution piece-by-piece. This article covers the specific case of calling private member functions, but the approach is equally applicable to accessing and mutating private member variables in a class. The underlying techniques all work in C++98, but some more modern features will be used to ease exposition.
A syntax refresher for pointers to member functions
We’ll be using pointers to member functions (PMFs) extensively, so it’s worth revisiting their syntax before we dive in further. PMFs enable a primitive form of polymorphism over methods in a class. For the sake of exposition, let’s start with a hypothetical calculator class (Listing 1).
class Calculator {
  float current_val = 0.f;

public:
  void clear_value() { current_val = 0.f; }
  float value() const { return current_val; }
  void add(float x) { current_val += x; }
  void multiply(float x) { current_val *= x; }
};
Listing 1
Arguably the easiest way to work with pointers to member functions is through type aliases. The type alias is specific to a given class, but the pointer can be bound to any member function in the class that matches the signature. In the case of Calculator, both multiply and add take a single float argument and return void, so we can use the same type alias for both. It looks like this:
using Operation = void (Calculator::*)(float);
We can then store the address of either multiply or add. But value doesn’t match the signature, so its address cannot be assigned to an Operation pointer.
// OK
Operation op1 = &Calculator::add;
Operation op2 = &Calculator::multiply;
// ERROR! Signature mismatch
Operation op3 = &Calculator::value;
We’ll need to make a new alias to match the signature of value:
using Getter = float (Calculator::*)() const;
// OK - signature now matches
Getter get = &Calculator::value;
A pointer to a member function isn’t very useful unless we know which object instance we want to call it on. Here’s the syntax for calling members of Calculator through their pointers (footnote 1):
Calculator calc{};
(calc.*op1)(123.0f);  // Calls add
(calc.*op2)(10.0f);   // Calls multiply
// Prints 1230
std::cout << (calc.*get)() << '\n';
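For readers with a C++17 compiler, the same calls can be spelled with std::invoke from <functional>, which hides the .* syntax. The sketch below repeats a trimmed copy of Calculator so that it stands alone. The helper name run_with_invoke is an assumption made for illustration.

```cpp
#include <cassert>
#include <functional>

// Trimmed copy of the Calculator class, repeated so the sketch is
// self-contained.
class Calculator {
  float current_val = 0.f;

 public:
  float value() const { return current_val; }
  void add(float x) { current_val += x; }
  void multiply(float x) { current_val *= x; }
};

using Operation = void (Calculator::*)(float);
using Getter = float (Calculator::*)() const;

// Hypothetical helper: performs the same sequence of calls as the
// snippet above, but through std::invoke.
float run_with_invoke() {
  Calculator calc{};
  Operation op1 = &Calculator::add;
  Operation op2 = &Calculator::multiply;
  Getter get = &Calculator::value;

  std::invoke(op1, calc, 123.0f);        // Same as (calc.*op1)(123.0f)
  std::invoke(op2, calc, 10.0f);         // Same as (calc.*op2)(10.0f)
  float result = std::invoke(get, calc); // Same as (calc.*get)()
  assert(result == 1230.0f);
  return result;
}
```

With std::invoke the object is simply the first argument after the pointer, so the same call shape works for free functions, lambdas and member pointers alike.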
One of the interesting things about pointers to member functions is that they can be bound to private member functions. That’s the first piece of the Folly puzzle.
Puzzle piece 1: Pointers to private member functions can be called from any scope
Suppose the author of the Widget class had helpfully provided a means to obtain a pointer to the Widget::forbidden() member function. Once we have that pointer, we are able to call it from any scope where we have a Widget available (Listing 2).
class Widget {
public:
  static auto forbidden_fun() { return &Widget::forbidden; }

private:
  void forbidden();
};

void hijack(Widget& w) {
  using ForbiddenFun = void (Widget::*)();
  ForbiddenFun const forbidden_fun = Widget::forbidden_fun();
  // Calls a private member function on the Widget
  // instance passed in to the function.
  (w.*forbidden_fun)();
}
Listing 2
That’s useful to know, but most classes don’t offer to hand out pointers to their private member functions. We need to find a sneakier way to get hold of one from outside of the class scope.
There’s a curious loophole in the C++ standard around the use of explicit template instantiation which allows us to refer to private class members. That gives us the second piece of the Folly puzzle.
Puzzle piece 2: The explicit template instantiation loophole
The C++17 standard contains the following paragraph (with the parts of interest to us marked in bold):
17.7.2 (item 12)
The usual access checking rules do not apply to names used to specify explicit instantiations. [Note: In particular, the template arguments and names used in the function declarator (including parameter types, return types and exception specifications) may be private types or objects which would normally not be accessible and the template may be a member template or member function which would not normally be accessible.]
To understand the reason behind this curiosity, we need to discuss the explicit template instantiation mechanism for a moment.
Suppose we’ve got a Company class with an internal private member function template, update_employee. Perhaps there is one particular template argument, OnSiteEmployeePolicy, which is expensive to compile, but used regularly. We’d like to avoid the cost of instantiating that version of the template in lots of translation units. We can achieve this by explicitly instantiating the member template in just one translation unit and marking it as extern everywhere else. See Listing 3 (company.h) and Listing 4 (company.cpp).
class OnSiteEmployeePolicy {
  // ... contains daring and unfettered use of
  // ... hairy template meta-programming tricks.
};

class Company {
private:
  template <typename EmployeePolicy>
  void update_employee(int employee_id) {
    // ...
  }
};

// Prevents implicit instantiation of a specific
// specialization.
extern template void Company::update_employee<OnSiteEmployeePolicy>(int);
Listing 3
#include "company.h"

// Explicit instantiation of the template only
// needs to happen in a single translation unit.
template void Company::update_employee<OnSiteEmployeePolicy>(int);
Listing 4
Brushing aside the question of how someone ever snuck such an awkward API design through code review, notice how the template instantiation mechanism needs to allow a reference to a private member of Company (Company::update_employee) in a context where we would not normally be able to (i.e. outside the scope of the Company class). That’s the reason for the exception in the C++ standard that allows for private types to appear in explicit template instantiations.
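The loophole can be checked in a single file. The sketch below uses simplified stand-ins for the classes above. The bonus(), update_on_site() and demo_update() names are invented, and update_employee returns an int here purely so that its effect can be observed.

```cpp
#include <cassert>

// Simplified stand-in for the policy class (bonus() is hypothetical).
struct OnSiteEmployeePolicy {
  static int bonus() { return 7; }
};

class Company {
 private:
  // Private member function template, as in the company.h listing.
  template <typename EmployeePolicy>
  int update_employee(int employee_id) {
    return employee_id + EmployeePolicy::bonus();
  }

 public:
  // Hypothetical public entry point that forwards to the private template.
  int update_on_site(int employee_id) {
    return update_employee<OnSiteEmployeePolicy>(employee_id);
  }
};

// Explicit instantiation at namespace scope. Naming the private member
// Company::update_employee is accepted here because access checks do
// not apply to explicit instantiations.
template int Company::update_employee<OnSiteEmployeePolicy>(int);

// Hypothetical helper that exercises the instantiated member.
int demo_update() {
  Company company;
  int result = company.update_on_site(1);
  assert(result == 8);
  return result;
}
```

Writing the same qualified name in an ordinary expression outside the class, such as company.update_employee<OnSiteEmployeePolicy>(1), would be rejected as an access violation.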
It’s also the crucial loophole that Folly takes advantage of. We can’t relax just yet, though. There’s still some work to be done.
Puzzle piece 3: Passing a member-function pointer as a non-type template parameter
In C++, template arguments are usually types, but there is some support for non-type template parameters if they are of integral or pointer type (footnote 2). Conveniently enough, it’s perfectly legal to pass a pointer-to-member-function as a template argument (footnote 3). Listing 5 is an example of what that looks like.
class SpaceShip {
public:
  void dock();
  // ...
};

// Member function alias that matches the
// signature of SpaceShip::dock()
using SpaceShipFun = void (SpaceShip::*)();

// spaceship_fun is a pointer-to-member-function
// value which is baked-in to the type of the
// SpaceStation template at compile time.
template <SpaceShipFun spaceship_fun>
class SpaceStation {
  // ...
};

// Instantiate a SpaceStation and pass in a
// pointer to member function statically as a
// template argument.
SpaceStation<&SpaceShip::dock> space_station{};
Listing 5
The intermediate SpaceShipFun alias hampers the genericity of the SpaceStation template, so we can move the type of the pointer-to-member-function into the template arguments too (Listing 6).
template <
  typename SpaceShipFun,
  SpaceShipFun spaceship_fun
>
class SpaceStation {
  // ...
};

// Now we must also pass the type of the pointer to
// member function when we instantiate the
// SpaceStation template.
SpaceStation<
  void (SpaceShip::*)(),
  &SpaceShip::dock
> space_station{};
Listing 6
We can take that a step further and have the compiler deduce the type of the member function for us:
SpaceStation<
  decltype(&SpaceShip::dock),
  &SpaceShip::dock
> space_station{};
That relieves us of some of the burden of having to pass the member function signature to the template. We’ll stick with this approach for the rest of the article as it’s what’s used in the Folly library, but it’s worth mentioning that C++17’s template <auto> feature removes the need for the first template parameter entirely (footnote 4).
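As a brief illustration of the C++17 alternative, the sketch below declares the station with a single template <auto> parameter. The receive() and docked() members and the dock_via_station helper are hypothetical additions, included only so that the effect of the call can be checked.

```cpp
#include <cassert>

class SpaceShip {
 public:
  void dock() { docked_ = true; }
  bool docked() const { return docked_; }  // Hypothetical observer.

 private:
  bool docked_ = false;
};

// C++17: the type of the non-type template parameter is deduced from
// the argument, so no separate type parameter is needed.
template <auto spaceship_fun>
class SpaceStation {
 public:
  // Hypothetical member that calls the baked-in member function.
  void receive(SpaceShip& ship) const { (ship.*spaceship_fun)(); }
};

// Hypothetical helper: docks a ship through the station.
bool dock_via_station() {
  SpaceShip ship;
  assert(!ship.docked());
  SpaceStation<&SpaceShip::dock> station{};
  station.receive(ship);
  return ship.docked();
}
```

The instantiation now names only the member function, with no decltype duplication.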
Passing a private pointer-to-member-function as a template parameter
Let’s combine the explicit template instantiation loophole with the ability to pass member function pointers as template parameters. The HijackImpl struct receives a pointer to Widget::forbidden() as a template parameter (see Listing 7).
// The first template parameter is the type
// signature of the pointer-to-member-function.
// The second template parameter is the pointer
// itself.
template <
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
struct HijackImpl {
  static void apply(Widget& w) {
    // Calls a private method of Widget
    (w.*forbidden_fun)();
  }
};

// Explicit instantiation is allowed to refer to
// `Widget::forbidden` in a scope where it's not
// normally permissible.
template struct HijackImpl<
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;
Listing 7
Brilliant! We’ve instantiated a template that is able to reach in and call the forbidden() member function on any Widget that we care to pass in. So we just have to write the free function, hijack, and we can go back to watching cat videos on YouTube, right?
void hijack(Widget& w) {
  HijackImpl<
    decltype(&Widget::forbidden),
    &Widget::forbidden
  >::apply(w);
}
The only problem is that it doesn’t work. The compiler sees through our ruse and raps us smartly on the knuckles:
error: 'forbidden' is a private member of 'Widget'
  HijackImpl<decltype(&Widget::forbidden), &Widget::forbidden>::apply(w);
The use of the HijackImpl template inside the hijack function is not an explicit template instantiation. It’s just a ‘normal’ implicit instantiation. So the loophole doesn’t apply. It’s time to phone a friend for help with solving the next piece of the puzzle.
Puzzle piece 4: A friend comes to our aid
Because we’re not allowed to refer to Widget::forbidden inside our hijack function, we must solve the conundrum of accessing the value of the forbidden_fun template parameter without making any direct reference to the HijackImpl<...> template. This apparently unreasonable requirement is easily solved with a shrewd application of the friend keyword.
Let’s take another step back from the task in hand and look at some of the different effects one can achieve when marking a free function as a friend of a class. The behaviour depends on whether the class contains only a declaration of the function (i.e. function signature only), or whether the complete definition (including the function body) appears inside the class.
‘friend’ function declarations
Most C++ developers will be familiar with the pattern of placing a free function declaration inside of a class definition and marking it as a friend. The definition of the free function still lives outside of the class, but is now allowed to access private members of the class. (See Listing 8.)
class Gadget {
  // Friend declaration gives `frobnicate` access
  // to Gadget's private members.
  friend void frobnicate();

private:
  void internal() {
    // ...
  }
};

// Definition as a normal free function
void frobnicate() {
  Gadget g;
  // OK because `frobnicate()` is a friend of
  // `Gadget`.
  g.internal();
}
Listing 8
If we could make hijack be a friend of Widget then the compiler would allow us to refer to Widget::forbidden inside the hijack function. Alas, this option is unavailable because the rules of our game don’t allow us to modify Widget. Let’s try something else.
Inline ‘friend’ function definitions
It’s also possible to define a friend function inside a class (as opposed to just declaring it there). This isn’t something seen as often in C++ code, probably because when we try to call the free function, the compiler is unable to find it! (See Listing 9 and the compile error that follows it.)
class Gadget {
  // Free function declared as a friend of Gadget
  friend void frobnicate() {
    Gadget g;
    g.internal();  // Still OK
  }

private:
  void internal();
};

void do_something() {
  // NOT OK: Compiler can't find frobnicate()
  // during name lookup
  frobnicate();
}
Listing 9
error: 'frobnicate' was not declared in this scope
  |   frobnicate();
  |   ^
As before, frobnicate() is still a free function that lives in the global namespace, but it behaves quite differently under name lookup now that it is defined inside the Gadget class. A friend function defined inside a class is sometimes known as a ‘hidden friend’ [JSS19] [Saks18]. Hidden friends can only be found through Argument Dependent Lookup (ADL) and ADL only works if one of the arguments to the function is of the enclosing class type. In the above example frobnicate() takes no arguments, so argument dependent lookup won’t happen. The result is that frobnicate() can’t be called from anywhere. Not even from within frobnicate() itself!
If we add a parameter of the enclosing class type to frobnicate() then we’re able to call it via ADL (Listing 10).
class Gadget {
  friend void frobnicate(Gadget& gadget) {
    gadget.internal();
  }

private:
  void internal();
};

void do_something(Gadget& gadget) {
  // OK: Compiler is now able to find the
  // definition of `frobnicate` inside Gadget
  // because ADL adds it to the candidate set for
  // name lookup.
  frobnicate(gadget);
}
Listing 10
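Hidden friends found through ADL are most often seen as operator overloads. The following minimal sketch uses a hypothetical Point class and a hypothetical points_equal helper. The operator== defined inside the class is found only because its arguments are of type Point.

```cpp
#include <cassert>

// Hypothetical value type used only for this illustration.
class Point {
 public:
  Point(int x, int y) : x_(x), y_(y) {}

  // Hidden friend: a free function defined inside the class. It can
  // read the private members and is found only through ADL.
  friend bool operator==(const Point& a, const Point& b) {
    return a.x_ == b.x_ && a.y_ == b.y_;
  }

 private:
  int x_;
  int y_;
};

// Hypothetical helper: the expression a == b triggers ADL on Point,
// which adds the hidden friend to the candidate set.
bool points_equal(const Point& a, const Point& b) {
  bool same = (a == b);
  assert(same == (b == a));  // The comparison is symmetric.
  return same;
}
```

Because the operator is not visible to ordinary lookup, it never becomes a candidate in comparisons between unrelated types.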
Making hidden friends visible
The hidden-friend ADL trick can be very useful; it’s an ideal tool when writing operator overloads for user-defined types. But we’ll use a slightly bigger hammer for our hijack function. There’s another way of allowing the compiler to find hidden friends, and that is to put a declaration of the function in the enclosing namespace (Listing 11).
class Gadget {
  // Definition stays inside the Gadget class
  friend void frobnicate() {
    Gadget g;
    g.internal();
  }

private:
  void internal();
};

// An additional namespace-scope declaration makes
// the function available for normal name lookup.
void frobnicate();

void do_something() {
  // The compiler can now find the function
  frobnicate();
}
Listing 11
This is exactly the opposite of the usual pattern of defining a free function and then placing a friend declaration for it inside of a class. The new behaviour is almost identical except for one critical difference: when the enclosing class is a template, the free function has access to the template parameters!
Using friends to pilfer template parameters
I trust you will be at least a little unsettled to discover that the program in Listing 12 is valid.
#include <iostream>

template <int N>
class SpookyAction {
  friend int observe() { return N; }
};

int observe();

int main() {
  SpookyAction<42>{};
  std::cout << observe() << '\n';  // Prints 42
}
Listing 12
What’s happening is that the observe() function is not defined until the point at which the SpookyAction template is instantiated (by its use in the main function). There is a single definition of the observe() function, because there is a single instantiation of the SpookyAction template. The really useful part is that the observe() function gains access to the template parameter of the SpookyAction<42> instantiation that caused it to be defined.
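The instantiation does not have to come from creating an object. As a sketch under the same assumptions, the variant below replaces the temporary in main with an explicit instantiation at namespace scope, which is the form the rest of the article builds on. The read_spooky_value helper is a hypothetical name.

```cpp
#include <cassert>

template <int N>
class SpookyAction {
  // Defined once per instantiation of SpookyAction.
  friend int observe() { return N; }
};

// Namespace-scope declaration so that normal name lookup finds observe().
int observe();

// Explicit instantiation: this is what brings observe() into existence,
// with N fixed at 42.
template class SpookyAction<42>;

// Hypothetical helper: reads the template argument from outside the class.
int read_spooky_value() {
  int value = observe();
  assert(value == 42);
  return value;
}
```

No SpookyAction object is ever created, yet observe() still reports the template argument of the one instantiation that exists.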
Of course things go wrong very quickly if we try to instantiate any more versions of the SpookyAction template, as each one results in a redefinition of the observe() function and an angry compiler.
Provided we use it carefully, we now have the last piece of our puzzle – a way to access the template parameters of a class from a scope external to that class.
Putting the puzzle pieces together
Let’s go back to our original Widget example, now that we’ve got all of the pieces that we need to be able to reach in and call its private member function, Widget::forbidden(). In summary:
- We use the loophole in the explicit template instantiation rules to allow us to refer to Widget::forbidden() from outside of the Widget class.
- We inject the address of Widget::forbidden() into our HijackImpl class as a template parameter.
- We define the hijack() function directly inside of HijackImpl so that it can access the template parameter containing the address of Widget::forbidden().
- We mark hijack as a friend so that it becomes a free function and we provide a declaration of hijack at namespace scope so that it participates in name-lookup.
- We can now invoke Widget::forbidden() on any Widget instance through the member-function address that is exposed to the hijack function.
The key parts of the mechanism are shown in Listing 13.
// HijackImpl is the mechanism for injecting the
// private member function pointer into the
// hijack function.
template <
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
class HijackImpl {
  // Definition of free function inside the class
  // template to give it access to the
  // forbidden_fun template argument.
  // Marking hijack as a friend prevents it from
  // becoming a member function.
  friend void hijack(Widget& w) {
    (w.*forbidden_fun)();
  }
};

// Declaration in the enclosing namespace to make
// hijack available for name lookup.
void hijack(Widget& w);

// Explicit instantiation of HijackImpl template
// bypasses access controls in the Widget class.
template class HijackImpl<
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;
Listing 13
Dealing with multiple definitions of the friend function
There’s still one more issue to overcome (footnote 5). To avoid violating the One Definition Rule, there must be one, and only one, explicit instantiation of a template (with given template arguments) across all translation units.
Consider what happens when our HijackImpl class is put in a header and is used in multiple translation units. The explicit instantiation of the class template must live outside of that header, otherwise it will appear in every translation unit that includes the header. We need to ensure that there is just one explicit template instantiation in the whole program. What’s more, the linker is not actually required to report duplicate instantiations across multiple translation units, so it won’t even help us to avoid the problem. That’s a recipe for a big maintenance headache.
The approach employed by the Folly library is to add an extra template parameter to the HijackImpl class and use it to accept an empty ‘tag’ struct which is defined in an anonymous namespace.
The anonymous namespace ensures that the tag parameter is of a different type in every translation unit. Every translation unit will therefore get its own unique explicit instantiation of the HijackImpl class.
The final solution is short, but packs in a surprising amount of nuance. See widget.h (Listing 14), widget_hijack.h (Listing 15) and main.cc (Listing 16).
#pragma once
#include <iostream>

class Widget {
private:
  void forbidden() {
    std::cout << "Whoops...\n";
  }
};
Listing 14
#pragma once
#include "widget.h"

namespace {
// This is a *different* type in every translation
// unit because of the anonymous namespace.
struct TranslationUnitTag {};
}

void hijack(Widget& w);

template <
  typename Tag,
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
class HijackImpl {
  friend void hijack(Widget& w) {
    (w.*forbidden_fun)();
  }
};

// Every translation unit gets its own unique
// explicit instantiation because of the
// guaranteed-unique tag parameter.
template class HijackImpl<
  TranslationUnitTag,
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;
Listing 15
#include "widget.h"
#include "widget_hijack.h"

int main() {
  Widget w;
  hijack(w);  // Prints "Whoops..."
}
Listing 16
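The article notes that the approach applies equally to private member variables. The following single-file sketch adapts the final solution to a pointer-to-data-member. The Account class and the PeekImpl, peek_balance and overwrite_balance names are all hypothetical, and the member pointer type is written out as int Account::* rather than deduced with decltype.

```cpp
#include <cassert>

// Hypothetical class with a private data member.
class Account {
 public:
  int balance() const { return balance_; }

 private:
  int balance_ = 100;
};

namespace {
// A different type in every translation unit, as in widget_hijack.h.
struct TranslationUnitTag {};
}

// Namespace-scope declaration so that normal lookup finds the friend.
int& peek_balance(Account& account);

template <typename Tag, typename MemberPtr, MemberPtr member_ptr>
class PeekImpl {
  // Free function defined inside the template so that it can see the
  // member_ptr template argument.
  friend int& peek_balance(Account& account) {
    return account.*member_ptr;
  }
};

// Explicit instantiation: allowed to name the private Account::balance_.
template class PeekImpl<TranslationUnitTag, int Account::*,
                        &Account::balance_>;

// Hypothetical helper: overwrites the private member from outside.
int overwrite_balance(int new_balance) {
  Account account;
  assert(account.balance() == 100);
  peek_balance(account) = new_balance;
  return account.balance();
}
```

Returning a reference from the friend function allows both reading and mutation, which is exactly the kind of invariant-breaking access that the conclusion warns about.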
Conclusion
Should you use this access-violation hack in production code? Almost certainly not. Well, not unless you enjoy the excitement and explosive unpredictability of maintaining extremely brittle code.
The C++ class member access rules are there to help authors of types to enforce invariants. If you fool the compiler into mutating private class members then you’re likely to be violating the class invariants, which risks leaving the program in an invalid state. You’re also relying on intimate knowledge of internal implementation details, for which the library author is under no stability obligations.
Should you experiment with this access-violation technique outside of production code? Absolutely! Learning how to subvert a system in a safe environment is not only fun, but it helps to foster a deeper understanding of that system. Untangling the machiavellian mechanisms in the Folly library tested my knowledge of C++, requiring me to improve my understanding of the language features I encountered along the way. It’s almost as if someone had reached in to my brain’s internal implementation and fiddled with its contents...
A note on the origins of the technique
The idea of using explicit template instantiation to bypass class access rules pre-dates the Folly library by a few years. The first mention I can find is from a 2010 blog post by Johannes Schaub [Schaub10], which describes a method using initialization of static class members. At the time, there was a discussion on the Boost mailing list about how the technique might prove to be a useful addition to the Boost serialization library.
A year later, Schaub offered the dubiously tempting promise of ‘Safer Nastiness’ in a follow-up blog post [Schaub11] in which he presented an improved version of the code. This removed the need for static class members and is much closer to what’s used today by the Folly library.
Acknowledgements
The author would like to thank Geoff Hester, Anthony Kirby and Kirsty McNaught for advice on early drafts of this article and Balog Pal for very patiently explaining some finer points of the one-definition rule.
References
[GitHub]: Facebook Folly on GitHub: https://github.com/facebook/folly
[GotW]: Herb Sutter ‘Uses and Abuses of Access Rights’: http://www.gotw.ca/gotw/076.htm
[JSS19]: ‘The Power of Hidden Friends in C++’ posted 25 June 2019: https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html
[Saks18]: Dan Saks ‘Making New Friends’, recorded at CppCon 2018, available at: https://www.youtube.com/watch?v=POa_V15je8Y
[Schaub10]: Johannes Schaub ‘Access to private members. That’s easy!’, posted 3 July 2010: http://bloglitb.blogspot.com/2010/07/access-to-private-members-thats-easy.html
[Schaub11]: Johannes Schaub ‘Access to private members: Safer nastiness’, posted 30 December 2011: http://bloglitb.blogspot.com/2011/12/access-to-private-members-safer.html
Footnotes
1. C++17 introduced the std::invoke template, which gives a unified syntax for working with callables.
2. C++20 will significantly relax the restrictions on non-type template parameters.
3. I imagine it’s staggeringly useful to someone.
4. Readers lacking both a C++17 compiler and a certain amount of moral fibre have probably already worked out how to use a macro to remove the duplication in the template arguments.
5. If you choose to ignore the pitchfork-bearing members of the C++ standards committee currently approaching your front door with a polite request that you stop doing this sort of thing at all.
Alastair Harrison started out as a robotics researcher but accidentally became a C++ build engineer because nobody else wanted to do it. These days he appreciates the fact that his customers are all in the same building as him and that they are apparently unfazed by his eagerness to delete old code.