C++20’s long-awaited module system has arrived. Nathan Sidwell presents a tourist’s guide.
One of the major C++20 features is a module system. This has been a long time in coming. The idea predates C++98; it is about time C++ caught up with other languages! In this article, I’ll show three example programs, using progressively more advanced organization of code. There are a number of call-out boxes answering a few questions the main text might suggest. You can read those separately.
[Box: The road to standardization]
Let’s start with a simple example showing some modular concepts. Listing 1 is a module interface file – this is the source file that provides importable entities to users of the module.
    // file: ex1/hello.cc
    module;
    // legacy includes go here – not part of this module
    #include <iostream>
    #include <string_view>
    export module Hello;
    // the module purview starts here
    // provide a function to users by exporting it
    export void SayHello (std::string_view const &name)
    {
      std::cout << "Hello " << name << "!\n";
    }

Listing 1
The name of the file containing that code can be anything, but let’s put it in hello.cc. Listing 2 is a user of that module.
    // file: ex1/main.cc
    import Hello; // import the Hello module,
                  // its exports become available
    #include <string_view>
    int main ()
    {
      SayHello ("World");
    }

Listing 2
We can compile our program using a module-aware GCC [1] with:

    > cd ex1
    > g++ -fmodules-ts -std=c++20 -c hello.cc
    > g++ -fmodules-ts -std=c++20 -c main.cc
    > g++ -o main main.o hello.o
    > ./main
    Hello World!
You’ll notice there are some differences to using header files:
- You compile the module interface, just as a regular source file.
- The module interface can contain non-inline function definitions.
- You need to compile the interface before you compile sources that import it.
[Box: Do we need a new source suffix?]
The interface is a regular source file. It just happens to create an additional artefact to the usual object file – a Compiled Module Interface (CMI). That CMI is read by importers of the module, and then code can refer to entities exported by the module. It is this dependency that forces the compilation ordering. In this particular case, the CMI contains information about the SayHello function’s declaration, but not (necessarily) about its body. If SayHello was an inline function, the body would also (most likely) be present in the CMI.
[Box: What is a Compiled Module Interface (CMI) and how is it used?]
You may notice that modules are not namespaces. A module can export names in any namespaces it chooses. The namespaces are common across all modules, and many modules can export names into the same namespace. An importer of a module has to use a qualified name to refer to a module’s exports (or deploy using-directives).
You’ll also have noticed that the main program had to #include <string_view>, even though the interface had already done so. The interface had done this in the part of the file that precedes the module itself, and that part is not visible to importers. As the user code needs to create a std::string_view, it needs the header file itself. The header include and the import can be in any order. I’ll go into more detail about this later, as it is an important bridge from today’s code to the future’s module code.
Export
You’ll see the example used the resurrected export keyword in two places:
export module Hello; // declare the interface of a module
export void SayHello (…); // make a declaration visible to importers
The first use is a module-declaration, a new kind of declaration specifying that the current source file is part of a module. You can have at most one of them, and there are restrictions on what can appear before it. The intent is that you won’t get surprised with it buried in the middle of a file. As you might guess, there’s a variant of the module-declaration, which lacks the export keyword. I’ll get to that later.
The second use allows you to make parts of the module interface visible to importers, and most importantly its lack allows you to keep parts of the interface private to the module. Only namespace-scope nameable declarations can be exported. You can’t export (just) a member of a class, nor can you export a specific template specialization (specializations are not found by name). You cannot export things from within an anonymous namespace. You can only export things from the interface of a module (see Listing 3).
    export module example;
    // You can export a class.
    // Both it and its members are available (usual
    // access restrictions apply)
    export class Widget { … };
    namespace Tool {
    // export a member of a namespace
    export void Frobber ();
    }
    // export a using declaration (the used things must
    // be exported)
    export using Tool::Frobber;
    // export a typedef
    export using W = Widget;
    // export a template definition. Users can
    // instantiate it
    export template<int I> int Number () { return I;}
    // you cannot explicitly export a specialization,
    // but you can create them for importers to use
    template<> int Number<0> () { return -1; /* Evil! */ }

Listing 3
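The point that specializations are not found by name can be seen without any module machinery. The following is a minimal sketch in plain C++ (no export, so a non-modular compiler accepts it) that reuses the Number template from Listing 3. The constexpr is added here only so the checks can be static_asserts; it is not part of the original listing.

```cpp
// Plain C++ sketch (not a module): only the primary template has a
// name that lookup can find.
template<int I>
constexpr int Number ()
{
  return I;
}

// An explicit specialization. It has no separate name to look up (or
// to export); Number<0> is reached through the primary template.
template<>
constexpr int Number<0> ()
{
  return -1; // deliberately surprising, as in Listing 3
}

static_assert (Number<7> () == 7, "primary template used");
static_assert (Number<0> () == -1, "specialization used");
```

This is why exporting the primary template is enough: an importer that writes Number<0> () names the template, and the specialization comes along with it.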
If you export something, you must export it upon its first declaration. This is like declaring something static – you have to do so on its first declaration, but a later redeclaration can omit the static. In fact, export is described in terms of linkage – it’s how you get external-linkage from inside a module. Declarations with external linkage are nameable from other modules.
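The static analogy can be made concrete in plain, non-modular C++. In this sketch the function name Answer is made up for illustration, and constexpr is there only so that the final check can run at compile time.

```cpp
// First declaration: 'static' gives Answer internal linkage.
static constexpr int Answer ();

// A redeclaration may omit 'static'; the linkage does not change.
constexpr int Answer ();

// The definition omits it too, and Answer still has internal linkage.
constexpr int Answer ()
{
  return 42;
}

static_assert (Answer () == 42, "all three declarations name one function");
```

Inside a module, export follows the same pattern: it goes on the first declaration, and a later redeclaration or definition in the same module may leave it off.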
[Box: Module ownership]
So, what happens if you omit the export inside a module? In that case, you get a new kind of linkage – module-linkage. Declarations with module-linkage are nameable only within the same module. As a module can consist of several source files, this is not like the internal-linkage you have with static. It does mean that two modules could both have their own int Frob (int) functions, without placing them into globally unique namespaces.
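For contrast, here is a sketch of the pre-modules workaround that module-linkage makes unnecessary. The namespace and function names (liba, libb, Next, Twice) are hypothetical, and constexpr is used only so that the checks are compile-time ones.

```cpp
// Two libraries each want a private helper called Frob that is shared
// by several of their own source files, so 'static' is not an option.
// Without modules, each tucks its helper into a library-specific
// namespace to keep the two apart.
namespace liba { namespace detail {
  constexpr int Frob (int x) { return x + 1; }
} }
namespace libb { namespace detail {
  constexpr int Frob (int x) { return x * 2; }
} }

// The public entry points call their own library's helper.
namespace liba {
  constexpr int Next (int x) { return detail::Frob (x); }
}
namespace libb {
  constexpr int Twice (int x) { return detail::Frob (x); }
}

static_assert (liba::Next (3) == 4, "liba's Frob adds one");
static_assert (libb::Twice (3) == 6, "libb's Frob doubles");
```

With module-linkage, each module could instead declare a non-exported int Frob (int) directly, and the two would not collide.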
Types (including typedefs) can be exported (or not exported), in the same way as functions and variables. Types already have linkage (but typedefs do not). Usually we don’t think about that, because we use header files to convey such information and they textually include the class or typedef definition. Modules have a more rigorous formulation of the linkage of these entities that do not themselves generate code (and hence object-level symbols).
You can also export imports (see Listing 4).
    // file: ex2/hello.cc
    module;
    #include <iostream>
    export module Hello;
    export import <string_view>; // importers get <string_view>
    using namespace std; // not visible to importers
    export void SayHello (string_view const &name)
    {
      cout << "Hello " << name << "!\n";
    }

    // file: ex2/main.cc
    // same contents as ex1/main.cc

Listing 4
Here I’ve imported and re-exported <string_view> (wait, what? importing a header file!? I’ll get to that) so that users do not need to #include (or import) it themselves. To build this, you will need to process <string_view> [2]:

    > cd ex2
    > g++ -fmodules-ts -std=c++20 -c \
        -x c++-system-header string_view
    > g++ -fmodules-ts -std=c++20 -c hello.cc
    > g++ -fmodules-ts -std=c++20 -c main.cc
    > g++ -o main main.o hello.o
    > ./main
    Hello World!
World in transition
So, how do I write my lovely new modules, but have them depend on olde worlde header files? It’d be unfortunate if they could only use other modules. Fortunately there’s not one, but two ways to do this (with different trade-offs).
[Box: New keywords]
You saw the first way in the earlier example. We had a section of the source file before the module-declaration. That section is known as a Global Module Fragment (GMF). It’s introduced by a plain module; sequence, which must be the first tokens of the file (after preprocessing and comment stripping). If there is such a GMF, there must be a module-declaration; you can’t just have an introduced GMF (why would you need that?). The contents of the GMF must entirely consist of preprocessing directives (or comments). You can have a #include there, but you can’t have the contents of that #include directly in the top-level source. The aim of this design is to make scanning for the module-declaration simple. Both the introductory module; and the module-declaration must be in the top-level source, unobscured by macros.
In this way, modules can get access to regular header files, and not reveal them to their users – we get encapsulation that is, in general, impossible with header files. Hurrah!
There is a missed opportunity with this kind of scheme. The compiler still has to tokenize and parse all those header files, and we might be blocking the compilation of source files that depend on this module. That’s unfortunate. Another scheme to address this is header-units. Header units are header files that have been compiled in an implementation-specified mode to create their own CMIs. Unlike the named-module CMIs that we’ve met so far, all header-unit CMIs declare entities in the Global Module. You can import header-units with an import-declaration naming the header-file:
import <iostream>;
This import can be placed in the module’s purview, without making it visible to importers.
Naturally, as header-units are built from header files, there are issues with duplicate declarations and definitions. But we can make use of the One Definition Rule, and extend it into this new domain. Thus header-units may multiply declare or define entities, and be importable into a single compilation. Unlike header files, importing a header-unit is not affected by macros already defined at the point of the import – the meaning of the header-unit is determined by the macros defined when it was compiled to a CMI.
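To see what being ‘affected by macros already defined’ means for a textual include, consider this sketch. The lines between the markers stand in for a header’s contents; the macro WIDE_INDEX and the alias Index are invented for the illustration.

```cpp
#include <type_traits>

// The includer defines a macro before the (simulated) #include ...
#define WIDE_INDEX 1

// --- begin: contents of a hypothetical header, textually included ---
#if WIDE_INDEX
using Index = long long;
#else
using Index = int;
#endif
// --- end of simulated header ---

// ... and that macro decided what the header declared.
static_assert (std::is_same<Index, long long>::value,
               "the includer's macro selected the wide type");
```

Had the same header been built as a header-unit, the choice would have been fixed by whatever WIDE_INDEX was (or was not) defined when its CMI was compiled, and a #define placed before the import would make no difference.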
Not all header files are convertible to header-units. The goal here is to allow most of them to be, generally the well-behaved header files. This work derives from Clang-modules, which was an effort to do this seamlessly without changing source code.
One thing header-units do, which named modules do not, is export macros. This was unfortunately unavoidable as so many header files expose parts of their interface in the form of macros. Named-modules never export macros, even from re-exported header-units.
[Box: Implementations]
Splitting a module
So far I’ve only shown a module consisting of a single interface file. You can split a module up in two different ways.
The simplest way is to provide module-implementation files, distinct from the interface. An implementation file just has a module-declaration lacking the export keyword (it doesn’t export things). While a module must have only one interface file, it can have many implementation files (or none at all). The implementation files implicitly import the interface’s CMI, but themselves only produce an object file. If you think about modules as glorified header files, then this is the natural separation of interface and implementation (but you’re probably missing out).
The interface itself can be separated into module-partitions. Partitions have names containing exactly one ‘:’. These themselves can be interface or implementation partitions depending on whether their module-declaration has the export keyword or not. Interface partitions may export entities, just as the primary interface does. These interface partitions must be re-exported from the primary interface. The partitions may also be imported into any unit of the same module.
- Partitions provide a way to break a large interface into smaller chunks.
- Partitions are not importable into different modules. The partitions are invisible outside of their module.
- Implementation partitions provide a way to make certain definitions available only inside the module, while still letting users know that the type exists (for instance).
For example we could break our original example up as shown in Listing 5.
    // file: ex3/hello-inp.cc
    module;
    #include <string_view>
    // interface partition of Hello
    export module Hello:inter;
    export void SayHello (std::string_view const &name);

    // file: ex3/hello-imp.cc
    module;
    #include <iostream>
    // implementation partition of Hello
    module Hello:impl;
    import :inter; // import the interface partition
    import <string_view>;
    using namespace std;
    void SayHello (string_view const &name)
      // matches the interface partition’s exported
      // declaration
    {
      cout << "Hello " << name << "!\n";
    }

    // file: ex3/hello-i.cc
    export module Hello;
    // reexport the interface partition
    export import :inter;
    import :impl; // import the implementation partition
    // export the string header-unit
    export import <string_view>;

    // file: ex3/main.cc
    // same contents as ex1/main.cc

Listing 5
In the primary interface, the three imports can be in any order. That’s one of the design goals – import order is unimportant. You can see that the import syntax for a partition doesn’t name the module. That’s also important, so that there is no temptation to import into a different module.
Here are the build commands:

    > cd ex3
    > g++ -fmodules-ts -std=c++20 -c \
        -x c++-system-header string_view
    > g++ -fmodules-ts -std=c++20 -c hello-inp.cc
    > g++ -fmodules-ts -std=c++20 -c hello-imp.cc
    > g++ -fmodules-ts -std=c++20 -c hello-i.cc
    > g++ -fmodules-ts -std=c++20 -c main.cc
    > ar -cr libhello.a hello-{i,inp,imp}.o
    > g++ -o main main.o -L. -lhello
    > ./main
    Hello World!
Note that in this example there was no need to import the implementation partition – it had no semantic effect.
Module ABI stability
An important part of module interface design is control of the aspects that are visible to users. Generally, the parts of the interface that can result in the importer emitting code are part of the ABI of your module. You want to control that.
[Box: The One Definition Rule]
Every exported inline function’s body is visible to importers (they need to refer to the entities it names), and changing the body can change the ABI of a module. To that end, one significant change has been made to in-class function definitions. They are no longer implicitly inline in a module’s purview! Implicitly declared member functions are still inline, as are lambdas. This means you no longer have to separate the definitions of your non-inline member functions (including template definitions) from their in-class declaration.
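As a reminder of the rule being changed, here is a sketch in plain, non-modular C++; the Counter class is hypothetical. Outside a module, defining a member function in the class body makes it implicitly inline, so keeping a body out of the interface means defining it separately.

```cpp
// Plain C++ (not in a named module's purview).
class Counter
{
public:
  // Defined in the class body: implicitly inline here. In a named
  // module's purview the same definition would not be implicitly inline.
  int Next () { return ++value_; }

  // Declared only; the out-of-class definition below is the traditional
  // way to keep a non-inline body apart from the class definition.
  int Peek () const;

private:
  int value_ = 0;
};

// Non-inline definition, separated from its in-class declaration.
int Counter::Peek () const
{
  return value_;
}
```

In a module’s purview, both member functions could be defined in the class body and neither would be implicitly inline; you would write inline explicitly on the ones whose bodies you are happy to expose.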
[Box: How will build systems be affected?]
Onwards!
I hope the examples here have shown you a flavour of what is available with modules. I kept the examples simple, to show some of the core module concepts, particularly how non-modular and modular code can interact.
As mentioned elsewhere, I believe the Microsoft implementation is the most advanced, and has been used for production code. Of the other implementations, GCC’s is more complete than Clang’s (mid 2020).
Unfortunately, for GCC one must use Godbolt, which is awkward for the more advanced use, or build one’s own compiler, which is a steep cliff to climb for most users. To make things even more exciting, those that have played with GCC have fallen over bugs. As with any major new feature, ensuring it is correct is difficult, and users have imaginative ways of exercising things. Don’t let that put you off though, user bug reports are helpful.
Footnotes
1. GCC’s main development trunk and released versions do not yet provide module support. See the ‘Implementations’ box for details.
2. As string_view has no suffix, you need to tell G++ what language it is. The c++-system-header language specifies (a) searching on the system #include path and (b) with -fmodules-ts, specifies building a header-unit. Other possibilities are c++-header (automatically recognized with a variety of typical header file suffixes) and c++-user-header (use the user #include path).
Nathan Sidwell is a long-time developer of GCC, having discovered that Open Source is more rewarding than proprietary software, compilers are more rewarding than hardware, and hardware is more rewarding than Physics.