When we allocate memory we often forget about alignment. Paul Floyd reminds us about the various aligned allocation functions.
Recently I’ve been doing some work with various Unix-like systems’ implementations of the C functions that allocate aligned memory. These are memalign, aligned_alloc and posix_memalign. Typically, you would use these functions to get memory that is aligned to cache lines or virtual memory pages. As an example, imagine a networking application that needs to allocate a struct msghdr and to have the fastest memory access possible. On a 64-bit system this struct has a size of 56 bytes. If you use malloc to allocate your memory, you are likely to get back a pointer that, depending on the system, is 8- or 16-byte aligned. That means that there is a fair chance that the memory will straddle a 64-byte boundary. That is bad because 64 bytes is the cache-line size on most current processors, so accessing fields of the structure will hit two cache lines. This increases the risk of cache misses, resulting in lower performance.
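A small sketch makes the arithmetic concrete. It assumes a 64-byte cache line (common, but not universal), and straddles_cache_line is a hypothetical helper, not part of any of the libraries discussed. With 16-byte alignment, a 56-byte struct can start at offset 0, 16, 32 or 48 within a line, and only offset 0 keeps it inside a single line; with 8-byte alignment, only offsets 0 and 8 do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size: 64 bytes is typical for current x86-64 and
   many ARM cores, but some processors use other sizes. */
#define CACHE_LINE_SIZE ((uintptr_t)64)

/* Hypothetical helper: returns true if the object occupying
   [p, p + size) touches more than one cache line. */
static bool straddles_cache_line(const void *p, size_t size)
{
    if (size == 0)
        return false;  /* an empty object touches no memory */
    uintptr_t start = (uintptr_t)p;
    uintptr_t first_line = start / CACHE_LINE_SIZE;
    uintptr_t last_line = (start + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

Requesting 64-byte alignment removes the guesswork: the object then always starts at offset 0 of a line.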
I’m not going to detail the performance benefits (or drawbacks) of using these functions. Instead, in this article I’ll discuss some of the issues that I saw. The implementations that I’ve looked at are Linux glibc [GNU libc], Linux musl [musl], FreeBSD jemalloc [FreeBSD], macOS [XNU] and Illumos [illumos]. There are other malloc libraries (Illumos umem, tcmalloc, rpmalloc and snmalloc, for instance) but I haven’t looked at them. Also, there’s (almost) nothing on Windows, as I don’t use it enough to make fair comment.
History
These functions go back a long way. memalign goes back to SunOS 4.1.3 (August 1992, according to Wikipedia). Despite its age, it is not a ‘standard’ function: it figures in neither the C standard nor the POSIX standard, and the non-standardness shows, as we’ll see shortly. It doesn’t exist on macOS. glibc and musl both have implementations. Finally, FreeBSD gained a version late in the game, in 2020, for glibc compatibility.
posix_memalign, as the name implies, is a bona fide part of the POSIX spec: IEEE Std 1003.1d-1999 Additional Realtime Extensions, to be precise. All the systems and libraries that I looked at implement posix_memalign.
aligned_alloc was standardized in C11. Again, it is implemented on all the systems that I looked at.
What they claim to do
Here is what the Linux man page says:
The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). This address can later be successfully passed to free(3). If size is 0, then the value placed in *memptr is either NULL or a unique pointer value.
The obsolete function memalign() allocates size bytes and returns a pointer to the allocated memory. The memory address will be a multiple of alignment, which must be a power of two.
The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.
That all sounds very reasonable. The POSIX standard has similar wording for posix_memalign. The spec can be accessed from The Open Group [opengroup], but you need to create an account and log in to access it.
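A minimal sketch of calling posix_memalign follows. Unlike the other two functions, it reports failure through its return value (EINVAL or ENOMEM) rather than by returning NULL, and, per the man page above, a size of zero may legitimately yield either NULL or a unique pointer, both of which can be passed to free. The names alloc_aligned_posix and is_aligned are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign in strict ISO modes */
#endif
#include <errno.h>   /* EINVAL, ENOMEM */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns the aligned block, or NULL on failure.
   posix_memalign's own return code (0, EINVAL or ENOMEM) is stored in
   *err when err is not NULL. */
static void *alloc_aligned_posix(size_t alignment, size_t size, int *err)
{
    void *p = NULL;
    /* alignment must be a power of two and a multiple of sizeof(void *) */
    int rc = posix_memalign(&p, alignment, size);
    if (err != NULL)
        *err = rc;
    return rc == 0 ? p : NULL;
}

/* Hypothetical check: true if p is a multiple of alignment (nonzero). */
static bool is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}
```

The block is released with plain free, as the man page says.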
Sadly, C11 does not have very much to say about aligned_alloc:
The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.
Great, so the alignment can be anything, but the size needs to be a multiple of the same anything. The final draft of C11 (N1570) is available online [C11 final].
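Meeting the size requirement is a matter of rounding up. The sketch below assumes the alignment is a nonzero power of two, as the man pages ask, and round_up_size is a hypothetical name. For a 56-byte struct msghdr and a 64-byte alignment it yields 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounds size up to the next multiple of alignment,
   which must be a nonzero power of two. Returns 0 if the rounded value
   would not fit in a size_t. */
static size_t round_up_size(size_t size, size_t alignment)
{
    size_t mask = alignment - 1;  /* e.g. 64 -> 0x3f */
    if (size > SIZE_MAX - mask)
        return 0;  /* overflow */
    return (size + mask) & ~mask;
}
```

A call such as aligned_alloc(64, round_up_size(56, 64)) then satisfies the letter of C11.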
I can’t comment on memalign since it isn’t standardized.
Musl (or, more specifically, Alpine Linux, which uses it) ships the same man page.
The FreeBSD description for posix_memalign is very similar. For aligned_alloc it says:
The aligned_alloc() function allocates size bytes of memory such that the allocation’s base address is a multiple of alignment. The requested alignment must be a power of 2. Behavior is undefined if size is not an integral multiple of alignment.
There is no man page for memalign on FreeBSD.
Illumos has the following to say about memalign:
The memalign() function allocates size bytes on a specified alignment boundary and returns a pointer to the allocated block. The value of the returned address is guaranteed to be an even multiple of alignment. The value of alignment must be a power of two and must be greater than or equal to the size of a word.
The Illumos wording for posix_memalign is again similar to the others, but with one exception: this time the behaviour when the size is zero is specified:
If the size of the space requested is 0, the value returned in memptr will be a null pointer.
The macOS man pages are quite similar to FreeBSD’s.
To summarize so far, posix_memalign is fairly well defined. memalign is a bit hazy for a size of zero, and I’m not sure what Solaris was on about when it says that the returned address will be an even multiple of the alignment. All of the man page descriptions of aligned_alloc say that the alignment must be a power of two and the size an integral multiple of the alignment.
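These alignment rules boil down to simple bit tests. The hypothetical checkers below encode “power of two” and the stricter posix_memalign rule (a power of two that is also a multiple of sizeof(void *)). Note that a weaker “multiple of four” test would also accept values such as 12, which is not a power of two.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if a is a nonzero power of two: such values have exactly one bit
   set, so clearing the lowest set bit (a & (a - 1)) leaves zero. */
static bool is_power_of_two(size_t a)
{
    return a != 0 && (a & (a - 1)) == 0;
}

/* The posix_memalign requirement: a power of two that is also a
   multiple of sizeof(void *). */
static bool is_valid_posix_alignment(size_t a)
{
    return is_power_of_two(a) && a % sizeof(void *) == 0;
}
```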
What they actually do
So how do the implementations match up to the specs? I’m not going to go into internal details; any of the functions may allocate more than was asked for, or align to a higher value.
Thus far I’ve been describing the functions in chronological order. This time I’m going to let posix_memalign jump the queue. All the implementations behave as specified. Illumos does indeed return a null pointer if the size is zero; the other implementations allocate some unspecified amount.
The man page for Linux glibc memalign claims that the alignment must be a power of two. In fact, any value of alignment is accepted and silently bumped up to the next power of two.
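For illustration, that rounding can be sketched as below. This is a hypothetical stand-alone function that only mimics the observable effect described above; it is not glibc’s code, which also has its own minimum-alignment handling. For example, a requested alignment of 48 becomes 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= a (1 for a == 0).
   Returns 0 if that value cannot be represented in a size_t. */
static size_t next_power_of_two(size_t a)
{
    size_t p = 1;
    while (p < a) {
        if (p > SIZE_MAX / 2)
            return 0;  /* doubling again would overflow */
        p <<= 1;
    }
    return p;
}
```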
Two of the memalign implementations were buggy. FreeBSD would crash if the alignment was zero; I’ve submitted a patch for that, which has been merged. Illumos only restricts the memalign alignment to being a multiple of four, which lets through some peculiar alignment values such as 12 or 20. I’ve opened a bug tracker item for that. There was nothing wrong with musl that I could see.
On to the last of the trio, aligned_alloc. The Linux man page claims that this is the same as memalign except that the size should be a multiple of the alignment. For glibc, that would be quite a technical feat, since the two functions are in fact the same. To be more precise, they are both weak aliases of __libc_memalign. So there is no extra constraint on the size.
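In case the term is unfamiliar: a weak alias is an extra symbol name for an existing definition, marked weak so that a strong definition of the same name elsewhere in the link takes precedence. glibc wraps this in its internal weak_alias macro. With GCC or Clang on ELF platforms such as Linux, the idea can be sketched with the alias and weak attributes; all names below are hypothetical, and Mach-O (macOS) does not support aliases this way.

```c
/* ELF-only sketch (GCC or Clang on Linux or the BSDs); Mach-O on macOS
   does not support the alias attribute. All names are hypothetical. */

/* The single real definition. */
int add_impl(int a, int b)
{
    return a + b;
}

/* Two more names for the same code, in the spirit of memalign and
   aligned_alloc both naming __libc_memalign. Being weak, either can be
   overridden by a strong definition elsewhere in the link. */
int add_v1(int a, int b) __attribute__((weak, alias("add_impl")));
int add_v2(int a, int b) __attribute__((weak, alias("add_impl")));
```

Calling add_v1 or add_v2 runs exactly the same machine code as add_impl, which is why no size check can distinguish the two glibc functions.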
Other platforms also share a lot of code. FreeBSD memalign calls aligned_alloc, but with the size rounded up to a multiple of the alignment. If anything, I would have expected the opposite, but anything goes when functions are non-standard or implementation defined. Musl memalign just calls aligned_alloc. And, with a nice bit of symmetry, Illumos aligned_alloc just calls memalign.
Just when I thought I’d covered everything, I discovered that if you use a huge value of alignment with musl aligned_alloc, it will crash with a segfault. The crash is in version 1.2.2 and has apparently been fixed in 1.2.3.
So far, no platform has done anything about the “the value of size shall be an integral multiple of alignment” part of the C11 standard. macOS is the remaining platform, and it DOES do something about it: if the size isn’t an integral multiple of the alignment, aligned_alloc returns NULL and sets errno to EINVAL.
One thing that is generally not documented is that most of the functions will fail if the alignment is huge (over half the memory space). In that case they return NULL and set errno to EINVAL (posix_memalign, as usual, returns EINVAL instead).
Windows almost got away without a mention. Whilst Windows doesn’t have any of the Unix aligned allocation functions (not even C11 aligned_alloc), it does have its own variation, called _aligned_malloc [Microsoft].
As well as adding an underscore and an extra ‘m’, Microsoft also reversed the order of the alignment and size arguments. That seems to me a source of confusion and potential bugs. I’m not sure whether _aligned_malloc predates memalign, but I see references to it going as far back as VC++ 6.0 (1998). That means that by the time C11 came around, there were already functions with different argument orderings.
Advice
Whilst I must say that I was quite underwhelmed by the quality of what I saw, I don’t think that in practice these are big issues. I do recommend that you avoid using an alignment that is zero or not a power of two. Unfortunately, Hyrum’s law [hyrum] says that there is probably code out there that relies on Linux glibc working out the next power of two for the alignment. For portability, posix_memalign and aligned_alloc have the edge, and of the two, aligned_alloc is easier to adapt to its Windows counterpart, _aligned_malloc. However, if you also port to macOS, you still need to make sure that the size is an integral multiple of the alignment.
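Putting that advice together, here is a minimal sketch of a portable wrapper; portable_aligned_alloc and portable_aligned_free are hypothetical names. On Windows it calls _aligned_malloc (size first, then alignment) and releases memory with _aligned_free, since Microsoft documents that such memory must not be passed to free. Elsewhere it rounds the size up, so that macOS accepts it, and calls aligned_alloc. Rejecting a size of zero is a design choice of this sketch, made to sidestep the implementation-defined behaviour described earlier.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>  /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical portable wrapper. alignment must be a nonzero power of
   two; a size of zero is rejected rather than left implementation defined. */
static void *portable_aligned_alloc(size_t alignment, size_t size)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0 || size == 0) {
        errno = EINVAL;
        return NULL;
    }
    /* Round size up to a multiple of alignment, as C11 asks and macOS enforces. */
    size_t mask = alignment - 1;
    if (size > SIZE_MAX - mask) {
        errno = ENOMEM;
        return NULL;
    }
    size = (size + mask) & ~mask;
#ifdef _WIN32
    return _aligned_malloc(size, alignment);  /* note: size comes first */
#else
    return aligned_alloc(alignment, size);
#endif
}

/* Memory from _aligned_malloc must be released with _aligned_free. */
static void portable_aligned_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}
```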
References
[C11 final] International Standard: https://open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf
[FreeBSD] Source for FreeBSD: https://github.com/freebsd/freebsd-src
[GNU libc] Source for glibc v2.37: https://elixir.bootlin.com/glibc/glibc-2.37/source
[hyrum] Hyrum’s Law: https://www.hyrumslaw.com/
[illumos] Illumos is the continuation of OpenSolaris: https://github.com/illumos/illumos-gate
[Microsoft] _aligned_malloc: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=msvc-170
[musl] Source for musl: https://elixir.bootlin.com/musl/v1.2.3/source
[opengroup] Open Group Library: https://publications.opengroup.org
[XNU] Source browser: https://opensource.apple.com/source/xnu/ (there are also GitHub mirrors)
Paul Floyd has been writing software, mostly in C++ and C, for about 30 years. He lives near Grenoble, on the edge of the French Alps, and works for Siemens EDA developing tools for analogue electronic circuit simulation. In his spare time, he maintains Valgrind.
Idalia is a freelance artist operating at the intersection of art and geek, using a myriad of techniques and styles to produce works that both delight and entertain.