Learning how to use our tools well is a vital skill. Paul Floyd shows us how to check for memory problems.
In the first part of this series I explained what Valgrind is. In this article, I’ll start explaining how to use it. Memcheck is the best known of the Valgrind tools. It is a runtime memory checker that validates your use of heap memory and (to a lesser extent) stack memory.
Memcheck detects the following kinds of errors:
- Illegal read/write
- Use of uninitialized memory
- Invalid system call parameters
- Illegal frees
- Source/destination overlap
- Memory leaks
Some other memory checking tools have snappier TLA names for the errors that they detect. Some people would like to be able to prioritize the types of errors. I’d say that in general all of these errors can cause either incorrect operation of an application or crashes. Generally the 1st and 4th items on the list are the most likely causes of crashes, but don’t take that as advice to neglect the other four types.
General advice
Don’t overdo the options. The default options are good for most situations. Some of the options will add significantly to the already high overhead. If you discover a fault and the default output is not enough for you to pin down the error, then consider adding more options. Personally I use memcheck in two ways. Firstly in automatic regression tests each weekend. All of the results get distilled into a single summary. Secondly ‘interactively’ in a shell, and in this mode I tend to turn up the options.
All of the examples that follow are trivial. In real world defects, the locations of the fault, the declaration, the allocation, the initialization and the free may all be far apart.
Illegal read/write errors
Illegal read/write errors correspond to reads from or writes to addresses that do not belong to any valid block of memory.
The example in Listing 1 shows reading beyond the end of an array.
    // abrw.cpp
    #include <iostream>
    void f(int *p2)
    {
        int i1 = p2[10]; // read beyond the end of p1
        std::cout << "Hello\n";
    }

    int main()
    {
        int *p1 = new int[10];
        f(p1);
        delete [] p1;
    }
Listing 1
Compiling this and running it under memcheck will generate the output shown in Figure 1.
    ==85258== Invalid read of size 4
    ==85258==    at 0x400AA4: f(int*) (abrw.cpp:5)
    ==85258==    by 0x400B5E: main (abrw.cpp:12)
    ==85258==  Address 0x1c90068 is 0 bytes after a block of size 40 alloc'd
    ==85258==    at 0x1006BB7: operator new[](unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==85258==    by 0x400B51: main (abrw.cpp:11)
Figure 1
If that had been a long long or a long on a 64 bit platform, then it would have been an Invalid read of size 8.
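One way to avoid this class of error is to pass the array length along with the pointer and check the index before reading. The following is a minimal sketch rather than code from the listing; the helper name readAt and its fallback parameter are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: returns p[index] when index is within the block,
// otherwise returns the caller-supplied fallback instead of reading
// past the end of the allocation.
int readAt(const int *p, std::size_t length, std::size_t index, int fallback)
{
    return (index < length) ? p[index] : fallback;
}
```

With a 10 element array, readAt(p1, 10, 10, -1) returns the fallback rather than touching the address just after the block.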
Use of uninitialized memory
You’ll get this sort of error if you read memory before assigning any value to it. For instance, if you malloc an array then read an element from it. Valgrind also propagates the state of initialization through assignments and will only trigger an error if the execution outcome could be affected by the uninitialized state of the memory. This means that harmless errors do not generate any messages (good news) but also it means that the site where memcheck says the error occurs could be far from where the uninitialized memory was allocated.
Listing 2 shows an example of this. I’ve deliberately made the error propagate through three variables in function f() to illustrate that no error is generated until the if() condition is reached.
    // uninit.cpp
    #include <iostream>
    void f(long *p2)
    {
        long l1 = p2[10]; // read beyond end of p1
        long l2 = l1;     // propagates
        long l3 = l2;     // propagate again
        if (l3)           // uninitialized read
        {
            std::cout << "Hello\n";
        }
    }

    int main()
    {
        long *p1 = new long[10];
        f(p1);
        delete [] p1;
    }
Listing 2
This will result in the output shown in Figure 2.
    ==93289== Invalid read of size 8
    ==93289==    at 0x400AA4: f(long*) (uninit.cpp:5)
    ==93289==    by 0x400B7E: main (uninit.cpp:17)
    ==93289==  Address 0x1c90090 is 0 bytes after a block of size 80 alloc'd
    ==93289==    at 0x1006BB7: operator new[](unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==93289==    by 0x400B71: main (uninit.cpp:16)
Figure 2
If the error is in a stack variable rather than in a heap variable, you get a bit less information (see Listing 3).
    // uninit2.cpp
    #include <iostream>
    void f(long l)
    {
        long lb = l;
        long lc = lb;
        long ld = lc;
        if (lc)
        {
            std::cout << "Hello\n";
        }
    }

    int main()
    {
        long la; // uninitialized local scalar
        f(la);
    }
Listing 3
This gives just the output in Figure 3a.
    ==4164== Conditional jump or move depends on uninitialised value(s)
    ==4164==    at 0x4009E9: f(long) (uninit2.cpp:8)
    ==4164==    by 0x400A10: main (uninit2.cpp:17)
Figure 3a
Use --memcheck:track-origins=yes for more info, but this will increase the Valgrind overhead. Adding this option gives the output in Figure 3b.
    ==4455== Conditional jump or move depends on uninitialised value(s)
    ==4455==    at 0x4009E9: f(long) (uninit2.cpp:8)
    ==4455==    by 0x400A10: main (uninit2.cpp:17)
    ==4455==  Uninitialised value was created by a stack allocation
    ==4455==    at 0x400A00: main (uninit2.cpp:14)
Figure 3b
OK, so it narrows the search down to main(), but it doesn’t tell us the name of the variable or the line (the file and line numbers in the output are where the functions start, not where the problem is).
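The usual cure for this kind of report is to make sure memory is initialized where it is allocated or declared. As a minimal sketch (the function names here are made up for illustration): writing new long[n]() with the trailing parentheses value-initializes every element to zero, whereas new long[n] leaves the elements indeterminate.

```cpp
#include <cstddef>

// Hypothetical helper: allocates n longs, all value-initialized to zero.
// The trailing () requests value-initialization; without it the
// elements would be left uninitialized.
long *makeZeroedArray(std::size_t n)
{
    return new long[n]();
}

// Hypothetical helper: sums the first n elements of p. Every element
// read here has to be initialized for the result to be meaningful.
long sum(const long *p, std::size_t n)
{
    long total = 0; // local scalar initialized at its declaration
    for (std::size_t i = 0; i < n; ++i)
    {
        total += p[i];
    }
    return total;
}
```

The caller still owns the array and has to release it with delete [].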
Invalid system call parameters
Listing 4 is a std::fwrite of memory that is not initialized.
    // syscall.cpp
    #include <cstdio>
    const std::size_t intArraySize = 3;

    int main()
    {
        std::FILE *f = std::fopen("output.dat", "w");
        if (f)
        {
            int *intArray = new int[intArraySize];
            std::size_t bytesWritten = 0U;
            intArray[0] = 1; // intArray[1] not initialized
            intArray[2] = 3;
            bytesWritten = std::fwrite(intArray, sizeof(int), intArraySize, f);
            // omit check
            std::fclose(f);
            delete [] intArray;
        }
    }
Listing 4
This will generate the output shown in Figure 4a.
    ==468== Syscall param write(buf) points to uninitialised byte(s)
    ==468==    at 0x148C82: write$NOCANCEL (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x148BFC: _swrite (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x148B41: __sflush (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x14859A: fclose (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x100000EB6: main (syscall.cpp:16)
    ==468==  Address 0x100004134 is 4 bytes inside a block of size 4,096 alloc'd
    ==468==    at 0xD6D9: malloc (vg_replace_malloc.c:266)
    ==468==    by 0x1489ED: __smakebuf (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x148959: __swsetup (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x10ABC8: __sfvwrite (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x15C3C4: fwrite (in /usr/lib/libSystem.B.dylib)
    ==468==    by 0x100000EA9: main (syscall.cpp:14)
Figure 4a
Look carefully at the log in Figure 4a and you will see that the error occurs when the file is closed, not when the call to std::fwrite is performed. This is because the output is buffered, which can be quite pernicious. If I add a call to std::setvbuf(f, 0, _IONBF, 0); after the std::fopen, then I get the log shown in Figure 4b.
    ==534== Syscall param write(buf) points to uninitialised byte(s)
    ==534==    at 0x148C82: write$NOCANCEL (in /usr/lib/libSystem.B.dylib)
    ==534==    by 0x148BFC: _swrite (in /usr/lib/libSystem.B.dylib)
    ==534==    by 0x10AC16: __sfvwrite (in /usr/lib/libSystem.B.dylib)
    ==534==    by 0x15C3C4: fwrite (in /usr/lib/libSystem.B.dylib)
    ==534==    by 0x100000E97: main (syscall.cpp:15)
    ==534==  Address 0x1000040e4 is 4 bytes inside a block of size 12 alloc'd
    ==534==    at 0xD6D9: malloc (vg_replace_malloc.c:266)
    ==534==    by 0x64F04: operator new(unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
    ==534==    by 0x64F96: operator new[](unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
    ==534==    by 0x100000E5E: main (syscall.cpp:11)
Figure 4b
With an unbuffered stream, you see the error immediately rather than when the buffer is flushed.
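Whichever buffering mode is in use, the underlying fix is to initialize every byte that is handed to the write. The sketch below is one possible reworking of the array handling from Listing 4, assuming that a zero in the untouched element is acceptable; the function name writeThreeInts is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes three fully initialized ints to an already
// opened stream and returns the number of elements written.
std::size_t writeThreeInts(std::FILE *f)
{
    const std::size_t intArraySize = 3;
    int *intArray = new int[intArraySize](); // () zero-initializes all three
    intArray[0] = 1;
    // intArray[1] keeps its zero value, so it is initialized
    intArray[2] = 3;
    const std::size_t written =
        std::fwrite(intArray, sizeof(int), intArraySize, f);
    delete [] intArray;
    return written;
}
```

The return value of std::fwrite is passed back so that the caller can perform the check that Listing 4 omits.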
Illegal frees
An example of this is freeing stack memory (Listing 5). This one is a bit of a no-brainer: the compiler complains about the code and I get a nice core dump if I run the application.
    // ifree.cpp
    void func()
    {
        int stackArray[10];
        delete stackArray; // not even array delete
    }

    int main()
    {
        func();
    }
Listing 5
The corresponding output is in Figure 5.
    ==72595== Invalid free() / delete / delete[]
    ==72595==    at 0x1004DDC: operator delete(void*) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==72595==    by 0x400680: func() (ifree.cpp:4)
    ==72595==    by 0x400698: main (ifree.cpp:9)
    ==72595==  Address 0x7ff000240 is on thread 1's stack
Figure 5
Let’s try a somewhat more likely error, using the wrong delete (see Listing 6).
    // ifree2.cpp
    void func()
    {
        int *heapArray = new int[10];
        delete heapArray; // scalar delete, but allocated with new[]
    }

    int main()
    {
        func();
    }
Listing 6
The corresponding output is in Figure 6. Here, memcheck correctly identified that there was an incorrect delete, but it doesn’t go as far as saying that the memory was allocated with array new but deleted with scalar delete.
    ==72950== Mismatched free() / delete / delete []
    ==72950==    at 0x1004DDC: operator delete(void*) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==72950==    by 0x4006DE: func() (ifree2.cpp:4)
    ==72950==    by 0x4006F8: main (ifree2.cpp:9)
    ==72950==  Address 0x1c8f040 is 0 bytes inside a block of size 40 alloc'd
    ==72950==    at 0x1005BB7: operator new[](unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==72950==    by 0x4006D1: func() (ifree2.cpp:3)
    ==72950==    by 0x4006F8: main (ifree2.cpp:9)
Figure 6
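A mismatch like this cannot happen if the code never calls delete by hand. As a sketch of one way to rework func() from Listing 6, assuming a C++11 compiler, the array form of std::unique_ptr releases the block with delete[] automatically. The return value is added here only so that the function has an observable result.

```cpp
#include <memory>

int func()
{
    // std::unique_ptr<int[]> is the array specialization: its destructor
    // calls delete[], matching the new[] used for the allocation.
    std::unique_ptr<int[]> heapArray(new int[10]());
    heapArray[9] = 42;
    return heapArray[9];
} // heapArray goes out of scope here and the block is released
```

A std::vector<int> would serve equally well when the array does not need to be handed to code that expects a raw owning pointer.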
Source/destination overlap
The usual example of this is a std::strcpy where the source and destination point within the same char array (Listing 7).
    // overlap.cpp
    #include <cstdio>   // for std::sprintf
    #include <cstring>
    #include <iostream>
    int main()
    {
        char *str = new char[100];
        std::sprintf(str, "Hello, world!");
        std::strcpy(str, str+2);
        std::cout << "str " << str << "\n";
        delete [] str;
    }
Listing 7
Valgrind’s output is shown in Figure 7.
    ==74324== Source and destination overlap in strcpy(0x1c90040, 0x1c90042)
    ==74324==    at 0x1009A61: strcpy (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==74324==    by 0x400BE9: main (overlap.cpp:8)
Figure 7
The standard solution to this sort of problem is to use std::memmove instead of std::strcpy or std::memcpy.
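Unlike std::strcpy, std::memmove needs to be told how many bytes to move, so the length has to be computed first. The following is a minimal sketch of the in-place shift from Listing 7 written that way; the helper name dropPrefix is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: removes the first n characters of a NUL-terminated
// string in place. Source and destination overlap, which std::memmove
// is specified to handle.
void dropPrefix(char *str, std::size_t n)
{
    const std::size_t len = std::strlen(str);
    if (n > len)
    {
        n = len; // never step past the terminator
    }
    // + 1 so that the terminating NUL is moved as well
    std::memmove(str, str + n, len - n + 1);
}
```

Calling dropPrefix(str, 2) on "Hello, world!" leaves "llo, world!" in the buffer.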
Memory leaks
This is the largest of the memcheck error types. Memcheck can detect three different types of ‘leak’. First there is the definite leak, where the pointer has gone out of scope and the memory is leaked. Next there are possible leaks, where there are no longer pointers to the start of the allocated memory, but there are still pointers within the allocated memory. Finally there is still-in-use memory, where both the memory and the pointer to it still exist.
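The three situations can be sketched in a few lines. This is an illustration only: the names are made up, the leaks are deliberate, and how memcheck classifies each block may depend on the compiler and optimization level (an optimizer is allowed to remove an unused allocation altogether).

```cpp
// Globals outlive main(), so whatever they point to is still
// referenced when the program exits.
int *stillReachable = nullptr;
char *interiorOnly = nullptr;

void makeLeaks()
{
    // Definite leak: the only pointer is a local that goes out of scope.
    int *lost = new int(42);
    (void)lost;

    // Possible leak: no pointer to the start of the block survives,
    // only a pointer to somewhere inside it.
    char *block = new char[100];
    interiorOnly = block + 10;

    // Still in use: a pointer to the start of the block survives.
    stillReachable = new int[10];
}
```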
If you use a memory manager (e.g., a pool allocator), then this can complicate leak detection. For instance, if your application has a pool allocator that news blocks of 100MBytes, uses an overloaded operator new that uses this pool, optionally does some overloaded deletes, and then when it terminates deletes all of the pool blocks, memcheck won’t be able to detect any leaks, even though your application may be leaking your pool memory in the sense that it wasn’t deleted and made available for reuse before the pool was deleted. Furthermore, if you are using an allocator that allocates blocks that are handled as {length:memory[:guard]}, so that the pointer obtained by new is adjusted after setting the length, then you’re likely to get possible leaks detected rather than definite leaks.
There are two things that you can do in this case. One is to have a special build, where you compile with a macro like -DDEFAULT_NEW which disables the memory allocator and uses the standard allocators. Obviously having two sets of code is not ideal, and this will be a maintenance overhead. The alternative is to include the valgrind.h header and use the Valgrind MEMPOOL macros. More on that in a later article.
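The special build could look something like the following sketch. Everything apart from the DEFAULT_NEW macro name is an assumption: Widget and poolAllocations are hypothetical, and std::malloc merely stands in for a real pool.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts the blocks handed out by the custom allocator (illustration only).
std::size_t poolAllocations = 0;

struct Widget
{
    int value;

#ifndef DEFAULT_NEW
    // Normal build: allocations go through the application's allocator.
    // std::malloc is a stand-in for a real pool here.
    static void *operator new(std::size_t size)
    {
        void *p = std::malloc(size);
        if (!p)
        {
            throw std::bad_alloc();
        }
        ++poolAllocations;
        return p;
    }

    static void operator delete(void *p)
    {
        std::free(p);
    }
#endif
    // With -DDEFAULT_NEW the overloads above disappear and the global
    // operator new and operator delete are used instead.
};
```

Code that uses the class, such as new Widget() and delete w, is the same in both builds; only the allocator behind it changes.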
A very short example of a definite leak is in Listing 8.
    // leak.cpp
    int main()
    {
        int *leak = new int(42);
    }
Listing 8
Valgrind’s output for this is in Figure 8.
    ==76314== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
    ==76314==    at 0x1005F79: operator new(unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==76314==    by 0x400681: main (leak.cpp:3)
Figure 8
Suppressing errors
Memcheck will use a default suppression file that was generated on the machine where Valgrind was built. This will suppress ‘well known’ (and hopefully harmless) errors in libc and X11. You can also use user-defined suppression files with the option:
    --memcheck:suppressions=<suppression file>
This can be used more than once. I would advise that you do this only for harmless errors or errors in third party libraries that you can’t fix. As a rule, you’re better off fixing your errors than hiding them in a suppression file.
You can use --memcheck:gen-suppressions=all to generate suppression stacks in the output log file, which look like this:
    {
       <insert_a_suppression_name_here>
       Memcheck:Leak
       fun:_Znwm
       fun:main
    }
The opening and closing braces delimit the error callstack. The first line is intended for use as a comment. I would recommend that you change this and try to make it something unique. If you use valgrind -v, then in the summary, Valgrind will list all of the suppressions that it used with their comments. This can be used to see which of your suppressions are being used, which allows you to clean out your suppressions files from time to time.
The second line gives the type of error.
The third to last lines are the callstack. Each line has one of the following forms:
- fun: function name for unstripped functions.
- obj: name of library for stripped functions.
- ...: wildcard for any depth. This can be useful for recursive functions that would otherwise need N different suppressions for N possible depths of recursion.
You can use the * wildcard to make suppressions more generic. For instance, if you want to use the same suppression files on both 32bit and 64bit Linux, then instead of having two separate suppressions, one with /opt/mypkg/lib and the other with /opt/mypkg/lib64, you could have just one suppression with /opt/mypkg/lib*.
You may want to reduce the amount of callstack that appears in the suppression. This can reduce the number of suppressions that you need (which is OK if they are all the same issue). Don’t overdo it though, you don’t want to suppress genuine errors.
Errors that memcheck does not detect
Last but not least, there are a few types of memory errors that memcheck does not detect.
Reading or writing beyond arrays that are global or on the stack, for instance:
    int x[10]; // local, global or static
    x[10] = 1;
Try using exp-sgcheck for this sort of error.
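Since memcheck stays silent here, a bounds check in the code itself is another line of defence. The following is a minimal sketch assuming C++11, where std::array::at() throws std::out_of_range for an invalid index; the helper name checkedStore is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: tries to store value at index in a fixed-size
// array and reports whether the index was in range.
bool checkedStore(std::array<int, 10> &x, std::size_t index, int value)
{
    try
    {
        x.at(index) = value; // at() throws std::out_of_range if index >= 10
        return true;
    }
    catch (const std::out_of_range &)
    {
        return false;
    }
}
```

With this helper the equivalent of x[10] = 1 is rejected instead of silently writing past the end of the array.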
Now that we’ve covered the basics of memcheck, in the next article we’ll look at more advanced techniques.