Good analysis tools can really help track down problems. Paul Floyd investigates the facilities from a suite of tools.
Valgrind is a dynamic analysis tool for compiled executables. It performs an analysis of an application at run time. Valgrind can perform several different types of analysis (though not at the same time): memcheck (memory checking), cachegrind/callgrind (time profiling), massif (memory profiling), helgrind/drd (thread hazard detection). In addition there are some experimental tools: sgcheck (global/stack overrun detection, prior to Valgrind 3.7.0 this was called ptrcheck), dhat (heap use analysis) and bbv (extrapolating longer executions from smaller samples). There is also nulgrind, which does nothing other than execute your application.
Valgrind can be found at http://www.valgrind.org.
How it works
Unlike some other tools, Valgrind does not require the application under test (AUT) to be instrumented. You don’t have to perform a special compile or link, and it will even work with optimized builds, though obviously that makes it more difficult to correlate to source lines.
At the core of Valgrind is a virtual machine, VEX. When Valgrind runs your AUT, it doesn’t run it directly on the underlying OS; it runs it in the virtual machine. This allows it to intercept the system calls and memory accesses required for the analysis it is performing. VEX performs on-the-fly translation of the AUT machine code to an Intermediate Representation (IR), which is RISC-like and common to all architectures that Valgrind supports. This IR is more like a compiler IR than machine code.
Running your AUT in a virtual machine does of course have some disadvantages. The most obvious one is a significant time penalty: the same AUT running directly on the OS will typically run about 10 times faster than under Valgrind. The combination of Valgrind and the AUT will also use more memory. In some cases, in particular on 32-bit OSes, you might find that the AUT alone fits in memory but that Valgrind+AUT doesn’t. In this case, if you can, the answer may be to recompile and test in 64 bits. There are also some discrepancies between Valgrind’s virtual machine and your real machine. In practice, I haven’t noticed anything significant other than small differences in floating point calculations. This is most noticeable with 32-bit executables using the x87 FPU: Valgrind emulates floating point with 64-bit doubles, whilst the x87 uses 80 bits for intermediate calculations.
The worst case problem with the virtual machine is an unhandled op-code. In this case you will get a message like
==[pid]== valgrind: Unrecognised instruction at address [address].
and then Valgrind will terminate. Fortunately, unless you are using some exotic/bleeding edge OS or hardware, this is unlikely to happen. Somewhat more likely, in particular on 32-bit OSes, is that you will run out of memory. Unfortunately, Valgrind may not produce any output in that case.
You can’t run the AUT directly in a debugger while it is running under Valgrind. It is possible to have Valgrind attach a debugger to the AUT when an error occurs, which gives you a snapshot of the executable rather like loading a core file. There is a second, better alternative that is now possible with Valgrind 3.7.0. This is to attach gdb to the embedded gdbserver within Valgrind. If you do this then you will be able to control the AUT much like a normal AUT under gdb, and also perform some control and querying of Valgrind. It is also possible to compile Valgrind tracing in the AUT which can output information on a specific region of memory. I'll cover these topics in more detail in a later article.
Availability
Valgrind started out on Linux, but it is now available on FreeBSD and Mac OS X. It is also available for some embedded systems. I’ve read of patches that support Windows, but it isn’t officially supported. If you are using Linux, then the easiest thing to do is to install it using your package manager. If you want to have the very latest release, then you can download the current package from the Valgrind website and build it yourself. Generally just running ./configure, make and make install is sufficient. If you want to be able to use gdb, make sure that the version that you want to use when running Valgrind is in your PATH when you build Valgrind. If you want the latest development version, then you can get it from the Subversion server. Building is a little more involved, but the Valgrind website gives clear instructions.
Running Valgrind
So you’ve installed Valgrind. Next, how do you run it? Valgrind has a large number of options. There are 4 ways of passing options: a resource file (.valgrindrc) in your home directory, an environment variable (VALGRIND_OPTS), a resource file in the working directory, and the command line. They are read in that order. If an option is repeated, the last appearance overrides any earlier ones. This means that the resource file in your home directory has the lowest priority and the command line has the highest priority.
The options can be broken down into two groups: core options and tool options. The core options are for Valgrind itself, like --help. The most important of these is --tool, used to select the tool to use, which must be passed on the command line. However, if --tool is omitted, the default will be memcheck.
You can specify options for more than one tool at a time by using the prefix <toolname>: before the option. For instance, to have memcheck report memory that is still reachable (but not leaked) when the AUT terminates, you can use --memcheck:show-reachable=yes. If you then use massif as the tool, this option will be ignored.
In my experience, it is best to put options that are independent of the test in the .valgrindrc and to pass options that may change on the command line.
Processes and output
Some of the tools default to sending their output to the console (e.g., memcheck). Others default to a file (e.g., massif). You can specify that the output go to a file with the --log-file=<filename> option. In either case, the output lines are prefixed with ==<pid>== or --<pid>-- showing the pid (process id) of the AUT. The <filename> can contain %p which will be expanded to the PID of the AUT. This is particularly useful if you will be running the same AUT more than once. For instance, say you have two AUTs, ‘parent’ and ‘child’, both of which you want to run under Valgrind. ‘parent’ runs ‘child’ more than once when it executes. If you let all of the output go to the console, then it might get all mixed up. Instead you could launch
valgrind --tool=memcheck --log-file=parent.%p.log parent
and arrange it so that ‘parent’ launches ‘child’ in a similar fashion
system("valgrind --tool=memcheck --log-file=child.%p.log child");
On completion, you would have something like parent.1234.log, child.1245.log and child.1255.log.
There is also the --trace-children=yes option. Whilst it is simpler than the above technique, it will cause all child processes to be traced. This is less useful if your application launches a lot of child processes, and you only want to test a small subset of them.
I’ll finish with a quick demonstration using Nulgrind. Here are my two test executables.
// parent.c
#include <stdlib.h>
int main(void)
{
    system("./child");
}
and
// child.c
#include <stdio.h>
int main(void)
{
    printf("Hello, child!\n");
}
Let’s compile them
gcc -o child -g -Wextra -Wall child.c -std=c99 -pedantic
gcc -o parent -g -Wextra -Wall parent.c -std=c99 -pedantic
If I run ‘parent’ alone, I see the expected output. Now let’s try that with a minimum of options (Listing 1) and with the option to follow children (Listing 2).
So you can see the two PIDs, 9356 and 9357. 9356 spawns then execs sh, which in turn execs child.
Now if I run
valgrind --tool=none --trace-children=yes --log-file=nulgrind.%p.log ./parent
then I get two log files, nulgrind.10092.log and nulgrind.10091.log which basically contain the same information as the parent and child sections above.
Lastly, if I change the ‘system’ call to
system("valgrind --tool=none --log-file=child.%p.log ./child");
and I then run parent without the --trace-children option, like this
valgrind --tool=none --log-file=parent.%p.log ./parent
then I get two log files, parent.12191.log and child.12209.log, again with the content more or less as above.
That wraps it up for part 1. In part 2 I’ll cover the basics of memcheck.