The QNAP TS-XXX NAS boxes are nice little systems but one thing I really miss is a console. Martin Michlmayr has some instructions (TS-119, TS-219, TS-41x) on his excellent Debian On QNAP pages on how to build a suitable adaptor but sadly even though I've managed to get all the parts I've been too lazy to actually solder the thing together (it doesn't help that I nearly always burn myself when I use a soldering iron!).
This became more pressing when someone reported that Debian bug #693263 in qcontrol was not fixed in Wheezy, since the issue appeared to be in the initramfs hook. Worse, it seemed like something in my proposed fix was causing the system to not boot at all!
Having finally found the hardware needed to reproduce the issue, my first thought was to try netconsole. To do this I edited /etc/initramfs-tools/modules to add:
mv643xx_eth
netconsole netconsole=@<IP>/eth0,@<DST-IP>/<DST-MAC>
The option syntax is described in netconsole.txt but briefly: <IP> is the address of the TS-419P II I'm debugging on, and <DST-IP> and <DST-MAC> are the IP and MAC address of another machine on the network.
Having done that, and after running update-initramfs -u and rebooting, I can use netcat -u -l -p 6666 on the other machine to see all the kernel messages.
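To make the option syntax concrete, here is a minimal C sketch that assembles the same netconsole= string from its parts. The function name format_netconsole and the addresses in the usage note are placeholders of mine, not values from my network. Both port fields are left empty, so the kernel's defaults apply, and the default target port is the 6666 that netcat listens on above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the module option used above:
 *   netconsole=@<IP>/<dev>,@<DST-IP>/<DST-MAC>
 * Nothing is written before either '@', so the default ports are used.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_netconsole(char *buf, size_t len,
                      const char *src_ip, const char *dev,
                      const char *dst_ip, const char *dst_mac)
{
    assert(buf != NULL && len > 0);

    int n = snprintf(buf, len, "netconsole=@%s/%s,@%s/%s",
                     src_ip, dev, dst_ip, dst_mac);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "192.0.2.10", "eth0", "192.0.2.20" and "00:11:22:33:44:55" (documentation addresses and a made-up MAC), this fills the buffer with netconsole=@192.0.2.10/eth0,@192.0.2.20/00:11:22:33:44:55, which is the text that follows the module name on the netconsole line.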
So far so good but this doesn't get me any debugging from the userspace portions of the initramfs. To get those we have to get a bit hacky by editing /usr/share/initramfs-tools/init, first to change:
# Parse command line options
-for x in $(cat /proc/cmdline); do
+for x in $(cat /proc/cmdline) debug; do
case $x in
and secondly:
debug)
debug=y
quiet=n
- exec >/run/initramfs/initramfs.debug 2>&1
+ exec >/dev/kmsg 2>&1
set -x
;;
The first of these simulates adding debug to the kernel command line (which can't otherwise easily be edited on these systems) and the second redirects the initramfs process's output to the kernel log. The overall effect is that the output of the initramfs processes appears over netcat.
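For anyone wondering what the shell actually does for exec >/dev/kmsg 2>&1, the following is a rough C sketch of the same descriptor shuffling. It is an illustration only, not code from initramfs-tools. The function redirect_fds is a hypothetical name, and it takes the two descriptors as parameters so it can be tried on scratch descriptors and an ordinary file instead of the real stdout, stderr and /dev/kmsg.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Rough equivalent of the shell's `exec >PATH 2>&1`, generalised to
 * arbitrary descriptors. Returns 0 on success, -1 on failure.
 */
int redirect_fds(const char *path, int out_fd, int err_fd)
{
    assert(path != NULL);

    /* Open (or create) the redirection target for writing. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* `>PATH`: make out_fd refer to the newly opened file. */
    if (dup2(fd, out_fd) < 0) {
        close(fd);
        return -1;
    }
    if (fd != out_fd)
        close(fd);

    /* `2>&1`: make err_fd a duplicate of out_fd. */
    if (dup2(out_fd, err_fd) < 0)
        return -1;

    return 0;
}
```

With out_fd set to 1 and err_fd set to 2 this mirrors the line in the init script: everything the process (and the children it starts) writes to stdout or stderr afterwards ends up in the target.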
What is Valgrind?
Valgrind is a framework for building dynamic analysis tools. Several useful tools are included in the Valgrind distribution including tools to check dynamic memory usage (memcheck), a cache profiler (cachegrind), heap profiler (massif) and thread debugger (helgrind) among others. Valgrind also provides a framework which can be used to build other tools.
The Valgrind tool which I find most useful and the one which I have most experience with is memcheck. This tool can detect all manner of memory management problems, including use after free, use of uninitialized data, memory leaks and double frees. Between them these can save many hours of staring at core dumps and gdb backtraces.
How Does memcheck Work?
At its core Valgrind is a dynamic binary translation engine, which is used to instrument the code at runtime. In the case of memcheck this is used to track properties of the memory in a process, including aspects such as whether each bit (yes, bit) of memory has been initialized since it was allocated. It also tracks this information as data is loaded into registers, so it can know if a given register is currently tainted with uninitialized data or not. As well as instrumentation through binary translation, Valgrind also includes instrumented versions of the C dynamic memory allocation functions, which are used to track whether each byte of memory is currently considered free or allocated, so that an access through a pointer to memory which has been freed can be reported.
A large part of memcheck's functionality is built upon Valgrind's ability to determine when memory has been initialized. However, although binary translation can be used to instrument when the application itself has initialized some memory, it is not currently possible to instrument the behaviour of system calls. In other words, Valgrind cannot automatically determine when memory has been initialized by the kernel, e.g. after a read(2) system call. For this reason Valgrind has baked-in knowledge about the behaviour of system calls on various platforms.
For example, let's consider the read(2) system call. The prototype of read(2) is:
ssize_t read(int fd, void *buf, size_t count);
This system call reads up to count bytes from the file descriptor fd into the memory pointed to by buf and returns the number of bytes read. Here is the code within Valgrind which handles this system call (found in coregrind/m_syswrap/syswrap-generic.c):
PRE(sys_read)
{
   *flags |= SfMayBlock;
   PRINT("sys_read ( %ld, %#lx, %llu )", ARG1, ARG2, (ULong)ARG3);
   PRE_REG_READ3(ssize_t, "read",
                 unsigned int, fd, char *, buf, vki_size_t, count);

   if (!ML_(fd_allowed)(ARG1, "read", tid, False))
      SET_STATUS_Failure( VKI_EBADF );
   else
      PRE_MEM_WRITE( "read(buf)", ARG2, ARG3 );
}

POST(sys_read)
{
   vg_assert(SUCCESS);
   POST_MEM_WRITE( ARG2, RES );
}
These two functions are called before and after the system call respectively. The main piece of functionality is the calls to PRE_MEM_WRITE and POST_MEM_WRITE. In this case PRE_MEM_WRITE is used to check that the memory starting at ARG2 (recall that this is the supplied buffer) and extending for ARG3 bytes (the system call's count argument) consists of allocated memory. After the system call is complete the post function calls POST_MEM_WRITE to indicate to Valgrind that RES bytes of the buffer have now been initialized.
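Seen from the application's side, the distinction between count and the return value looks like the sketch below. The helper read_short is a hypothetical name of mine: it pushes a short message through a pipe and reads it back with a deliberately oversized count. The comments mark where I would expect the pre and post hooks to apply. They describe the idea, not Valgrind's actual output.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Write `len` bytes of `msg` into a fresh pipe and read them back using
 * a buffer of `count` bytes. Returns the number of bytes read, or -1 on
 * error. Only the first `n` bytes of `buf` (n = return value) are
 * written by the kernel; the remainder is left untouched.
 */
ssize_t read_short(const char *msg, size_t len, char *buf, size_t count)
{
    int fds[2];

    assert(len <= count);

    if (pipe(fds) < 0)
        return -1;

    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);

    /* Before the call: buf[0..count) must be addressable. */
    ssize_t n = read(fds[0], buf, count);
    /* After the call: buf[0..n) is defined, buf[n..count) is not. */

    close(fds[0]);
    return n;
}
```

Reading a 3 byte message into a 16 byte buffer returns 3. A caller that then inspects buf[3] onwards is using data that nothing has initialized, which is the class of mistake the post hook lets memcheck distinguish from a legitimate use of the first three bytes.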
The above is a relatively simple case, but these two hooks must be supplied for each system call on each platform which Valgrind wishes to support. One of the most problematic system calls is ioctl(2). The ioctl call takes a file descriptor, a request number and a pointer to a per-ioctl argument structure and implements per-device control requests. There are literally dozens of devices, many with their own specific ioctls. For this reason a sizable portion of Valgrind's code is dedicated to decoding the behaviour of the myriad potential ioctls and providing pre- and post-call instrumentation for them.
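As a small illustration of why each request needs its own handling, here is a sketch using FIONREAD, a common request unrelated to Xen that I picked only because it works on an ordinary pipe. The wrapper name pending_bytes is mine. For this request the third argument is an int * which the kernel writes to. Another request number on the same descriptor could expect a completely different structure, to be read, written or both.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel how many bytes are waiting to be read on `fd`.
 * For FIONREAD the kernel stores the answer through the pointer, so a
 * tool like memcheck has to know that sizeof(int) bytes at that address
 * become initialized by this particular request.
 * Returns the byte count, or -1 on error.
 */
int pending_bytes(int fd)
{
    int n = 0;

    assert(fd >= 0);

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

After writing five bytes into a pipe, calling pending_bytes on the read end returns 5.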
Interacting with the Hypervisor from Userspace
Under Xen the toolstack is a normal userspace process running in the control domain (typically domain 0). As part of its operation the toolstack needs to communicate with the hypervisor and to do this it uses hypercalls. However a userspace process cannot simply make hypercalls on its own; instead it must request that the OS kernel make the hypercall for it. This prevents just any userspace process making a hypercall, since the kernel will first check the privilege of the process making the request.
Under Linux a userspace process makes hypercalls by using the ioctl(2) system call on a special "Privileged Command" (privcmd) device. In this case the toolstack uses the IOCTL_PRIVCMD_HYPERCALL ioctl request which takes a hypercall number and the hypercall arguments as its argument. Other OSes which can be used as a control domain possess similar interfaces.
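A sketch of what such a request might look like from userspace follows. Please treat the details as assumptions. The structure layout and request encoding are my recollection of Linux's privcmd header, re-declared locally under _sketch names so the example stands alone, and should be checked against the real header before use. The wrapper hypercall_via_privcmd is hypothetical, and real toolstacks go through the Xen libraries and do not issue this ioctl by hand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* ASSUMPTION: mirrors the privcmd hypercall argument as I remember it. */
struct privcmd_hypercall_sketch {
    uint64_t op;      /* hypercall number */
    uint64_t arg[5];  /* up to five hypercall arguments */
};

/* ASSUMPTION: request encoding believed to match the Linux header. */
#define IOCTL_PRIVCMD_HYPERCALL_SKETCH \
    _IOC(_IOC_NONE, 'P', 0, sizeof(struct privcmd_hypercall_sketch))

/*
 * Package a hypercall number and its arguments and hand them to the
 * kernel via the privcmd device open on `fd`. Returns the ioctl result
 * (-1 with errno set on failure, e.g. when `fd` is not a privcmd device).
 */
long hypercall_via_privcmd(int fd, uint64_t op, const uint64_t args[5])
{
    struct privcmd_hypercall_sketch hc;

    assert(args != NULL);

    memset(&hc, 0, sizeof hc);
    hc.op = op;
    memcpy(hc.arg, args, sizeof hc.arg);

    return ioctl(fd, IOCTL_PRIVCMD_HYPERCALL_SKETCH, &hc);
}
```

The point for Valgrind is that the kernel only sees six opaque 64-bit values here. Which of the arguments are pointers, and how much memory behind them the hypervisor reads or writes, depends entirely on the hypercall number.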
Making memcheck Work With Xen Toolstacks
As might be expected, the majority of the work to allow Valgrind to work on Xen toolstack processes was teaching it about the IOCTL_PRIVCMD_HYPERCALL ioctl. Fortunately it was not necessary to teach Valgrind about every hypercall but only about those which are used by toolstacks. This is a smallish subset of all hypercalls including the domctl interfaces and a handful of others.
The initial batch of patches covers all of the system calls needed to start new domains, shut them down and migrate them using the XL toolstack. New hypercalls will be added as they are discovered to be missing.
Getting and Using This Functionality
The initial set of patches was accepted into the trunk of Valgrind's Subversion repository (revision 13081) in October 2012 and is expected to be part of the 3.9.0 release.
Until 3.9.0 is released it will be necessary to get this functionality from SVN. Fortunately this is relatively simple and follows the usual autofoo dance, although the usual caveats regarding running unreleased software apply.
$ svn co svn://svn.valgrind.org/valgrind/trunk valgrind
$ cd valgrind
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
Once Valgrind is built and installed, simply call it, passing the command to be debugged as its argument:
# valgrind xl list
==16186== Memcheck, a memory error detector
==16186== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==16186== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==16186== Command: xl list
==16186==
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   512     4     r-----     332.5
==16186==
==16186== HEAP SUMMARY:
==16186==     in use at exit: 0 bytes in 0 blocks
==16186==   total heap usage: 16 allocs, 16 frees, 93,529 bytes allocated
==16186==
==16186== All heap blocks were freed -- no leaks are possible
==16186==
==16186== For counts of detected and suppressed errors, rerun with: -v
==16186== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 39 from 6)
Here we see that xl list has no leaks! However if I comment out the call to libxl_dominfo_list_free in tools/libxl/xl_cmdimpl.c:main_list() then instead Valgrind reports:
[...]
==16203== HEAP SUMMARY:
==16203==     in use at exit: 90,112 bytes in 1 blocks
==16203==   total heap usage: 16 allocs, 15 frees, 93,529 bytes allocated
==16203==
==16203== LEAK SUMMARY:
==16203==    definitely lost: 0 bytes in 0 blocks
==16203==    indirectly lost: 0 bytes in 0 blocks
==16203==      possibly lost: 90,112 bytes in 1 blocks
==16203==    still reachable: 0 bytes in 0 blocks
==16203==         suppressed: 0 bytes in 0 blocks
==16203== Rerun with --leak-check=full to see details of leaked memory
==16203==
==16203== For counts of detected and suppressed errors, rerun with: -v
==16203== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 39 from 6)
[...]
So it has caught the memory leak! Rerunning with the suggested --leak-check=full option goes even further and tells me where the leaked memory was allocated:
[...]
==16204== 90,112 bytes in 1 blocks are possibly lost in loss record 1 of 1
==16204==    at 0x4024480: calloc (vg_replace_malloc.c:593)
==16204==    by 0x4050CE4: libxl_list_domain (libxl.c:548)
==16204==    by 0x805EEC7: main_list (xl_cmdimpl.c:3931)
==16204==    by 0x804D6DD: main (xl.c:285)
[...]
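The shape of the bug is easy to reproduce outside Xen. The sketch below is a self-contained stand-in, not libxl code. The dominfo structure and both function names are hypothetical, chosen only to echo the calloc-backed list and its matching free function from the trace above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libxl-style domain record. */
struct dominfo {
    unsigned int domid;
    unsigned long mem_kb;
};

/* Allocate a zeroed list of `nb` records; the caller owns the result. */
struct dominfo *dominfo_list_alloc(size_t nb)
{
    assert(nb > 0);

    struct dominfo *list = calloc(nb, sizeof *list);
    if (list != NULL) {
        for (size_t i = 0; i < nb; i++)
            list[i].domid = (unsigned int)i;
    }
    return list;
}

/* Matching release; leaving this call out is the kind of leak shown above. */
void dominfo_list_free(struct dominfo *list)
{
    free(list);
}
```

A caller that allocates a list, prints it and returns without calling dominfo_list_free leaves one block allocated at exit. I would expect memcheck to report it in the leak summary, with the allocation stack pointing at the calloc inside dominfo_list_alloc.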
To ease debugging of domain creation I find it useful to use the -F option to xl create to stop xl from forking and daemonising. Most xl subcommands do not daemonise and so need no special treatment.
For more information on running Valgrind see the Valgrind Documentation.
Conclusion
Valgrind is a powerful tool in the arsenal of the C programmer. It can catch a large number of common, hard-to-debug issues before they cause problems in the field, which can save many hours of tedious debugging. The ability to use tools such as this on Xen toolstack processes is of enormous benefit and has already found several bugs in the XL toolstack.
Like many people I'll be going to FOSDEM again this year.
I'll be giving a talk on Saturday at 1800 in the Virtualisation devroom on Evolving Xen Paravirtualization. I'll also be manning the Xen.org booth at least some of the time but hopefully I'll get to see some of the other talks too.
Hope to see everyone there!