Wednesday, May 20, 2015

on making a DDE kit, kinda

So I have this task of porting a huge piece of software from a proprietary OS to another OS. And I don't even have a clue how to compile it (well, I do, but it builds on Windows, so that's almost irrelevant).

But luckily all the code is linked into a single ELF file, and the compilation produces intermediate object files. My first thought was to visualize the dependency graph of the object files to find out who calls what. Below you can find the script that recursively walks the supplied directory and tries to parse the import/export tables with objdump. There are some areas for improvement (for example, also parsing the dynamic symbol table with -T, or handling .a archives), but it did the job for me.

Unfortunately, I realized that visualizing a graph with 30K edges was not even remotely a smart idea:
https://gist.github.com/astarasikov/355bf825f130fe4b5633
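The gist above is the actual script; purely as an illustration of the core idea, here is a rough single-file sketch in C (the popen()-based parsing heuristics are my own simplification, not taken from the gist). It splits the symbol table of one object file into imports (undefined symbols) and exports (defined globals).

    /*
     * Minimal sketch of the idea behind the dependency script: run "objdump -t"
     * on one object file and split its symbols into imports and exports.
     * The real script walks a directory tree and turns these lists into a graph.
     * The parsing is a rough heuristic on objdump's text output, nothing more.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char cmd[1024];
        char line[1024];
        FILE *pipe;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <object file>\n", argv[0]);
            return EXIT_FAILURE;
        }

        snprintf(cmd, sizeof(cmd), "objdump -t '%s'", argv[1]);
        pipe = popen(cmd, "r");
        if (!pipe) {
            perror("popen");
            return EXIT_FAILURE;
        }

        while (fgets(line, sizeof(line), pipe)) {
            /* Undefined symbols (what this object imports) are marked "*UND*" */
            if (strstr(line, "*UND*"))
                printf("IMPORT %s", line);
            /* Global symbols defined in a real section are this object's exports */
            else if (strstr(line, " g ") && !strstr(line, "*ABS*"))
                printf("EXPORT %s", line);
        }

        pclose(pipe);
        return EXIT_SUCCESS;
    }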

What I also found out was that the OS-specific code and objects were stored in a separate location (since they were part of an SDK). Even if they were not, we could just remove the object files that were present both in our project and in the SDK. After that, all the functions that the application required from the OS unsurprisingly ended up in the "UNDEFINED" node, and there were only 200 of them, which gives me some hope.

This approach can also be used in other cases, for example, porting drivers from Linux/FreeBSD to exotic platforms: first build the binaries, then reuse as many object files as you can to minimize the list of functions you have to provide. I find dealing with compiled code easier, because build systems, C macros and ifdefs just drive me insane.

Friday, May 15, 2015

GCOV is amazing yet undocumented

One useful technique for maintaining software quality is measuring code coverage. While routinely used by high-level developers, it is completely forgotten by many C hackers, especially when it comes to kernel code. In fact, Linux is the only kernel I know of that supports being compiled with the GCOV coverage tool.

GCOV works by instrumenting your code: it inserts snippets that increment stats counters around each basic block. These counters reside in a special section of your ELF binaries. During compilation, GCC also generates a ".gcno" file for each ".c" file; these files allow the "gcov" tool to look up function names and other info from the integer IDs (which are specific to each file).
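The two GCC switches behind all of this are -fprofile-arcs (insert the counters) and -ftest-coverage (emit the ".gcno" notes). A toy sketch of what that means for a single function (the comments only roughly indicate where the generated increments end up):

    /*
     * Compiling, say,
     *
     *     gcc -fprofile-arcs -ftest-coverage -c example.c
     *
     * produces example.o plus example.gcno (the static description of the
     * basic blocks and arcs that the "gcov" tool later reads). GCC also
     * emits hidden counters (typically 64-bit integers) for the arcs
     * between basic blocks and increments them in the generated code.
     */
    int example(int x)
    {
        if (x > 0) {
            /* an arc counter for this branch is incremented here */
            return 1;
        }
        /* and a different one here */
        return 0;
    }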

At runtime, an executable built with GCOV produces a ".gcda" file (one per object file) which contains the values of the counters. During the executable's initialization, the constructors (function pointers collected in the ".ctors" section) are called. Among them are the GCOV constructors, which call "__gcov_init" to register their ".o" file inside libgcov.
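In a kernel port, "__gcov_init" is something you end up providing yourself. A minimal sketch of the registration side, assuming a GCC version that still emits calls to "__gcov_init" from those constructors (the fixed-size array and all names here are mine, not taken from the actual patch):

    /*
     * Sketch of the kernel-side registration hook. struct gcov_info is left
     * opaque on purpose: its layout differs between GCC versions, which is
     * why real implementations carry per-version definitions of it.
     */
    struct gcov_info;

    #define MAX_GCOV_OBJECTS 128

    static struct gcov_info *gcov_objects[MAX_GCOV_OBJECTS];
    static unsigned int gcov_num_objects;

    /* Called once per instrumented object file, from its constructor. */
    void __gcov_init(struct gcov_info *info)
    {
        if (gcov_num_objects < MAX_GCOV_OBJECTS)
            gcov_objects[gcov_num_objects++] = info;
    }

    /*
     * libgcov normally also provides merge hooks such as __gcov_merge_add();
     * empty stubs are enough when we only dump the counters ourselves.
     */
    void __gcov_merge_add(long long *counters, unsigned int n_counters)
    {
        (void)counters;
        (void)n_counters;
    }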

Since we're running in a kernel or "bare-metal" environment, we have neither libgcov nor a file system to dump the ".gcda" files to. But one should not fear GCOV! Adding it to your kernel is just a matter of adding one C file (which is mostly shamelessly copy-pasted from the Linux kernel and GCC sources) and a couple of CFLAGS. In my example I'm using the LK kernel by Travis Geiselbrecht (https://github.com/travisg/lk). I've decided to just print out the contents of the ".gcda" files to the serial console (UART) as a hex dump and then use an AWK script and the "xxd" tool to convert them to binaries on the host. This works completely fine since these files are typically below 2KB in size.
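The kernel-side half of that dumping can be as dumb as the sketch below (the marker format and the function name are made up here; the real code is in the patch linked further down):

    /*
     * Sketch of dumping one serialized ".gcda" blob over the serial console.
     * LK's printf() already goes to the UART, so plain hex between begin/end
     * markers is enough for a host-side script to cut it back out of the log.
     */
    #include <stdio.h>

    static void gcov_dump_blob(const char *name, const unsigned char *buf,
                               unsigned int len)
    {
        unsigned int i;

        printf("GCOV_BEGIN %s %u\n", name, len);
        for (i = 0; i < len; i++) {
            printf("%02x", buf[i]);
            if ((i & 15) == 15)
                printf("\n");
        }
        printf("\nGCOV_END %s\n", name);
    }

On the host side, the AWK script only needs to grab the hex between the markers and pipe it through "xxd -r -p" to turn it back into a binary ".gcda" file.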

An important thing to note: if your kernel does not contain the ".ctors" section and does not call the constructors, be sure to add the section to the ld script and add some code to invoke them. For example, here's how LK does that: https://github.com/travisg/lk/blob/master/top/main.c#L42
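If you have to wire that up from scratch, the invocation side is tiny; here's a sketch with placeholder symbol names (LK's linker script and main.c use their own):

    /*
     * Sketch of walking the constructor table in a bare-metal kernel.
     * The linker script is expected to collect the .ctors entries between
     * two symbols (placeholder names below) so that C code can simply
     * iterate over the function pointers and call each one.
     */
    typedef void (*ctor_func_t)(void);

    extern ctor_func_t __ctors_start[];  /* provided by the ld script */
    extern ctor_func_t __ctors_end[];

    static void call_constructors(void)
    {
        ctor_func_t *ctor;

        for (ctor = __ctors_start; ctor != __ctors_end; ctor++)
            (*ctor)();
    }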

You can see the whole patch below.
https://github.com/astarasikov/lk/commit/2a4af09a894194dfaff3e05f6fd505241d54d074

After running the "gcovr" tool you can get a nice HTML with summary and see which lines were executed and which were not and add the tests for the latter. Woot!