I hate software: Notes on firmware and tools

Introduction.

In this blog post I mainly want to summarize my latest experiments at using clang's static analyzer and some thoughts on what could be further done at the analyzer, and open-source software quality in general. It's mostly some notes I've decided to put up.

Here are the references to the people involved in developing clang static analyzer at Apple. I recommend following them on twitter and also reading their presentation on developing custom checkers for the analyzer.
Ted Kremenek - https://twitter.com/tkremenek
Anna Zaks - https://twitter.com/zaks_anna
Jordan Rose - https://twitter.com/UINT_MIN

"How to Write a Checker in 24 Hours" - http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf
"Checker Developer Manual" - http://clang-analyzer.llvm.org/checker_dev_manual.html - This one requires some understanding of LLVM, so I recommend to get comfortable with using the analyzer and looking at AST dumps first.

There are not so many analyzer plugins which not made at Apple.

The "Tartan" which is intended to check the GLib conventions used in GNOME projects. https://www.freedesktop.org/software/tartan/
A presentation called "MPI-Checker – Static Analysis for MPI" https://llvm-hpc2-workshop.github.io/slides/Droste.pdf

GLibc.

Out of curiosity I ran the clang analyzer on glibc. Setting it up was not a big deal - in fact, all that was needed was to just run the scan-build. It did a good job of intercepting the gcc calls and most of the sources were successfully analyzed. In fact, I did not do anything special, but even running the analyzer with the default configuration revealed a few true bugs, like the one showed in the screenshot.

For example, the "iconv" function has a check to see if the argument "outbuf" is NULL, which indicates that it is a valid case expected by the authors. The manpage for the function also says that passing a NULL argument as "outbuf" is valid. However, we can see that one of the branches is missing the similar check for NULL pointer, which probably resulted from a copy-paste in the past. So, passing valid pointers to "inbuf" and "inbytesleft" and a NULL pointer for "outbuf" leads to the NULL pointer dereference and consequently a SIGSEGV.

Fun fact: my example is also not quite correct, as pointed out by my Twitter followers. The third argument to iconv must be a pointer to the integer, not the integer. However, on my machine my example crashes at dereferencing the "outbuf" pointer and not the "inbytesleft", because the argument evaluation order in C is not specified, and on x86_64 arguments are usually pushed to the stack (and therefore evaluated) in the reverse order.

Is it a big deal? Who knows. On the one hand, it's a userspace library, and unlike OpenSSL, no one is likely to embed it into kernel-level firmware. On the other, I may very well imagine a device such as a router or a web kiosk where this NULL pointer dereference could be triggered, because internalization and text manipulation is always a complex issue.

Linux Kernel.

I had this idea of trying to build LLVMLinux with clang for quite a while, but never really had the time to do it. My main interest in doing so was using the clang static analyzer.

Currently, some parts of Linux Kernel fail to build with clang, so I had to use the patches from the LLVMLinux project. They failed to apply cleanly though. I had to manually edit several patches. Another problem is that "git am" does not support the "fuzzy" strategy when applying the patches, so I had to use a certain script found on GitHub which uses "patch" and "git commit" to do the same thing.
https://gist.github.com/kfish/7425248

I have pushed my tree to github. I based it off the latest upstream at the time when I worked on it (which was the start of February 2016). The branch is called "4.5_with_llvmlinux".
https://github.com/astarasikov/linux/tree/4.5_with_llvmlinux

I've used the following commands to get the analysis results. Note that some files have failed to compile, and I had to manually stop the compilation job for one file that took over 40 minutes.

export CCC_CXX=clang++
export CCC_CC=clang

scan-build make CC=clang HOSTCC=clang -j10 -k

Ideas.

Porting clang instrumentations to major OpenSource projects.

Clang has a number of instrumentation plugins called "Sanitizers" which were largely developed by Konstantin Serebryany and Dmitry Vyukov at Google.

Arguably the most useful tool for C code is AddressSanitizer which allows to catch Use-After-Free and Out-of-Bounds access on arrays. There exists a port of the tool for the Linux Kernel, called Kernel AddressSanitizer, or KASAN, and it has been used to uncover a lot of memory corruption bugs, leading to potential vulnerabilities or kernel panics.

Another tool, also based on compile-time instrumentation, is the ThreadSanitizer for catching data races, and it has also been ported to the Linux Kernel.

https://github.com/google/kasan/wiki

https://github.com/google/kasan/wiki/Found-Bugs

I think it would be very useful to port these tools to the other projects. There are a lot of system-level software driving the critical aspects of system initialization process. To name a few:

EDK2 UEFI Development Kit
LittleKernel bootloader by Travis Geiselbrecht. It has been extensively used in the embedded world instead of u-boot lately. Qualcomm (and Codeaurora, its Open-Source department) is using a customized version of LK for its SoCs, and nearly every mobile phone with Android has LK inside it. NVIDIA have recently started shipping a version of LK called "TLK" or "Trusted Little Kernel" in its K1 processors, with the largest deployment yet being the Google Nexus 9 tablet.
U-boot bootloader, which is still used in a lot of devices.
FreeBSD and other kernels. While I don't know how widely they are deployed today, it would still be useful, at least for junior and intermediate kernel hackers.
XNU Kernel powering Apple iOS and OS X. I have some feeling that Apple might be using some of the tools internally, though the public snapshots of the source are heavily stripped of newest features.

Btw, if anyone is struggling at getting userspace AddressSanitizer working with NVIDIA drivers, take a look at this thread where I posted a workaround. Long story short, NVIDIA mmaps memory at large chunks up to a certain range, and you need to change the virtual address which ASan claims for itself.

http://marc.info/?l=llvm-dev&m=141168078303757&w=2
http://marc.info/?l=llvm-dev&m=141169586008101&w=2

Note that you will likely get problems if you build a library with a customized ASan and link it to something with unpatched ASan, so I recommend using a distro where you can rebuild all packages simultaneously if you want to experiment with custom instrumentation, such as NixOS, Gentoo or *BSD.

As I have mentioned before, there is a caveat with all these new instrumentation tools - most of them are made for the 64-bit architecture while the majority of the embedded code stays 32-bit. There are several ways to address it, such as running memory checkers in a separate process or using a simulator such as a modified QEMU, but it's an open problem.

Other techniques for firmware quality

In one of my previous posts I have drawn attention to gathering code coverage in low-level software, such as bootloaders. With GCOV being quite easy to port, I think more projects should be exposed to it. Also, AddressSanitizer now comes with a custom infrastructure for gathering code coverage, which is more extensive than GCOV.

Here's a link to the post which shows how one can easily embed GCOV to get coverage using Little Kernel as an example.

http://allsoftwaresucks.blogspot.ru/2015/05/gcov-is-amazing-yet-undocumented.html

While userspace applications have received quite a lot of fuzzing recently, especially with the introduction of the AFL fuzzer tool, kernel-side received little attention. For Linux, there is a syscall fuzzer called Trinity.

http://codemonkey.org.uk/projects/trinity/

https://github.com/kernelslacker/trinity

There is also an interesting project at fuzzing kernel through USB stack (vUSBf).

https://github.com/schumilo/vUSBf

https://www.blackhat.com/docs/eu-14/materials/eu-14-Schumilo-Dont-Trust-Your-USB-How-To-Find-Bugs-In-USB-Device-Drivers.pdf

What should be done is adapting this techniques for other firmware projects. On the one hand, it might get tricky due to the limited debugging facilities available in some kernels. On the other one, the capabilities provided by simulators such as QEMU are virtually unlimited (pun unintended).

An interesting observation might be that there are limited sources of external data input on a system. They include processor interrupt vectors/handlers and MMIO hardware. As for the latter, in Linux and most other firmwares, there are certain facilities for working with MMIO - such as functions like "readb()", "readw()" and "ioremap". Also, if we're speaking of a simulator such as QEMU, we can identify memory regions of interest by walking the page table and checking the physical address against external devices, and also checking the access type bits - for example, the data marked as executable is more likely to be the firmware code, while the uncached contiguous regions of memory are likely DMA windows.

ARM mbed TLS library is a good example of a project that tries to integrate as many dynamic and static tools into its testing process. However, it can be built as a userspace library on desktop, which makes it less interesting in the context of Firmware security.

https://blog.gdssecurity.com/labs/2015/9/21/fuzzing-the-mbed-tls-library.html

https://tls.mbed.org/kb/generic/what-tests-and-checks-are-run-for-mbedtls

https://www.mbed.com/en/development/software/mbed-tls/

Another technique that has been catching my attention for a lot of time is symbolic execution. In many ways, it is a similar problem to the static analysis - you need to find a set of input values constrained by certain equations to trigger a specific execution path leading to the incorrect state (say, a NULL pointer dereference).

Rumors are, a tool based on this technique called SAGE is actively used at Microsoft Research to analyze internal code, but sadly there are not so many open-source and well-documented tools one can easily play with and directly plug into an existing project.

An interesting example of applying this idea to the practical problem at a large and established corporation (Intel) is presented in the paper called "Symbolic execution for BIOS security" which tries to utilize symbolic execution with KLEE for attacking firmware - the SMM handler. You can find more interesting details in the blog of one of the authors of the paper - Vincent Zimmer (https://twitter.com/vincentzimmer and http://vzimmer.blogspot.com/).

Also, not directly related, but an interesting presentation about bug hunting on Android.

http://www.slideshare.net/jiahongfang5/qualcomm2015-jfang-nforest

I guess now I will have to focus on studying recent exploitation techniques used for hacking XNU, Linux and Chromium to have a more clear picture of what's needed to achieve :)

Further developing clang analyzer.

One feature missing from clang is the ability to perform the analysis across the translation units. A Translation Unit (or TU) in clang/LLVM roughly represents a file on the disk being parsed. Clang analyzer is implemented as a pass which traverses the AST and is limited to the one translation unit. Which is not quite true - it will go and analyze the includes recursively. But if you have two separate "C" files, which do not include one another, and a function from one file is calling the function from another one, the analysis will not work.

Implementing a checker across multiple sources is not a trivial thing, as pointed out by the analyzer authors in the mailing list, though this topic is often brought up by different people.

http://lists.llvm.org/pipermail/cfe-dev/2013-January/027234.html

http://lists.llvm.org/pipermail/cfe-dev/2016-January/046905.html

Often, it is possible to come up with a workarounds, especially if one aims at implementing ad-hoc checks for their project. The simplest one would be creating a top-level "umbrella" C file which would include all files with implementations of the functions. I have seen some projects do exactly this. The most obvious shortcomings of this approach is that it will require redesigning the whole structure of the project, and will not work if some of the translation units need custom C compiler options.

Another option would be to dump/serialize the AST and any additional information during the compilation of each TU and process it after the whole project is built. It looks like this approach has been proposed multiple times on the mailing list, and there exist at least one paper which claims doing that.

"Towards Vulnerability Discovery Using Extended Compile-time Analysis" - http://arxiv.org/pdf/1508.04627.pdf

In fact, the analyzer part itself might very well be independent of the actual parser, and could be reused, for example, to analyze the data obtained by the disassembly, but it's a topic for another research area.

Developing ad-hoc analyzers for certain projects.

While it is very difficult to statically rule out many kinds of errors in an arbitrary project, either due to state explosion problem or the dynamic nature of the code, quite often we can design tools to verify some contracts specific to a certain project.

An example of such tool would be a tool called "sparse" from the Linux Kernel. It effectively works as a parser for the C code and can be made to run on every C file compiled by the GCC while building the kernel. It allows to specify some annotations to the declarations in the C code. It works similar to how attributes were implemented in GCC and Clang later.

https://en.wikipedia.org/wiki/Sparse.

http://linuxwireless.org/en/developers/Documentation/using-sparse/

One notable example of the code in the Linux Kernel, which deals with passing void pointers around and relying on pointer trickery via the "container_of" macro is the workqueue library.

While working on the FreeBSD kernel during the GSoC in 2014, I faced a similar problem while developing device drivers - at certain places pointers were cast to void, and casted back to typed ones where needed. Certainly it's easy to make a mistake while coding these things.

Now, if we dump enough information during compilation, we can implement advanced checks. For example, when facing a function pointer which is initialized dynamically, we can do two things. First, find all places where it can potentially be initialized. Second, find all functions with a matching prototype. While checking all of them might be time consuming and generate false positives, it will also allow to check more code statically at compilation time.

A notable source of problems when working with C code is that linking stage is traditionally separated from the compilation stage. The linker usually manipulates the abstract "symbols" which are just void pointers. Even though it could be possible to store enough information about the types in a section of the ELF (in fact, DWARF debugging data contains information about the types) and use it to type-check symbols when linking, it's not usually done.

It leads to certain funky problems. For example, aliases (weak aliases) are a linker-time feature. If one defines an alias to some function, where the type of the alias does not match the type of the original function, the compiler cannot check it (well, it could if someone wrote a checker, but it does not), and you will get a silent memory corruption at runtime. I once ran into this issue when porting the RUMP library which ships a custom LIBC, and our C library had different size for "off_t".

Refactoring

There are two ideas I had floating around for quite a while.

Microkernelizing Linux.

An interesting research problem would be coming up with a tool to automatically convert/rewrite linux-kernel code into a microkernel with device drivers and other complex pieces of code staying in separate protection domains (processes).

It is interesting for several reasons. One is that a lot of microkernels, such as L4, rely on DDE-Kits to run pieces of Linux and other systems (such as NetBSD in case of RUMP) to provide drivers. Another is that there's a lot of tested code with large functionality, which could possibly made more secure by minimizing the impact of memory corruption.

Besides obvious performance concerns there are a lot of research questions to this.

Converting access to global variables to IPC accesses. Most challenging part would be dealing with function pointers and callbacks.
Parsing KConfig files and "#ifdef" statements to ensure all conditional compilation cases are covered when refactoring. This in itself is a huge problem for every C codebase - if you change something in one branch of an "#ifdef" statement, you cannot guarantee you didn't break it for another branch. To get whole coverage, it would be useful to come up with some automated way to ensure all configurations of "#ifdef" are built.
Deciding which pieces of code should be linked statically and reside in the same process. For example, one might want to make sound subsystem, drm and vfs to run as separate processes, but going as far as having each TU converted to a process would be an overkill and a performance disaster.

Translating C code to another language. Not really sure if it could be really useful. It is likely that any complex code involving pointers and arrays would require a language with similar capabilities as a target (if we're speaking of generating a useful and readable idiomatic code, and not just a translator such as emscripten). Therefore, the target language might very well have the same areas with unspecified behavior. Some people have proposed for a stricter and a more well-defined dialect of C.

One may note that it is not necessary to use clang for any of these tasks. In fact, one can get away with writing a custom parser or hacking GCC. These options are perfectly legal. I had some of these ideas floating around for at least three years, but back then I didn't have skills to build GCC and hack on it, and now I've been mostly using clang, but it's not a big deal.

Conclusion

Overall, neither improving the compiler nor switching to another language alone would not save us from errors. What's often overlooked is the process of the continuous integration, with regression tests to show what became broken, with gathering coverage, with doing integration tests.

There are a lot of non-obvious parameters which are difficult to measure. For example, how to set up a regression test that would detect a software breakage due to a new compiler optimization.

Besides, we could possibly borrow a lot of practices from the HDL/Digital Design industry. Besides coverage, an interesting idea is simulating the same design in different languages with different semantics to hope that if there are mistakes at the implementation stage, they are not at the same places, and testing will show where outputs of two systems diverge.

P.S.

Ping me if you're interested in working on the topics mentioned above :)

I hate software

Thursday, February 25, 2016

Notes on firmware and tools