Sunday, October 7, 2012

thoughts on modern operating systems

For a long time I've been messing with low-level code on various embedded hardware.
While I generally find the Linux kernel code an outstanding example of software architecture
and one of the cleanest projects implemented in the C language, there is
still room for improvement.

I'll try to summarize my impressions of developing for the Linux kernel (and other bare-metal
software including U-Boot, LK and NetBSD, which is one of the two BSDs that are not completely irrelevant ancient shit, btw; the other one is DragonFly BSD). In this post I want to consider two aspects
of an operating system kernel implementation: architectural issues (which are
language-agnostic) and the choice of a programming language.

One point of contention is how modular the kernel should be. The two major models are the monolithic kernel (where all drivers execute in the same virtual address space and consequently there is no isolation of resource access between drivers) and the microkernel approach, where
every driver runs as a process in a separate address space in user mode.

The problems of monolithic kernels
  • All drivers have the same access to all resources. This means that if we don't trust the driver code (say, a closed-source driver), our entire system is compromised. Moreover, any error in the driver code is a potential root exploit, because if we manage to smash the stack or force the kernel code to jump to a usermode helper, we have all possible privileges [KernelExploits].
  • If driver code hangs, we cannot easily restart it, because all the code runs in a single kernel address space and shares global variables. That is, even if we remove and reinsert the module, there is no guarantee that we start over with a clean state. Moreover, shared locks will in most cases prevent us from doing it when things go bad.

On the other hand, microkernels try to solve the problem with the following security measures:
  • Running each driver in user mode to prevent it from peeking into other processes' data by exploiting supervisor mode
  • The principle of least privilege: only grant access to the resources the driver actually needs (IO, GPIO, IRQ)
  • Reducing the TCB (trusted computing base) to the minimum amount of code
  • Using capabilities to control access to resources. Capabilities are, roughly, references to the desired resource whose ownership is controlled by the kernel (or, more precisely, by the policy manager, which can comprise multiple entities inside and outside the kernel)
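To make the capability idea a bit more concrete, here is a minimal C++ sketch of a toy kernel that keeps a capability table for one process. The names (`Resource`, `CapTable`, `grant`, `revoke`, `lookup`) are hypothetical and not taken from Fiasco.OC, Genode or any other real kernel. The point is that the process only ever holds small integer selectors, and only the kernel-side table can turn a selector into an actual resource, so a driver granted one IRQ line has no way to even name another one.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// A kernel-owned resource: an IO port range, a GPIO pin, an IRQ line...
struct Resource {
    std::string name;
};

// Hypothetical capability table of ONE process. User code sees only the
// integer index (the capability selector); the table lives in kernel
// memory, so the process cannot forge access to anything not granted.
class CapTable {
public:
    using Cap = std::size_t;

    // Kernel / policy manager side: give the process access to `r`.
    Cap grant(Resource* r) {
        slots_.push_back(r);
        return slots_.size() - 1;
    }

    // Kernel / policy manager side: withdraw access. Later lookups
    // through the old selector fail; the process is not involved.
    void revoke(Cap c) {
        if (c < slots_.size()) slots_[c] = nullptr;
    }

    // Performed on every system call that names a capability.
    std::optional<Resource*> lookup(Cap c) const {
        if (c >= slots_.size() || slots_[c] == nullptr) return std::nullopt;
        return slots_[c];
    }

private:
    std::vector<Resource*> slots_;  // index == capability selector
};
```

A driver granted only `irq5` receives selector 0; selector 1 simply does not exist in its table, so looking it up fails even though other IRQ lines exist in the system. Revocation is just clearing a slot, which is one reason why capabilities make the principle of least privilege cheap to enforce.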
There exist various microkernels. The most commercially successful are, of course, the QNX and Mach kernels. While QNX and Mach can be used in a 'pure' microkernel sense, that is, with no drivers running in supervisor mode with ultimate privileges, most practical systems put interrupt routines and drivers into the kernel (QNX allows it, and the Mac OS X kernel, XNU, relies heavily on this feature).

Two of the most interesting projects are the L4 Fiasco.OC [FOC] kernel from TU Dresden and the Genode Framework [Genode] from Genode Labs.

Fiasco.OC is not particularly interesting per se (there are far more interesting projects, like l4.verified from NICTA, which is formally proven), but it has quite a few nice properties:

  • It is open source. Well, kind of. The development is done behind closed doors and it's a bit of a pain to push a patch upstream, but at least we can download the source code and adapt it for our purposes
  • It has L4Re (L4 Runtime Environment), a set of libraries that eases porting software from conventional POSIX-compatible environments
  • It can be used to paravirtualize other kernels. Most notably, there is the [L4Linux] project, which runs Linux on top of L4 to allow seamless code reuse.
The Genode Framework takes a different path from the multitude of attempts to write a successful microkernel capable of surviving in the cruel world. It can run on top of many kernels (L4 from TU Dresden, OKL4, Linux) and even directly on bare-metal hardware, and it provides a unified capability-based security model on top of whichever kernel it uses. It has some nice features:

  • Supports the ARM architecture and recent SoCs (OMAP4). This means we can use it for paravirtualizing, say, Android on ARMv7 CPUs with no hardware virtualization support. That's important because ARM does not have as wide a choice of virtualization solutions as, say, x86.
  • Implemented in C++ using templates. Enforcing type checks at compile time reduces the number of programming errors.
  • Has a nice RPC/IDL implementation in C++. Programming multi-server software has never been that easy.
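Here is roughly why a typed RPC layer is so pleasant: client and server compile against one C++ interface, so a mismatched call is a compile error rather than a garbled message. The sketch below is NOT Genode's actual RPC API. It is a hypothetical single-address-space illustration where `GpioSession`, `GpioClient`, `GpioServer`, `FakeGpio` and the `Msg` layout are invented names, and the "IPC" is a plain function call standing in for a kernel message transfer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// The interface both sides compile against.
struct GpioSession {
    virtual void set(unsigned pin, bool value) = 0;
    virtual bool get(unsigned pin) = 0;
    virtual ~GpioSession() = default;
};

// Wire format: an opcode plus a few fixed argument words.
enum class Op : std::uint32_t { Set, Get };
struct Msg {
    Op op;
    std::uint32_t arg0 = 0, arg1 = 0;
    std::uint32_t ret = 0;
};

// Server side: unmarshals a message and calls the real implementation.
class GpioServer {
public:
    explicit GpioServer(GpioSession& impl) : impl_(impl) {}
    void dispatch(Msg& m) {
        switch (m.op) {
        case Op::Set: impl_.set(m.arg0, m.arg1 != 0); break;
        case Op::Get: m.ret = impl_.get(m.arg0) ? 1 : 0; break;
        }
    }
private:
    GpioSession& impl_;
};

// Client stub: implements the same interface by marshalling calls.
// In a real system `server_.dispatch` would be a kernel IPC call.
class GpioClient : public GpioSession {
public:
    explicit GpioClient(GpioServer& s) : server_(s) {}
    void set(unsigned pin, bool value) override {
        Msg m{Op::Set, pin, value ? 1u : 0u};
        server_.dispatch(m);
    }
    bool get(unsigned pin) override {
        Msg m{Op::Get, pin};
        server_.dispatch(m);
        return m.ret != 0;
    }
private:
    GpioServer& server_;
};

// A trivial in-memory "driver" used as the server-side implementation.
struct FakeGpio : GpioSession {
    std::map<unsigned, bool> pins;
    void set(unsigned pin, bool value) override { pins[pin] = value; }
    bool get(unsigned pin) override { return pins[pin]; }
};
```

Since `GpioClient` derives from `GpioSession`, code written against the interface does not care whether it talks to a local driver or a remote server, and passing arguments of the wrong type is rejected by the compiler. A real IDL compiler would generate the stub and the dispatcher instead of having them written by hand.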

Unfortunately, there's a heckuva lot of work ahead if we want to use Genode for world domination. Here are some ideas I have about improving the current state of the Genode framework, and I do hope I'll get a chance to work on some of these things eventually. Most of the trouble is caused by poor hardware support.

  • Lack of device drivers. What should be done is to write abstractions (interfaces and sample implementations) for various hardware classes: voltage regulators, MMC cards, GPIO controllers et cetera. Like in Linux, but using kernelization (decomposing software into standalone servers communicating via trusted RPC guarded by security tokens, i.e. capabilities) for improved security, and C++ or a custom DSL for enforcing type checks and preventing various programming errors. Ideally, I would love to be able to enforce fine-grained access policies for every resource, be it a GPIO pin or an I2C device register.
  • No power management. I mean, at all. Not even DVFS (Dynamic Voltage/Frequency Scaling). No suspend-to-RAM support. This severely limits the use of the framework on mobile devices.
  • Poor display management support. I propose that Nitpicker or some other display server be extended to allow hotplug and runtime resolution switching, and that the L4Linux/Genode framebuffer driver be completed to allow the use of all the complex features of modern display adapters: blitting, rotation, overlays and what not.
  • Proper fault handling. I think some kind of a system (call it a paradigm, a design pattern or whatever) should be implemented to gracefully restart services in case of failure (it is indeed very difficult to decide what to do when a driver fails). There should probably be some abstract policy class with various implementations. One implementation (for example, for the user desktop) would try to restart failing tasks. Another one (say, for trusted computing environments) would kill the world if any single process fails.
  • A nice UI should be written for easily creating and switching between virtualized Linux instances.
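The fault-handling idea could look roughly like the following C++ sketch: an abstract policy class, one implementation that restarts a failing service a limited number of times (desktop) and one that halts everything on the first failure (trusted environment). All class names and the `Supervisor` logic are hypothetical; a real system would get fault notifications from the kernel and actually restart or tear down processes, while this sketch only records the decision.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Action { Restart, StopService, HaltSystem };

// Abstract policy: decides what to do when a service faults.
struct FaultPolicy {
    virtual Action on_fault(const std::string& service, int fault_count) = 0;
    virtual ~FaultPolicy() = default;
};

// Desktop-style policy: restart the service, but give up on it after
// `max_restarts` faults so a permanently broken driver does not loop.
struct RestartPolicy : FaultPolicy {
    explicit RestartPolicy(int max_restarts) : max_(max_restarts) {}
    Action on_fault(const std::string&, int fault_count) override {
        return fault_count <= max_ ? Action::Restart : Action::StopService;
    }
private:
    int max_;
};

// Trusted-computing policy: any failure is treated as a compromise.
struct KillWorldPolicy : FaultPolicy {
    Action on_fault(const std::string&, int) override {
        return Action::HaltSystem;
    }
};

// Hypothetical supervisor: counts faults per service and asks the policy.
class Supervisor {
public:
    explicit Supervisor(FaultPolicy& p) : policy_(p) {}

    // Called when `service` crashes; returns the chosen action.
    Action report_fault(const std::string& service) {
        int n = ++fault_counts_[service];
        Action a = policy_.on_fault(service, n);
        if (a == Action::HaltSystem) halted_ = true;
        return a;
    }

    bool halted() const { return halted_; }

private:
    FaultPolicy& policy_;
    std::map<std::string, int> fault_counts_;
    bool halted_ = false;
};
```

The supervisor itself never changes; only the policy object plugged into it does, which is exactly what lets the same system image behave differently on a desktop and in a trusted environment.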

One huge security hole worth mentioning is hardware. Not only is it very hard to verify the absence of backdoors in hardware (a modern smartphone actually has more than 20 CPUs of various architectures running closed-source code), but some hardware features can also be used to circumvent software security measures.

The most famous kind of hardware-assisted attack is the DMA-based attack. In older systems, a DMA controller typically had unrestricted access to all physical memory, making it possible for a malicious driver to control the whole system. Moreover, some buses, like IEEE 1394 (FireWire) and PCI, by design allow any device to access all system memory. There have been various exploits circumventing the protection of desktop operating systems this way (e.g., [WinLockPwn]). The same techniques have been successfully used to break the security and hypervisors of various gaming consoles, as shown in [Console] and [StateOfTheHack].

One solution to this is an IOMMU, an address translation unit that sits between devices and main memory, much like the CPU's MMU does for processes. Besides the ability to define protection domains on a per-page basis, it also lets devices that don't support scatter-gather, such as most camera image signal processors (ISPs) and GPUs, use physically discontiguous memory regions, and it allows remapping hardware memory into an arbitrary process (as an example, this allows you to pass your VGA card through to a paravirtualized instance of Linux on Xen).
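A toy model helps show what the IOMMU adds. In the C++ sketch below (a simplified, hypothetical model, not the page-table format of any real IOMMU), each device gets its own translation table from device-visible IO virtual addresses to physical pages. A DMA to an unmapped or read-only page is rejected, and two physically scattered pages appear contiguous to the device.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

constexpr std::uint64_t kPageSize = 4096;

struct IoPte {
    std::uint64_t phys_page;  // physical page frame number
    bool writable;
};

// One protection domain per device: IO virtual page -> physical page.
class IommuDomain {
public:
    void map(std::uint64_t iova, std::uint64_t phys, bool writable) {
        table_[iova / kPageSize] = IoPte{phys / kPageSize, writable};
    }
    void unmap(std::uint64_t iova) { table_.erase(iova / kPageSize); }

    // Translate a device access. std::nullopt models a blocked DMA
    // (real hardware would typically raise a fault interrupt instead).
    std::optional<std::uint64_t> translate(std::uint64_t iova, bool write) const {
        auto it = table_.find(iova / kPageSize);
        if (it == table_.end()) return std::nullopt;              // not mapped
        if (write && !it->second.writable) return std::nullopt;  // read-only
        return it->second.phys_page * kPageSize + iova % kPageSize;
    }

private:
    std::unordered_map<std::uint64_t, IoPte> table_;
};
```

For example, a camera ISP whose domain maps IO address 0x0000 to physical 0x7000 and 0x1000 to physical 0x3000 sees one contiguous 8 KiB buffer, while any DMA it attempts outside that window is blocked instead of reaching arbitrary physical memory.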

The other flame-provoking area is the choice of a programming language for implementing
the operating system. Historically, most kernels, including Linux, *BSD and to some extent
Windows NT, were written in plain C.

To be honest, I'm sick to death of programming in C. While C is often called a 'cross-platform assembly' and you can always understand what your code will get translated into, it offers a multitude of ways to shoot yourself in the foot, and I have some ideas about what a language for kernel-level programming should look like. I'll divide the ideas into two groups: the first focuses on improving the language in a non-intrusive way, without breaking semantics or code efficiency; the second discusses alternative approaches to system software construction.

First off, let's discuss common problems of programming in C and ideas on improvement. Here's what's been bothering me daily since I started my journey into the world of embedded development.

  • Uninitialized memory. Seriously, this is the cause of 90% of misbehaving software. Just enable the corresponding warning if your compiler supports it (e.g. -Wuninitialized in GCC and Clang). I wish initialization were always enforced by the C compiler; the performance penalty is negligible in most cases. When one really needs uninitialized data, or when it is impossible to store the initial values in the BSS or clear the buffer at runtime, a separate language construct should be used so that all such places can be identified at compile time.
  • Buffer overflows. This is one huge security issue. In most cases it is caused by errors in handling null-terminated strings (whoever thinks they're a good idea is probably a cruel genius hoping to have a backdoor everywhere). One possible workaround would be to use typed wrappers for arrays: a data structure containing an array pointer and an element count, with accessors and mutators checking for out-of-range errors. Like in Java, but this can be perfectly well simulated using C++ templates or, less comfortably, with C macros.
  • Null pointer dereferences. All pointers should be checked for validity by the compiler. Besides, casting pointers to integers should be banned, and IO buffers should also be implemented within the compiler so that the programmer does not have to mess with memory manually.
  • Lack of OOP classes, Haskell-style type classes and functional-style pattern matching. While combining all of these features usually makes the language very complicated (like OCaml), the lack of all of them makes programming a constant exercise in reinventing the compiler with the help of fugly (or beautiful) macros like container_of.
  • (C++) It would be nice to be able to bind abstract class implementations at compile time, without a vtable. To some extent you can get away with using templates for substituting types in a class definition. C++ templates in general are a powerful compile-time system that gives you parametric polymorphism roughly comparable to generics in ML or Haskell (though without Hindley-Milner type inference: templates are only checked when instantiated), so use them wisely :)
  • The stupid split of code into a header and an 'implementation'. This is especially tiring in C++, because you have to jump between the header and the source for any change. I'm not even going to discuss other issues in depth, like most C++ compilers being unable to handle external templates.
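Two of these points can already be approximated in C++. In the sketch below, `Slice<T>` is a minimal bounds-checked array wrapper (a pointer plus an element count, as suggested for buffer overflows), and `Driver<Impl>` uses the curiously recurring template pattern (CRTP) to bind an "abstract" interface to its implementation at compile time, with no vtable. The names are illustrative only. Out-of-range access returns `nullptr` instead of throwing, on the assumption that a kernel build has exceptions disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Bounds-checked view of an array: pointer + element count.
template <typename T>
class Slice {
public:
    Slice(T* data, std::size_t len) : data_(data), len_(len) {}

    // Length taken from the array type itself, so it cannot be wrong.
    template <std::size_t N>
    Slice(T (&arr)[N]) : data_(arr), len_(N) {}

    std::size_t size() const { return len_; }

    // Checked access: nullptr instead of touching memory out of range.
    T* at(std::size_t i) const { return i < len_ ? data_ + i : nullptr; }

    // Checked sub-range; an invalid range yields an empty slice.
    Slice sub(std::size_t off, std::size_t n) const {
        if (off > len_ || n > len_ - off) return Slice(data_, 0);
        return Slice(data_ + off, n);
    }

private:
    T* data_;
    std::size_t len_;
};

// CRTP: the "interface" calls into Impl directly, resolved at compile time.
template <typename Impl>
struct Driver {
    int probe() { return static_cast<Impl*>(this)->do_probe(); }
};

struct UartDriver : Driver<UartDriver> {
    int do_probe() { return 0; }  // 0 == success, by this sketch's convention
};

// No virtual functions anywhere, hence no vtable pointer in the object.
static_assert(!std::is_polymorphic<UartDriver>::value, "no vtable expected");

// Generic code written against the interface; the call is static.
template <typename Impl>
int probe_driver(Driver<Impl>& d) { return d.probe(); }
```

The price of the CRTP approach is that `probe_driver` becomes a template too, so the "interface" cannot be used to store heterogeneous drivers in one container; that is exactly the trade-off against a vtable.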


Now, let's consider alternative approaches, their strong and weak points. Here are some ideas that we could borrow from the userland world.

  • Garbage collection. This alone can prevent a lot of memory leaks and save a lot of debugging effort. While it is true that GC activity is quite hard to predict, and there has been a (false) belief that GC and realtime systems are incompatible, it should be noted that manual memory management also suffers from VM thrashing, and malloc() calls can take an arbitrary amount of time. Therefore, in some cases where performance and reliability are a matter of life and death, dynamic memory allocation is not used at all and all memory buffers are statically defined at compile time.
  • Runtime type introspection (reflection). It is very useful to be able to query object state without writing tons of macro boilerplate. While we can put the task of type checking on the compiler, sometimes we need to interact with remote code and userspace (e.g., sysfs), and then RTTI comes in handy. For example, most userlands use some kind of a typed message bus (D-Bus, 9P) for IPC (inter-process communication), which we could plug directly into the kernel's public interface.
  • A monitor primitive for atomic/critical sections to avoid manual lock management. In fact, I want something like the 'synchronized' keyword in Java or 'lock' in C#. Besides, it would be very helpful if the monitor knew the execution context (interrupt or bottom-half handler) and used sleeping or non-sleeping locks accordingly (mutexes vs. spinlocks). Like: "atomic [x <- readBus(); waitBus(x & ~1)]".


There exist multiple projects aiming to exploit the advantages of very high-level programming languages to improve the quality of system software. To name just a few that I find the most interesting:

  • The .NET Micro Framework by Microsoft [DotNetMicro], which runs on bare hardware. It requires only around 64 KB of RAM. While this is a huge waste of computational power for a tiny controller like the ATmega48, it is still smaller than most fully-featured kernels like Linux and Windows NT, or hypervisors like Xen or L4. The nice thing about this project is that it is open source and supports a lot of common hardware (pardon my ignorance, but the last time I checked, 5 years ago, it supported the Intel/Marvell PXA SoC, which was still quite popular back then).
  • Singularity OS by Microsoft [Singularity]. It is similar to .NET Micro in that it uses a dialect of the C# programming language (Sing#), but it relies on ahead-of-time (AOT) compilation to native code.
  • House is an operating system implemented in the Haskell programming language [HouseOs]. This is quite interesting because Haskell is a strongly typed language with type inference, which makes it ideal for writing large-scale complex systems. Unfortunately, development seems to have slowed down lately, and hardware support is limited to a subset of x86-compatible PCs.
  • [SqueakNOS] is an operating system built around the Squeak Smalltalk environment running on bare metal, with everything down to interrupt handling done in Smalltalk. Smalltalk is a very nice object-oriented language (in spite of the bad fame OOP has gained recently due to the perverted nature of C++ and Java) built around message passing.

Further reading:
[KernelExploits] Writing kernel exploits http://ugcs.net/~keegan/talks/kernel-exploit/talk.pdf

[FOC] TU Dresden Fiasco.OC L4 kernel website http://os.inf.tu-dresden.de/fiasco/overview.html
[L4Linux] TU Dresden L4Linux website http://os.inf.tu-dresden.de/L4/LinuxOnL4/
[Genode] Genode Framework official website http://genode.org/

[WinLockPwn] Adventures with Daisy in Thunderbolt-DMA-land: Hacking Macs through the Thunderbolt interface http://www.breaknenter.org/tag/winlockpwn/
[Console] Modern Game Console Exploitation http://www.cs.arizona.edu/~collberg/Teaching/466-566/2012/Resources/presentations/2012/topic1-final/report.pdf
[StateOfTheHack] All your consoles are belong to us http://delroth.net/state-of-the-hack.pdf

[DotNetMicro] Microsoft .NET Micro Framework http://www.netmf.com/
[Singularity] Microsoft Singularity OS http://research.microsoft.com/en-us/projects/singularity/
[HouseOs] House Operating System website http://programatica.cs.pdx.edu/House/
[SqueakNOS] SqueakNOS - Smalltalk on bare hardware http://squeaknos.blogspot.com/