Sunday, October 7, 2012

thoughts on modern operating systems

For a long time I've been messing with low-level code on various embedded hardware.
While I generally find linux kernel code an outstanding example of software architecture
and one of the cleanest projects implemented in the C language, there is
still room for improvement.

I'll try to summarize my impressions of developing for the linux kernel and other bare-metal
software, including u-boot, lk and netbsd (one of the two BSDs that aren't completely irrelevant ancient shit, btw; the second one is DragonflyBSD). In this post I want to consider two aspects
of an operating system kernel implementation: architectural issues (which are language-
agnostic) and the choice of a programming language.

One point of argument is how modular the kernel should be. The two major models are the monolithic kernel (where all drivers execute in the same virtual address space and consequently there is no isolation of resource access between drivers) and the microkernel approach, where
every driver runs as a user-mode process in a separate address space.

The problems of monolithic kernels:
  • All drivers have the same access to all resources. Which means, if we don't trust the driver code (say, a closed-source driver), our entire system is compromised. Moreover, any error in the driver code is a potential root exploit, because if we manage to smash the stack or force the kernel to jump to a usermode helper, we get every privilege there is [KernelExploits].
  • If driver code hangs, we cannot easily restart it, because all the code runs inside a single process and shares global variables. That is, even if we remove and re-insert the module, there is no guarantee we start over from a clean state. Moreover, shared locks will in most cases prohibit us from doing it when things go bad.

On the other hand, microkernels try to solve the problem by the following security measures:
  • Running each driver as a user-mode process to prevent it from peeking into other processes' data by exploiting supervisor mode
  • The principle of least privilege: only grant access to the resources the driver actually needs (IO, GPIO, IRQ)
  • Reducing TCB (trusted computing base) to the minimum amount of code
  • Using capabilities to control access to resources. Capabilities are, like, pointers to the desired resource, with the ownership controlled by the kernel (or, actually, by the policy manager, which can comprise multiple entities inside and outside of the kernel)
There exist various microkernels; the most commercially successful are, of course, QNX and Mach. While both can be used in a 'pure' microkernel sense - that is, with nothing besides the kernel itself running in supervisor mode with ultimate privileges - most practical systems keep interrupt routines and drivers in the kernel (QNX allows it, and the Mac OS X kernel, XNU, relies heavily on this feature).

Two of the most interesting projects are the Fiasco.OC L4 kernel [FOC] from TU Dresden and the Genode Framework [Genode] from Genode Labs.

Fiasco.OC is not particularly interesting per se (there are far more interesting projects like l4.verified from NICTA, which is formally verified), but it has quite a set of nice properties:

  • It is opensource. Well, kind of. The development is done behind closed doors and it's a bit of a pain to push a patch upstream, but at least we can download the source code and adapt it for our purposes
  • It has L4Re (L4 Runtime Environment) which is a set of libraries to ease porting software from conventional POSIX-compatible environments
  • It can be used to paravirtualize other kernels. Most notably, there exists the [L4Linux] project allowing one to run linux on top of L4 for seamless code reuse.
The Genode Framework stands apart from the multitude of attempts to write a successful microkernel capable of surviving in the cruel world. It can run on top of many kernels (L4 from TU Dresden, OKL4, linux) and even directly on bare-metal hardware, and it provides a unified capability-based security model on top of whatever kernel it uses. It has some nice features:

  • Supports the ARM architecture and recent SoCs (e.g., OMAP4). This means we can use it for paravirtualizing, say, Android on armv7 CPUs with no hardware virtualization support. That's important because ARM does not have the same choice of virtualization solutions as, say, x86.
  • Implemented in C++ using templates. Enforcing type checks at compile time reduces the number of programming errors
  • Has a nice RPC/IDL implementation in C++. Programming multi-server software has never been that easy.

Unfortunately, there's a heckuva lot of work ahead if we want to use Genode for world domination. Here are some ideas I have about improving the current state of the Genode framework, and I do hope I'll get a chance to work on some of these things eventually. Most of the trouble is caused by poor hardware support.

  • Lack of device drivers. What should be done is to write abstractions (interfaces and sample implementations) for various hardware classes - voltage regulators, MMC cards, GPIO controllers et cetera. Like in linux, but using kernelization (decomposing software into standalone servers communicating via trusted RPC guarded by security tokens - capabilities) to improve security, and C++ or a custom DSL to enforce type checks and prevent various programming errors. Ideally, I would love to be able to enforce fine-grained access policies for every resource - be that a GPIO pin or an I2C device register (a sketch follows this list).
  • No power management. I mean, at all. Not even DVFS (Dynamic Voltage/Frequency Scaling). No suspend-to-RAM support. Which rather limits the use of the framework on mobile devices.
  • Poor display management support. I propose that nitpicker or some other display server be extended to allow hotplug and runtime resolution switching, and that the l4linux/genode framebuffer driver be completed to allow the use of all the complex features of modern display adapters - blitting, rotation, overlays and what not.
  • Proper fault handling. I think some kind of a system (call it a paradigm, a design pattern or whatever) should be implemented to gracefully restart services in the case of failure (it is indeed very difficult to decide what to do when a driver fails). There should probably be some abstract policy class with various implementations. One implementation (for example, for the user desktop) would try to restart failing tasks. Another one (say, for trusted computing environments) would kill the world if any single process fails.
  • Some nice UI should be written for easily switching and creating new virtualized linux instances.
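
To give a feel for what I mean by typed, capability-guarded driver interfaces, here is a minimal sketch. This is not the actual Genode API - all names are made up for illustration; the point is that a client is handed exactly one pin by a policy component and physically cannot touch anything else.

// Hypothetical sketch, NOT the real Genode API: a typed session interface
// for a single GPIO pin, handed out as a capability by a policy component.
#include <cstdio>

struct Gpio_pin_session {
    virtual void direction_output(bool initial_level) = 0;
    virtual void write(bool level) = 0;
    virtual bool read() = 0;
    virtual ~Gpio_pin_session() {}
};

// A driver is written purely against the interface; which pin it gets
// (and whether it gets one at all) is decided elsewhere, by policy.
struct Led_driver {
    Gpio_pin_session &pin;
    explicit Led_driver(Gpio_pin_session &p) : pin(p) { pin.direction_output(false); }
    void on()  { pin.write(true); }
    void off() { pin.write(false); }
};

// A fake pin used only to make this sketch runnable.
struct Fake_pin : Gpio_pin_session {
    bool level = false;
    void direction_output(bool initial) override { level = initial; }
    void write(bool l) override { level = l; std::printf("pin -> %d\n", l); }
    bool read() override { return level; }
};

int main() {
    Fake_pin pin;
    Led_driver led(pin);   // the driver is granted exactly one pin, nothing else
    led.on();
    led.off();
}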

One huge security hole worth mentioning is hardware. Not only is it very hard to verify the absence of backdoors in hardware (a modern smartphone actually has more than 20 CPUs of various architectures running closed-source code), but some hardware features can be used to circumvent software security measures.

The most famous kind of hardware-assisted attacks are DMA-based attacks. In older systems, a DMA controller typically had unrestricted access to all physical memory, making it possible for a malicious driver to take control of the whole system. Moreover, some protocols, like IEEE 1394 (FireWire) and PCI, by design allow an arbitrary device to access all system memory. There have been various exploits circumventing desktop operating system protections this way (like [WinLockPwn]), and the same techniques have been successfully used to crack the security and hypervisors of various gaming consoles, as shown in [Console] and [StateOfTheHack].

One solution to this is the IOMMU, a virtual memory translation unit sitting between the devices and system memory. Besides the ability to define protection domains on a per-page basis, it also allows using discontiguous memory regions with devices that don't support scatter-gather themselves (like most camera ISPs (image signal processors) or GPUs) and remapping device-visible memory into an arbitrary process (as an example, this is what lets you pass your VGA card through to a paravirtualized instance of linux on Xen).
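
To make the IOMMU idea a bit more concrete, here is a toy model. It is purely conceptual and the names are invented - a real IOMMU is programmed through hardware page tables, not a std::map - but it shows the per-device protection domain and the per-page permission check.

#include <cstdint>
#include <cstdio>
#include <map>

// Toy model: device-visible pages are translated to physical pages, and any
// access that is unmapped (or a write to a read-only page) faults instead of
// reaching memory.
struct Mapping { std::uint64_t phys_page; bool writable; };

class Iommu_domain {
    std::map<std::uint64_t, Mapping> map_;
public:
    void map(std::uint64_t dev_page, std::uint64_t phys_page, bool writable) {
        map_[dev_page] = Mapping{phys_page, writable};
    }
    // Conceptually consulted on every DMA transaction the device issues.
    bool translate(std::uint64_t dev_page, bool write, std::uint64_t &phys_page) {
        auto it = map_.find(dev_page);
        if (it == map_.end() || (write && !it->second.writable))
            return false;                       // IOMMU fault: DMA blocked
        phys_page = it->second.phys_page;
        return true;
    }
};

int main() {
    Iommu_domain firewire;                      // protection domain for one DMA master
    firewire.map(0x10, 0x8000, false);          // one read-only page, nothing else
    std::uint64_t phys;
    std::printf("read  page 0x10: %s\n", firewire.translate(0x10, false, phys) ? "ok" : "fault");
    std::printf("write page 0x10: %s\n", firewire.translate(0x10, true,  phys) ? "ok" : "fault");
    std::printf("read  page 0x20: %s\n", firewire.translate(0x20, false, phys) ? "ok" : "fault");
}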

The other flame-provoking area is the choice of a programming language for implementing
the operating system. Historically, most kernels, including Linux, *BSD and to some extent
Windows NT, have been written in plain C.

To be honest, I'm sick to death of programming in the C language. While C is often called 'cross-platform assembly' and you can always understand what your code will get translated into, it offers a multitude of ways to shoot yourself in the foot, and I have some ideas about what a language for kernel-level programming should look like. I'll divide the ideas into two groups: the first focuses on improving the language in a non-intrusive way, without breaking semantics or code efficiency; the second discusses alternative approaches to system software construction.

First off, let's discuss common problems of programming in C and ideas on improvement. Here's what's been bothering me daily since I started my journey into the world of embedded development.

  • Uninitialized memory. Seriously, this is the cause of 90% of misbehaving software. Just enable the corresponding flag if your compiler supports it. I wish this were always enforced by the C compiler. The performance penalty is negligible in most cases. When one needs uninitialized data, or it is impossible to store the initial data values in the BSS or clear the buffer at runtime, a separate language construct should be used so that all such places can be identified at compile time.
  • Buffer overflows. This is one huge security issue. In most cases it is caused by errors in handling null-terminated strings (whoever thinks they're a good idea is probably a cruel genius hoping to have a backdoor everywhere). One possible workaround would be to use typed wrappers for arrays - a data structure containing the array pointer and element count, with accessors and mutators checking for out-of-range errors. Like in Java, but this can be simulated perfectly well using C++ templates, or less comfortably with C macros (a sketch follows this list).
  • Null pointer dereferences. All pointers should be checked for validity by the compiler. Besides, casting pointers to integers should be banned, and IO buffers should also be implemented within the compiler so that a programmer does not mess with memory manually.
  • Lack of OOP classes, Haskell-style type classes and functional-style pattern matching. While combining all of these features usually makes the language very complicated (like OCaml), the lack of any of them makes programming a constant exercise in reinventing the compiler with the help of fugly (or beautiful) macros like container_of.
  • (C++) It would be nice to be able to define abstract class implementations at compile time without a vtbl. To some extent you can get away with using templates to substitute types in a class definition. C++ templates in general are a powerful system, similar to the Hindley-Milner type inference systems in ML or Haskell - use them wisely :) 
  • The stupid splitting of code into the header and the 'implementation'. This is especially tiring in C++, because you have to jump between the header and the source for any change. I'm not even going to go deep into other issues, like most C++ compilers being unable to handle external templates.
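
As an illustration of the array-wrapper idea from the buffer overflow point above, here is a minimal sketch in C++ templates. The class name and error handling are my own choices; in a kernel you would obviously panic or return an error code instead of throwing.

#include <cstddef>
#include <cstdio>
#include <stdexcept>

// A fixed-size array that knows its length and refuses out-of-range access.
template <typename T, std::size_t N>
class checked_array {
    T data_[N];
public:
    T &at(std::size_t i) {
        if (i >= N)
            throw std::out_of_range("checked_array: index out of range");
        return data_[i];
    }
    std::size_t size() const { return N; }
};

int main() {
    checked_array<int, 4> buf;
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf.at(i) = static_cast<int>(i);
    try {
        buf.at(4) = 42;                 // would be a silent overflow with a raw array
    } catch (const std::out_of_range &e) {
        std::printf("caught: %s\n", e.what());
    }
}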


Now, let's consider alternative approaches, their strong and weak points. Here are some ideas that we could borrow from the userland world.

  • Garbage collection. This alone can prevent a lot of memory leaks and save a lot of debugging effort. While it is true that GC activity is quite hard to predict, and there's been a (false) belief that GC and realtime systems are incompatible, it should be noted that manual memory management also suffers from VM thrashing and that malloc() calls can consume an arbitrary amount of time. Therefore, in some cases where performance and reliability are a question of life and death, dynamic memory allocation is not used at all and all memory buffers are statically defined at compile time.
  • Runtime type introspection (reflection). It is very useful to be able to query object state without writing tons of macro boilerplate. While we can put the task of type-checking on the compiler, sometimes we need to interact with remote code and userspace (e.g., sysfs), and then RTTI comes in handy. For example, most userlands use some kind of typed message bus (dbus, p9fs) for IPC (inter-process communication), which we could plug directly into the kernel's public interface.


  • Monitor primitives for atomic/critical sections, to avoid manual lock management. In fact, I want something like the 'synchronized' primitive in Java or 'lock' in C#. Besides, it would be very helpful if the monitor knew the execution context (interrupt, bottom-half handler) and used sleeping or non-sleeping locking accordingly (mutexes vs spinlocks). Like: atomic { x <- readBus(); waitBus(x & ~1) }
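
C++ has no 'synchronized' keyword, but RAII gets reasonably close; here is a minimal sketch using C++11 primitives. Picking between spinlocks and sleeping locks based on execution context, as described above, would need kernel support and is not shown.

#include <mutex>
#include <cstdio>

std::mutex bus_lock;
int bus_register = 0;

// The lock_guard acquires the mutex on construction and releases it when the
// scope ends, even on an early return or exception - no manual unlock to forget.
void write_bus(int value) {
    std::lock_guard<std::mutex> guard(bus_lock);
    bus_register = value;
    std::printf("bus <- %d\n", value);
}

int main() {
    write_bus(0x2a);
}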


There exist multiple projects aiming to exploit the advantages of very high-level programming languages to improve the quality of system software. To name just a few of the most interesting ones for me:

  • .NET Micro Framework by Microsoft [DotNetMicro], which runs on bare hardware. It requires just around 64K of RAM. While this is a huge waste of computational power for a tiny controller like the ATmega48, it is still smaller than most fully-featured kernels like Linux, Windows NT or hypervisors like Xen or L4. The nice thing about this project is that it is open-source and supports a lot of common hardware (pardon my ignorance, but the last time I checked, 5 years ago, it supported the Intel/Marvell PXA SoC, which was still quite popular back then)
  • Singularity OS by Microsoft [Singularity]. In fact, it is similar to .NET Micro in that it uses a dialect of the C# programming language, but it also relies on Ahead-Of-Time (AOT) compilation to native code.
  • House is an operating system implemented in the Haskell programming language [HouseOs]. This is quite interesting because Haskell is a strongly-typed language with type inference, which makes it ideal for writing large-scale complex systems. Unfortunately, it seems the development rate has slowed down lately, and hardware support is limited to a subset of x86-compatible PCs.
  • [SqueakNOS] is an operating system built around the Squeak Smalltalk environment running on bare metal, with everything down to interrupt handling done in Smalltalk. Smalltalk is a very nice object-oriented language (in spite of the bad fame OOP has gained recently due to the perverted nature of C++ and Java) built around message passing.

Further reading:
[KernelExploits] Writing kernel exploits http://ugcs.net/~keegan/talks/kernel-exploit/talk.pdf

[FOC] TU Dresden Fiasco.OC L4 kernel website http://os.inf.tu-dresden.de/fiasco/overview.html
[L4Linux] TU Dresden L4Linux website http://os.inf.tu-dresden.de/L4/LinuxOnL4/
[Genode] Genode Framework official website http://genode.org/

[WinLockPwn] Adventures with Daisy in Thunderbolt-DMA-land: Hacking Macs through the Thunderbolt interface http://www.breaknenter.org/tag/winlockpwn/
[Console] Modern Game Console Exploitation http://www.cs.arizona.edu/~collberg/Teaching/466-566/2012/Resources/presentations/2012/topic1-final/report.pdf
[StateOfTheHack] All your consoles are belong to us http://delroth.net/state-of-the-hack.pdf

[DotNetMicro] Microsoft .NET Micro Framework http://www.netmf.com/
[Singularity] Microsoft Singularity OS http://research.microsoft.com/en-us/projects/singularity/
[HouseOs] House Operating System website http://programatica.cs.pdx.edu/House/
[SqueakNOS] SqueakNOS - Smalltalk on bare hardware http://squeaknos.blogspot.com/

Sunday, September 30, 2012

Doing it wrong: application preferences

One thing that's been always bothering me is the lack of a unified way to keep persistent settings.

When you're installing a shiny new piece of software, you never know which surprises it brings. The location and syntax of the configuration files vary between applications and environments. Let's take a look at the most commonly used persistent storage formats and analyze their strong and weak points. Note that I only want to discuss the issue in the context of desktop usage, where usability and maintainability should not be sacrificed for performance as we do in the embedded world. Also, note that I'm mainly focusing on the formats used in desktop *NIX distributions, because that's what I'm mostly using.

One key difference between the various formats is whether they allow text data with arbitrary formatting and spacing or have a fixed binary layout. The former is obviously human-readable (and sometimes editable), while the latter is in most cases easier for a machine to parse.

Plain text
Pro:

  • Human-readable

Contra:

  • No unified syntax (everyone invents their own)
  • No way to control the correctness of the file contents (as opposed to XML Schema, for which there exists a multitude of tools and libraries)

XML
+

  • Can be edited by a human (to some extent)
  • Strict structure definition is a part of a standard (user can define custom types and limits for variable values)
  • Supported by many tools

-
  • Hard to read by a human
  • Inefficient to parse (though better than plain text without a schema)

GConf is an XML-based configuration system for GNOME and GTK applications. Basically, it inherits all the pluses and minuses of XML.

JSON
+

  • Easy to read and edit by a human
  • Easy API in most libraries (we can think of it either as a tree of objects or as a hashmap of hashmaps of hashmaps...)

-

  • Lack of built-in schema checking (i.e., it has a formal syntax and we can check that a JSON document is valid, but we cannot verify that the data structures are initialized with correct values)

Now, let's discuss binary formats.

Custom formats

Windows Registry
+

  • It's fast.

-

  • Most system-critical data is kept inside a single file (well, actually around 6 files in %windir%\System32\config). A single failure means the system is unable to boot.
  • All applications store their data inside a single database. If an application is running with 'Administrator' privileges (something like root, but for retards), it can easily modify or read the data of other applications. This is a huge security issue.
  • Application registry activity is not tracked (at least, by default). It means mischievous software leaves leftovers behind even after it is uninstalled, which is especially true of shareware and other crapware.
  • Application and system registry data are not isolated. You cannot just reinstall the system while keeping application preferences (like keeping the home dir but replacing the rootfs on *NIX)
  • There are only a few basic data types (DWORD, text, binary), and the user cannot introduce their own


DConf is a replacement for GConf from the GNOME project. The only real difference is that it uses a binary format instead of XML. While it is arguably faster, it is completely incomprehensible to a human being, and when things go bad, I have no idea what to do.

BSON is a binary version of JSON.
+

  • It is faster
  • It consumes less space

-

  • It introduces custom types (like UUID) which means that any JSON can be converted to BSON, but not the other way round.


MessagePack is a binary serialization format that covers much the same ground as BSON while being optimized for network usage (a short usage sketch follows below).
+

  • It offers a compact encoding, reducing network traffic
  • Comes with IDL/serialization support for most mainstream languages (C++, Ruby, Python)
  • Offers type checking when used with statically typed languages
  • Comes with an RPC implementation which can be useful for network applications


-
  • Performance can theoretically be a bit lower than BSON due to bit-packing 
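
For a feel of what this looks like from C++, here is a small sketch. It assumes a recent version of the msgpack-c library; the Prefs structure and its fields are made up for illustration.

#include <msgpack.hpp>
#include <iostream>
#include <string>

// A hypothetical preferences structure; MSGPACK_DEFINE generates the
// serialization code for the listed members.
struct Prefs {
    std::string theme;
    int font_size;
    MSGPACK_DEFINE(theme, font_size);
};

int main() {
    Prefs prefs;
    prefs.theme = "dark";
    prefs.font_size = 12;

    msgpack::sbuffer buf;                 // serialize to a compact binary buffer
    msgpack::pack(buf, prefs);

    msgpack::object_handle oh = msgpack::unpack(buf.data(), buf.size());
    Prefs restored;
    oh.get().convert(restored);           // deserialize back into a typed struct

    std::cout << restored.theme << " " << restored.font_size << "\n";
}
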
There of course exists a multitude of other config formats, but the most used on a linux desktop are plain text and GConf. Therefore, I'm too lazy to describe every other thing under the sun.


Now, here is what I think application developers should do

  • Use the glib/qt4/java built-in APIs for keeping preferences. Don't reinvent the wheel (a QSettings sketch is shown below)
  • Don't break the config format over time (ideally, you don't know anything about the config format at all if you use an existing API)
  • Make it easy for the users to back up the configuration data
Ideally, I would like all apps (at least, all the free and open-source apps on linux) to use the same API for keeping preferences, so that the system administrator can choose between various backends, be it XML, JSON, SQL or what not, without modifying the applications.
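
As an example of just using what the toolkit already gives you, this is roughly what it looks like with Qt's QSettings; the organization, application and key names here are made up.

#include <QSettings>
#include <QCoreApplication>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    // QSettings picks the storage backend and location for you (INI file,
    // registry on Windows, plist on Mac) - the application never needs to
    // know the on-disk format.
    QSettings settings("ExampleOrg", "ExampleApp");
    settings.setValue("editor/font_size", 12);

    int size = settings.value("editor/font_size", 10).toInt();  // 10 is the default
    qDebug() << "font size:" << size;

    return 0;
}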

Tuesday, September 4, 2012

Why I adore Java NOT.

Ok, here are some of my thoughts about Java as both a language and a development platform.

TL;DR: when all you have is a hammer, everything looks like a nail. Don't try shoving java up every task. Even better, don't use it at all.

Let me warn you: I am fully aware that Java was created by enterprise consultants to allow building enterprise bullshit quickly and to make sure you don't have to make efforts to support your crap once written. I fully recognize the importance of Java language as such, but any claims to use Java as a general-purpose language make me cry.

The language and the JVM:
Ok, first of all, Java language is very dull and unpleasant to program in. And it sucks hard as an OO language.

1. It has both objectified and non-objectified POD. That is, it has both int and Integer. And this leads to several interesting consequences. You can't just call a toString() method on an integer literal, like "1.toString()". Or, well, here's a better example. Try it out yourself in Java or BeanShell:


$ cat Foo.java 
class Foo {
    static Integer a = 9999;
    static Integer b = 9999;

    static Integer c = 9;
    static Integer d = 9;

    public static void main(String[] args) {
        System.out.println(a == b);
        System.out.println(c == d);
    }
}

$ java Foo
false
true


Do you find it logical? Well, if you've been programming in java for a while, you may be braindamaged enough to think it's OK, but it is not. In fact, the behaviour of a complex system must be predictable without reading through all the dark corners of the documentation. (For those who are still living in the bliss of ignorance: the behaviour observed is because the JVM keeps cached copies of Integer objects in the range [-128, 127].)

Ok, you may argue that it's a nice idea performance-wise, but it is not. Proper VMs and languages (CLR/C#, OCaml) have found better ways around this problem. For example, the CLR allows you to work directly with raw untyped memory, and all numeric types actually consume just the number of bits needed to keep their precision, while all 'object' methods are implemented as syntactic sugar - i.e., compiler voodoo.

2. Generic types just suck. The reason is that, in order to keep compatibility with the ancient crap, generics were introduced as syntactic sugar. Once you compile your 'generic' code, the type information is erased. Therefore, you lose type safety when using reflection - you can't determine the real type of a generic type variable if the concrete type is a class inheriting from multiple interfaces. Moreover, type erasure is precisely the reason why you can't create an array of a generic type.

3. The language has no lambda closures. Really, creating tons of one-method interfaces like Runnable is just a pain. Even more painful is declaring an interface just to pass a single closure.

4. No on-stack memory allocation. This is a problem for applications using small amounts of memory, because every object, even one used only as a local temporary, has to be heap-allocated. This raises several performance problems (of course, the JIT may end up placing it on the stack or in registers, but there's no explicit way to state that).

5. There's no proper way to interact with native code and untyped data. For example, the already mentioned .Net has the P/Invoke interface, which allows you to conveniently call into native libraries without modifying them, without writing a C stub calling into the VM, and without additional marshalling overhead. And you even have the same flexibility when specifying structure layout as you would in C.

[StructLayout(LayoutKind.Explicit, Size=8)]
public struct Foo
{
    [FieldOffset(0)]
    public byte A;
    [FieldOffset(1)]
    public int B;
    [FieldOffset(5)]
    public short C;
    [FieldOffset(7)]
    public byte D;
}
[DllImport("Foo.dll", EntryPoint="TransmitToHell")]
public static extern int SendToLimbo(Foo foo);

6. No pointer arithmetic. C# does have that. While one may argue that it is unsafe, in a managed language running in an environment with virtual memory, it is possible to check boundary crossing and prevent typical C buffer overflow and stack smashing vulnerabilities while still delivering flexibility.

7. No type inference. Seriously, what the fuck is that?
CoolFoo coolFoo = new CoolFoo();
why couldn't I just write
let coolFoo = new CoolFoo();?
The compiler can clearly deduce the variable type here. And proper languages (ML, Haskell) can actually type-check the whole program without explicitly specifying types in most cases, thanks to Hindley-Milner type inference.

8. Badly designed standard library violating OO principles. Javatards like to claim their language is fully OO, but the standard library violates its own principles multiple times. The most obvious cases:

  • Empty interfaces (like Cloneable). Like, they ain't doing anything. Well, they shouldn't be doing anything, but then reflection voodoo comes in.
  • Immutable mutable data. You all know that the final modifier makes a memory location immutable. But not all people know that using reflection you can obtain write access to it. Even the standard streams (System.out, System.in) can be reassigned using the System.setOut and System.setIn calls. R U MAD?

The aforementioned factors make programming in java a miserable exercise in futility when it comes to integrating with native code or writing algorithms that have to crunch large arrays of numbers (DSP). One interesting example is the Android platform, where most of the high-level Java code is just a wrapper around native libraries written mainly in C and DSP assembly extensions (NEON, SSE, AVX). The lack of built-in implementations of many DSP routines (convolution, FFT) and the buggy implementation of the RenderScript library make writing high-performance video processing applications an extremely daunting task.

Enterprise crap:
  • Counter-intuitive, poorly documented frameworks. Indeed, do you know anyone who could figure out Spring or Hibernate without digging through tutorial blog posts?
  • XML, XML everywhere. Completely human-unfriendly and resource-consuming. There are better formats out there - JSON which is kind of human-readable and BSON/MsgPack when speed matters.
  • AbstractFactoryFactoryImpl and so on. Design patterns are a real cargo cult of java pseudoengineering

So, to sum up. If you want to set up yet another enterprise-level megaproject and drain all the money of a local enterprise, go ahead and use Java. But please don't make me have any business with java ever again.

Monday, September 3, 2012

x86 vs arm. lies, blatant lies and standards

I wanted to publish a rant about x86 vs ARM architecture long ago but had no time to really do so. Therefore, I'm just publishing a slightly edited draft written back in spring.

In short: x86 sucks.

The last two decades of personal computing were dominated by the x86 architecture, while embedded hardware, like cell phones, media players and microwave ovens traditionally used simpler and less power-hungry designs, namely, ARM, MIPS and SH3 architectures.

Ever since the first 'smart' devices (PDAs/phones where you could install your own custom applications) appeared, it has been a dream of many people to close the gap in computing power and capabilities between these devices and desktop computers. There have been a lot of community projects to run full-blown linux/BSD distros on handheld devices, but it was not until recently that major players like M$ and Apple turned their attention to the non-x86 architectures. 2012 is going to see the end of the world - er, the appearance of a lot of ARM-based laptops - since Windows 8 is going to be ported to this architecture.

Migration to (or, rather, adoption of) a new architecture has caused a lot of FUD and confusion. Let's just consider some common myths about x86 vs non-x86 debate and analyze the strong and weak points of both camps.

Myth 1: x86 has a huge software base and you cannot replace it.
Reality: most commercial vendors (including M$, Adobe etc.) are porting or have already ported their software to the new architecture. Open-source software, thanks to the efforts of major distributions like Debian, is already ported to virtually anything that can compute.

Of course, there is a ton of legacy software for which even the source code may be lost. But that's just because someone was stupid enough to make their software closed-source from the start. Even though this appeals to managers, from an engineering point of view closed source code is the worst sin, because it reduces interoperability with other pieces of software, causes a lot of problems when the API/ABI breaks (at major updates) and, of course, you cannot trust binaries when you don't see the code.

One approach to the problem of legacy software is what the industry has been doing for ages - virtualization. Hardware emulation, to be more precise. While it reduces performance by multiple orders of magnitude, it may be a savior in some extreme cases. For example, when you need your ancient business app once in two months, but want to enjoy your cool and light ARM laptop the rest of the time.

Myth 2: x86 is standardized. Even without vendor support, a generic set of drivers works anywhere. ARM, where is your ACPI?
Reality: even though standards exist, they are rarely implemented properly. Even on x86, things like ACPI, hybrid graphics and function keys are often broken and non-standard, which means chances are your laptop will not work until you install the customized set of drivers from the OEM - or it will work but drain the battery like crazy.

Myth 3: ARM will never be as fast as x86. While it is true that modern ARM SoCs typically lag almost a decade behind x86 in terms of processing power, it is possible to build a high-performance RISC SoC. The problem is that increasing performance by means of reducing the die size raises power consumption exponentially. Emulating x86 CISC on a RISC core increases the complexity of the CPU and therefore reduces performance per watt. The reality is that ARM CPUs are mainly used in portable hardware - laptops, smartphones, tablets - where power saving is preferred over performance.

The real problem with x86 is that it is a collective term used to refer to a whole bunch of hardware, dating back from calculators of the prehistoric era to the newest 'ivy bridge' systems. One notable problem of this architecture is the presence of three incompatible execution modes: 16-bit real mode, 32-bit protected mode and 64-bit long mode. Legacy software tends to be a mix of these and involves context switches, which increase latency and potentially reduce the level of security.

While x86 is ridden with a multitude of remnants of the past, it still has advantages over the ARM/MIPS chips typically used in embedded hardware:

  • Hardware virtualization support (hypervisor). ARM, for example, is yet to release the Cortex-A15 core and the ARMv8 instruction set, which add virtualization extensions. While it is possible to use a TrustZone hypervisor to virtualize hardware, it should be noted that there are absolutely no standards describing what APIs a TZ OS must provide, and TrustZone is designed to keep the user from tampering with their own hardware, not to protect the user's data from a misbehaving execution environment.
  • Standardized BIOS/EFI support for common hardware: VESA framebuffer, SATA/EHCI/XHCI controllers, the i8042 keyboard controller etc. While these interfaces typically do not let you use the advanced computational and power-saving capabilities of the hardware, they do let you get a custom kernel running on a new board that lacks documentation from the hardware vendor.


So, the question is: Is migration to a new architecture needed?
The answer is: yes and no.

What is needed is adapting software to the new computing paradigm. The 80s and 90s were dominated by the impression that CPU frequency could be increased indefinitely and that RAM was expensive. Recently, however, RAM has become very affordable and CPUs are hitting a frequency barrier due to process technology and power-saving constraints (well, there are some alternatives like using optical interconnects or returning to germanium semiconductors, but we'll eventually hit the bottleneck once again in a couple of years). Modern computing involves asymmetric multiprocessor systems with heterogeneous architectures and the use of vector processors (SSE, NEON, CUDA) to exploit data-level parallelism.
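
As a tiny illustration of data-level parallelism, here is a sketch using SSE intrinsics to add two float arrays four elements at a time. The function and array sizes are my own example and it assumes the length is a multiple of four; NEON and CUDA follow the same idea with different APIs.

#include <xmmintrin.h>
#include <cstdio>

// Adds two float arrays element-wise, processing four floats per instruction.
void add_arrays(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);              // load 4 floats (unaligned)
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));   // 4 additions at once
    }
}

int main()
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];
    add_arrays(a, b, c, 8);
    for (int i = 0; i < 8; ++i)
        std::printf("%.0f ", c[i]);
    std::printf("\n");
}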

Monday, June 4, 2012

linux gone wrong


I've been hacking on linux kernel for embedded hardware (mainly PDAs)
for quite a while already and I'm sick to death of the bullshit that
OEMs keep doing. That's why I decided to put up this rant on some
common failures found in OEM code and some advice to OEM programmers/managers
on how to reduce development costs and improve the quality of their code.

Short version: don't suffer from NIH syndrome and be friendly with the community.
The long version is below. It will probably get edited to become shorter, and I'll add more points to it.

Ok, there are two levels of problems.
One level is the people at
SoC manufacturers writing support code for the CPU and its built-in peripherals.
The second level, much wilder and less educated, is the coders (well, actually,
most of them are electrical engineers) who write the support code for the
finished products - the handsets you can go grab at a local store.

The first level is generally friendly and adequate, but that's not a universal rule.
Most problems here are caused by managers, lawyers and
other fucktards who're trying to conceal their proprietary shit or
'intellectual property', as you would say in a better company.

Let's take Qualcomm as a good (bad) example. They did their best to hide
as many details about their hardware as possible. As a result,
audio, camera, GPS and almost all other drivers are implemented as closed-source
userspace bits, and the kernel-side drivers are just stubs that export the hardware to userland.
Naturally, this would be rejected by any sane developer. Luckily, the mainline
linux kernel is one of the few places where sanity and technical merit
are still valued.
What's the result of this? Well, the MSM/QSD kernel tree severely lags behind
both the vanilla kernel and Google's trees, and up until recently (when the architecture
was redesigned to be more open and standard APIs like ALSA replaced some
proprietary bits) any major Android release (like Gingerbread) meant that
HTC, Sony Ericsson and the other unhappy souls haunted by Qualcomm had
to waste almost a year of development effort just to forward-port their crap.

On the other hand, there are good examples like Samsung and Texas Instruments,
whose SoCs have all the code mainlined, including V4L2 drivers for the camera and
ALSA for audio. And there are those who started out like Qualcomm and rolled out
a mess of crappy BSP code, but then realized that it's better to play by the
rules than to invest money into doing everything the other way round (NVIDIA).

------
Ok, let's move on to the second level - finished products and OEM code.
Judging by the code we can see released publicly,
there are no coding conventions and no source control. There are tons of #ifdeffery
and duplicate code. That's not the problem per se, it's just the result
of a lack of development culture.

Some notable examples that make me mad.
Qualcomm/HTC: totally ignoring existing interfaces for power supply,
regulators. Reimplementing the existing code (mostly copy-pasting). This leads
to maintainability problems.

Asus [TF101]: instead of using platform data, they decided to hack up
the existing drivers and add code directly to gpio/mmc/sound drivers.

Samsung [Galaxy S2]: a lot of duplicate drivers for the same piece of
hardware, broken makefiles and kconfigs, hacks in drivers.

------
Ranting must be productive. Therefore, below is a list of
advice on how to write and maintain code for certain subsystems, with
a brief rationale so that it doesn't sound like the moaning of a hardware geek.

General
-- Release early, release often, push your code upstream
Rationale: this way your code will get reviewed and potential
problems will get detected at earlier stages. You will not end up
with several mutually incompatible code versions.

-- Do not use the same machine type for all boards. Register one with
the arm-linux website.
Rationale: the machine type was introduced to allow one kernel
binary to support multiple boards. When a vendor hardcodes the same
machine type for a range of devices, it becomes impossible to build
a single kernel for these devices and maintaining the kernel becomes difficult.
Therefore, such code is not accepted upstream, and supporting new releases
costs more development effort.

-- Avoid compile-time #ifdef, especially in board code
Rationale: they prevent multiple boards from being built into a single
kernel image. Also, the code in the rarely used #ifdef branches
tends to get broken. Therefore, use machine type or system revision
to select the code path at runtime.

-- Do not use non-static 'extern' calls. If strictly needed,
use EXPORT_SYMBOL declarations.
Rationale: this adds an extra level of NULL pointer checks and
improves stability and security.

-- Do not reinvent what is already implemented
Rationale: your code will not get accepted upstream. Your code will
not receive bug and security fixes when someone fixes upstream faults.
You will waste time porting it between kernel versions. I will write
a rant about you. If you need some functionality missing upstream,
please submit a patch for it to the mailing list and fix upstream instead
of copy-pasting.

-- Do not pollute the source code with dead #ifdef code or comments
from ancient proprietary VCS systems. Do follow the coding conventions.
Rationale: this makes your code easier to understand and maintain.

-- Use public version control repositories.
Rationale: this allows third-party coders to find patches for specific
bugfixes and submit their improvements, and it generally reduces the complexity
of keeping up with upstream and improves code quality.


-- Do not try to hide the code.
Rationale: what you will end up with is that your driver will not
get upstream, enthusiasts will blame you for all deadly sins,
you will have to fix your crap every release.

some notes on drivers

ISP driver (image capture pipeline, video camera)
-- Use V4L2 API
Rationale: you can push the driver upstream, reuse camera sensor
drivers already upstream, reuse user-space libraries and video
processing applications.

Power supply (battery, chargers)
-- Use pda_power driver.
Rationale: it provides most of the functionality custom charger drivers
reimplement, including monitoring usb/ac power supply and notifying battery
drivers.

Sound drivers
-- Use ALSA. Do not use custom IOCTLs.
Rationale: you will be able to reuse existing userland sound libraries and
kernel-level codec drivers.

So, what I want to see is a finished device, like a phone or a tablet, that
can work with the kernel from kernel.org without hackery and additional patches.