Sunday, September 7, 2014

GSoC results or how I nearly wasted summer

This year I got a chance to participate in the Google Summer of Code with the FreeBSD project.
I have long wanted to learn about the FreeBSD kernel internals and port it to some ARM board but was getting sidetracked whenever I tried to do it. So it looked like a perfect chance to stop procrastinating.

Feel free to scroll down to the "other stuff" paragraph if you don't feel like reading three paragraphs of the typical whining associated with debugging C code.

Initially I wanted to bring up some real piece of hardware. My choice was the Nexus 5 phone, which has a Qualcomm SoC. However, some of the FreeBSD developers suggested that I port the kernel to the Android Emulator instead. It sounded like a sane idea since the Android Emulator is available for the major OSes and is easy to set up, which would expose more people to the project. Besides, unlike stock QEMU, it can emulate some peripherals specific to a mobile device, such as sensors.

What is Android Emulator? In fact, it is an emulator based on QEMU which emulates a virtual board called "Goldfish" which includes a set of devices such as interrupt controller, timer, input devices. It is worth noting that Android Emulator is based on an ancient version of QEMU though starting with Android L one can obtain the binaries of the emulator from git which is based on a more recent build.

I have started from implementing the basics: the interrupt controller, the timer and the UART driver. That has allowed me to see the boot console but then I got stuck for literally months with kernel crashing at various points in what seemed like a totally random fashion. I have spent nights single-stepping the kernel with my sight slowly turning from barely red eyes to emitting hard radiation. It was definitely a problem with caching but ultimately I gave up trying to fix it. Since I knew that FreeBSD kernel worked fine on ARM in a recent version of QEMU, it was clearly a bug in Android Emulator, even though it has not manifested itself in Linux. Since Android Emulator is going to eventually get updated to a new QEMU base and I was running out of time, I decided to just use a nightly build of emulator instead of the one coming with the Android SDK and that fixed this particular issue. 
But hey! At least I've learnt about the FreeBSD virtual memory management and the UMA allocator the hard way.

Next up was fighting the timer and random freezes while initializing the random number generator (sic!). That was easy - turns out I just forgot to call the function. Anyway it would make sense to move that call out of the board files and call it for every board that does not define the "NO_EVENTTIMERS" option in the kernel config.

Fast forward to the start of August when there are only a couple days left and I still don't have a working block device driver to boot rootfs! Writing a MMC driver turned out to be very easy and I started celebrating when I got the SD card image from Raspberry PI booting in Android Emulator and showing the login prompt.

It worked but with a huge kludge in the OpenFirmware bindings. Somehow one of the functions which should've returned the driver bus name returned (seemingly) random crap so instead of a NULL pointer check I had to add a check that the address points to the kernel VM range. I have then tracked down the real source of the problem.
I was basing my MMC driver on the LPC SoC driver. The LPC driver itself is suspicious - for example, the place where DMA memory is allocated says "allocating Framebuffer memory", which may be an indicator that it was written in a hurry and possibly only barely tested. In the MMC bus attach routine, it was calling the "device_set_ivars" function and setting the ivars pointer to the mmc host structure. I have observed a similar pattern in some other drivers. In the OpenFirmware code, the ivars pointer was dereferenced as a completely different structure and a string was taken from a member of that structure.

How the hell did it happen in a world where people are using the recent Clang compiler and compile with "-Wall" and "-Werror"? Uhh, well, if you're dealing with some kind of OOP in plain C, casting to void pointer and back and other ill-typed wicked tricks are inevitable. Just look at that prototype and cry with me:

>> device_set_ivars(device_t dev, void *ivar);
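To make the failure mode concrete, here is a minimal sketch of one way such a mix-up could be caught at run time: every ivars structure starts with a small tag, and the reader checks the tag before casting. All the names here (the "toy_" types and functions, the tag values) are hypothetical and only illustrate the idea; this is not the FreeBSD newbus API and not the fix that went into my branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags, one per ivars type (not part of any real API). */
enum { TOY_IVAR_OFW = 0x4f465731, TOY_IVAR_MMC = 0x4d4d4331 };

/* Every ivars structure in this sketch starts with this header. */
struct toy_ivar_hdr {
    uint32_t magic;
};

/* What the bus code expects to find behind the ivars pointer. */
struct toy_ofw_devinfo {
    struct toy_ivar_hdr hdr;
    const char *name;
};

/* What the MMC driver actually stored there. */
struct toy_mmc_host {
    struct toy_ivar_hdr hdr;
    int bus_width;
};

/* Stand-in for a device: the ivars slot is an untyped pointer. */
struct toy_device {
    void *ivars;
};

/* Same shape as the real prototype: the compiler accepts any pointer. */
void toy_set_ivars(struct toy_device *dev, void *ivars)
{
    assert(dev != NULL);
    dev->ivars = ivars;
}

/*
 * Returns the name only if ivars really is a toy_ofw_devinfo.
 * Reading the header is well defined because it is the first member
 * of both structures.
 */
const char *toy_ofw_get_name(const struct toy_device *dev)
{
    const struct toy_ivar_hdr *hdr = dev->ivars;

    if (hdr == NULL || hdr->magic != TOY_IVAR_OFW)
        return NULL;
    return ((const struct toy_ofw_devinfo *)dev->ivars)->name;
}
```

With this layout, storing a toy_mmc_host and then asking for the name yields NULL instead of a pointer into random memory, so an ordinary NULL check is enough again. The price is one extra word per structure and the discipline to initialize it everywhere.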

For now, I'm pushing the changes to the following branch. Please note that I'm rebasing against master and force-pushing in process if you're willing to test.
So, what has been achieved and what am I still going to do? Well, debugging the MMU issues and the OpenFirmware bug delayed me substantially, so I have not done everything I planned. Still:
  • Basic hardware works in Goldfish: Timer, IRQs, Ethernet, MMC, UART
  • It is using NEWCONS for Framebuffer Console (which is cool!)
  • I've verified that Android Emulator works in FreeBSD with linuxulator
  • I have fixed Android Emulator to compile natively on FreeBSD and it mostly works (at least, console output) except graphics (which I think should be an easy fix. there are a lot of places where "ifdef __linux__" is completely unjustified and should rather be enabled for all unices)
How to test and contribute?
You can use the linux build of Android Emulator using Linuxulator. Unfortunately, linuxulator only supports 32-bit binaries so we cannot run nightly binaries which have the MMU issues fixed.
Therefore, I have fixed the emulator to compile natively on FreeBSD! I've pushed the branch named "l-preview-freebsd" to github. You will also need the gtest. I cloned the repo from CyanogenMod and pushed a tag named "l-preview-freebsd", which is actually "cm-11.0".

Compiling the emulator:
>> git clone
>> cd android_external_gtest
>> git checkout cm-11.0
>> cd ../qemu.git
>> git checkout l-preview-freebsd
>> bash

Please refer to README file in git for the detailed instructions on building kernel.

The only thing that really bothers me about low-level and embedded stuff is that it is extremely difficult to estimate how long it may take to implement a certain feature because every time you end up debugging obscure stuff and the "useful stuff you learn or do" / "time wasted" ratio is very small. On a bright side, while *BSD lag a bit behind Linux in terms of device drivers and performance optimizations, reading and debugging *BSD code is much easier than GNU projects like EGLIBC, GCC and GLib.

Other stuff.
Besides GSoC at the start of summer I had a chance to work with Chris Wade (who is an amazing hacker, by the way). We started working on an ARM virtualization project and have spent nice days debugging caching issues and instruction decoding while trying to emulate a particular cortex A8 SoC on an A15 chip. Unfortunately working on GSoC, going to a daily job and doing this project simultaneously turned out to be surprisingly difficult and I had to quit at least one of the activities. Still I wish Chris good luck with this project and if you're interested in virtualization and iPhones, sign up for early demo at

Meanwhile I'm planning to learn ARMv8 ISA. It's a pity there is no hackable hardware available for reasonable prices yet. QEMU works fine with VirtIO peripherals though. But I'm getting increasingly worried about devkits costing more and more essentially making it more difficult for a novice to become an embedded software engineer.

Friday, June 6, 2014

On Free Software and if proprietary software should be considered harmful

I communicate with a lot of people on the internet and they have various opinions on FOSS ranging from "only proprietary software written by a renowned company can deliver quality" to "if you're using binary blobs, you're a dick". Since these issues arise very often in discussions, I think I need to write it up so I can just shove the link next time.

On the one hand, I am a strong proponent of free and open-source software and I have contributed to several projects related to porting Linux to embedded hardware, especially mobile phones (, Replicant and I also consulted some of the people). Here are the reasons I like free software (as well as free hardware and free knowledge):

  • The most important for me is that you can learn a lot. I have mostly learnt C and subsequently other languages by hacking on the linux kernel and following interesting projects done by fellow developers
  • Practically speaking, it is just easier to maintain and develop software when you have the source code
  • When you want to share a piece of data or an app with someone, if you deal with closed software, you force them into buying a paid app which may compromise their security
  • You can freely distribute your contributions, your cool hacks and research results without being afraid of pursuit by some patent troll
But you know, some people are quite radical in their FOSS love. They insist that using anything non-free is a sin. While I highly respect them for their attitude, I have a different point of view and I want to comment on some common arguments against using or developing non-free software:
  • "Oh, it may threaten your privacy, how can you run untrusted code"? My opinion here is that running untrusted firmwares or drivers for devices is not a big deal because unless you manufacture the SoC and all the peripherals yourself, you can not be sure of what code is running in your system. For example, most X86 CPUs have SMM mode with a proprietary hypervisor, most ARMs have TrustZone mode. If your peripheral does not require you to load the firmware, it just means that the firmware is stored in some non-volatile memory chip in hardware, and you don't have the chance to disable the device by providing it with a null or fake binary. On the other hand, if your device uses some bus like I2C or USB which does not have DMA capabilities or uses IOMMU to restrict DMA access, you are relatively safe even if it runs malicious code.
  • "You can examine open source software and find backdoors in it". Unfortunately this is a huge fallacy. First of all, some minor errors which can lead to huge vulnerabilities can go unnoticed for years. Recent discoveries in OpenSSL and GnuTLS came as a surprise to many of us. And then again, have you ever tried to develop a piece of legacy code with dozens of nested ifdefs when you have no clue which of them get enabled in the final build? In this case, analyzing disassembled binaries may even be easier.
  • "By developing or using non-free software you support it". In the long run it would be better for humanity to have all knowledge freely accessible. However, if we consider a single person, their priorities may differ. For example, until basic needs (food, housing, safety) are satisfied, one may resort to developing closed-source software for money. I don't think it's bad. For me, the motivation is knowledge. In the end, even if you develop stuff under an NDA, you understand how it works and can find a way to implement a free analog. This is actually the same reason I think using proprietary software is not bad in itself. For example, how could you expect someone to write a good piece of free software - an OpenGL driver, an OS kernel, a CAD system - until they get deeply familiar with existing solutions and their limitations?
Regarding privacy, I am more concerned about the government and "security" agencies like NSA which have enough power to change laws and fake documents. Officials change, policy changes and even people who considered themselves safe and loyal patriots may be suddenly labeled traitors.

In the end, it boils down to individual people and communities. Even proprietary platforms like Palm, Windows Mobile or Apple iOS had huge communities of helpful people who were ready to develop software, reverse-engineer hardware and of course help novices. And there are quite some douchebags around free software. Ultimately, just find the people you feel comfortable around, it is all about trust.

Minor notes about recent stuff

I've been quite busy with work and university recently so I did not have much time to work on my projects or write rants, so I decided to roll out a short post discussing several unrelated ideas.

On deprecating Linux (not really)
Recently several Russian IT bloggers have been writing down their ideas about what's broken with Linux and how it should be improved. Basically, one guy started by saying we need to throw away POSIX and write everything from scratch and the other two are trying to find a compromise. You may fire up Google Translate and follow the discussion at

I personally think that these discussions are a bit biased because all these guys are writing from the point of view of an engineer working on distributed web systems. At the same time, there are domains like embedded software, computer graphics, high-performance computations which have other challenges. What's common is that people are realizing the obvious idea: generic solutions designed to work for everyone (like POSIX) limit the performance and flexibility of your system, while domain-specific solutions make your job easier but they may not fit well into what other people are doing.

Generally, both in high-performance networking and in graphics people are trying to do the same thing: reduce the number of context switches and move all execution into userspace. It is true that on modern X86 a context switch is relatively cheap, and we could use a segmented memory model like in Windows CE 5.0 instead of the common user/kernel split. But the problem is not the CPU execution mode switch itself. The problem is that system calls like "flush" and "ioctl" and library calls like "glFlush()" are used as points of synchronization where a lot of additional work is done besides the call itself. Examples of the userspace approach include asynchronous network servers using epoll and cooperative scheduling, user-land networking stacks (Netmap, Intel DPDK) and modern graphics APIs (Microsoft DirectX 12, AMD Mantle, Apple Metal). The cost of maintaining abstractions - managing buffers and contexts in drivers - has risen so high that hardware vendors neither want to write complex drivers nor can deliver performance with them. So, we'll have to step back and learn to use hardware wisely once again.

Actually I like the idea of using minimalistic runtimes on top of hypervisors like Erlang on Xen from the point of simplicity. Still, security and access control are open questions. For example, capability-based security as in L4, has always looked interesting, but whether cheap system calls and dramatically reduced "trusted code base" promises have been fulfilled is very arguable. Then again, despite the complexity of linux, its security is improving because of control groups which are heavily utilized by docker and systemd distros. Another issue is that lightweight specific solutions are rarely reusable. Well, from the economic point of view a cheap solution that's done overnight and does its job is just perfect, but generally the amount of NIH and engineers basically writing the same stuff - drivers, applications, libraries and servers in an absolutely identical way but for a dozen identical OSs is amazing and rather uncreative.

Anyway, it looks like we're there again: rise of Worse is Better

Work (vaapi, nix).
At work, among other things, I was asked to figure out how to use the Intel GPU for H.264 video encoding. Turns out there are two alternatives: the open source VAAPI library and the proprietary Intel Media SDK. Actually, the latter still uses a modified version of VAAPI, but I feel that fitting it into our usual deployment routine is going to be hard, because even basic critical components of the driver stack, such as the kernel KMS module and are provided in the binary form.

Actually VAAPI is very low-level. One thing that puzzled me initially is that it does not generate H.264 bitstream. You have to make it yourself and feed into the encoder via a special buffer type. Besides, you have to manually take the reconstructed picture and feed it as a reference for subsequent frames. There are several implementations using this API for encoding: gstreamer, ffmpeg, vlc. I have spent quite some time until I got it to encode a sample chunk of YUV data. Ultimately my code started looking identical to the "avcenc.c" sample from libva except that encoder data is stored in a "context" structure instead of global variables.

My advice is that if you want to learn about video decoding APIs on linux and Android, take a look at Chromium source code (well, you may expect to find a solution for any relevant computer science or engineering problem given how much code it contains). And also take a look at GST, FFMPEG and VLC. Notice how each of them has its own way of managing buffers (and poor error handling btw).
Another thing we're using at work is the Nix package manager. I have always wanted to try it but did not really do it until I came to work at this place. Nix is a fantastic concept (even +Norman Feske got inspired by it for their Genode OS). However, we've noticed a slowdown when compiling software under Nix. Here are some of my observations:
  • Compiling a simple C file with just "main" function takes 30ms in linux but >300ms in Nix chroot. Nix installs each library to a separate prefix and uses LDFLAGS and CFLAGS to direct the compiler to use them. Gcc then iterates over each of these directories trying to find each library which ends up being slow. Anyone knows a workaround?
  • Running gcc under the "perf" profiler shows that it spends most of its time in multibyte string collation functions. I find it ridiculous that exporting "LC_ALL=C" magically makes the compilation time fall down to 100ms.
Hacking on FreeBSD
As a part of my GSoC project I got the FreeBSD kernel booting on the Android Emulator and I just have to write the virtual ethernet and MMC drivers. Unfortunately my university has distracted me a lot from this project, but now that I have time I hope to finish it by midterm. And then I'll probably port FreeBSD to the Qualcomm MSM8974 SoC. Yay, the red Nexus 5 is an awesome phone!

My little project (hacking is magic)
Long time ago I decided to give Windows Mobile emulation a shot and got the kernel to start booting in QEMU. Since Microsoft's Device Emulator was open-source and emulated a real Samsung S3C2410 SoC, it was easy. I still plan to get it fully working one day.

But QEMU is a boring target actually. What's more interesting is developing a bare-metal hypervisor for A15 and A7 CPUs. Given the progress made by Rob Clark on reverse-engineering Qualcomm Adreno GPU driver, I think it could be doable with reasonable effort to emulate the GPU and consequently enough hardware to run Windows Mobile and Windows RT. A very interesting thing is that ARMs trap almost all coprocessor registers for guest access (privilege levels 0 and 1) meaning you can fake any CPU ID or change memory access behavior by modifying caching and buffering settings.

What is really interesting is that there are a lot of Chinese phones which copy Nokia smartphones, iPhones, Android phones. Recent revisions of Mediatek MTK SoCs are Arm A7 meaning they support virtualization. Ultimately it means someone could make a high quality replica of any phone and virtualize any SoC without a trace which has interesting security implications.

Software sucks!
The other day some systemd update came out which totally broke my debian bootup. Now, there's a race condition between the encrypted disk password entry and root password entry for "recovery". Then, latest kernel (3.15-rc3) OOPSes and crashes randomly on ACPI events which subsequently breaks networking and spawning new processes.

Ultimately, after a day of struggle my rootfs broke, and after an fsck everything was gone. So now I'm deciding what distro I should try instead. Maybe ArchLinux? Either way, I have to build a lot of software from source and install it to /usr/local or custom prefixes for development.

The easiest way would be to install into a VM in OS X. But then, I want to fix issues with drivers, especially GPU switching in the macbook.  On the other hand, I spent ages fixing switchable graphics on an older Sony VAIO laptop and resorted to an ugly hack to force Intel GPU. Since GPU switching is not working properly in Linux, maybe I should write a graphics multiplexor driver for FreeBSD and ditch Linux altogether? FreeBSD looks much more attractive these days with clang and no systemd and no and no Lennart Poettering.

Tuesday, March 18, 2014

A story about clang tools

I've always wanted to try writing a toy compiler, but have not made myself actually learn the theory of parsing (I plan to do it and post some notes into the blog soon though). However, recently I've been playing with Clang and LLVM. I've not yet used it for compiling, but I want to share my experience of using and extending Clang's error detection tools.

LLVM is a specification of platform-independent bytecode and a set of tools aimed to make the development of JITed interpreters and portable native binaries easier. A significant portion of work for LLVM was done by Apple and nowadays it is widely used in industry. For example, NVIDIA uses it for compiling CUDA code, AMD uses it to generate shaders in its open-source driver. Clang is a parser and a compiler for a set of C-like languages, which includes C, C++ and Objective-C. This compiler has several properties that may be interesting for developers:
  • It allows you to traverse the parsed AST and transform it. For example, add or remove the curly brackets around if-else conditionals to troll your colleagues.
  • It allows you to define custom annotations via the __attribute__ extension which again can be intercepted after the AST is generated but is not yet compiled.
  • It supports nearly all the features of all revisions of C and C++ and is compatible with the majority of GCC compiler options which allows to use it as a drop-in replacement. By the way, FreeBSD has switched to Clang, and on Apple OS X gcc is actually a symlink to clang!
So, why should you care? Imagine how many times you wanted to add some cool feature but realized macros were not enough. Or you wanted to enforce some code convention? With clang one could easily write a static code analyzer that would catch the notorious Apple "double fail" bug. Which makes me wonder why they did not use their own technology :)

LLVM provides several frameworks for finding bugs at runtime. For example, AddressSanitizer and MemorySanitizer to catch access to uninitialized or unallocated memory.

I was given the following interesting problem at work: build some solution that would allow to detect where the application is leaking memory. Sounds like a common problem, with no satisfying answer.
  • Using Valgrind is prohibitively slow - a program running under it can be easily 50 times slower than without it.
  • Using named SLAB areas (like the linux kernel does) is not an option. First off, in the worst case using SLAB means only half of the memory is available for allocation. Secondly, such an approach tells you which class of objects is occupying the memory, but not where and why they were allocated.
  • Using TCMalloc which hooks malloc/free calls also turned out to be slow enough to cause different behaviour in release and debugging environment, so some lightweight solution had to be designed.
Anyway, while thinking of a good way to do it, I found out that Clang 3.4 has something called LeakSanitizer (also lsan and liblsan) which is already ported to GCC 4.9. In short, it is a lightweight version of tcmalloc used in Google Perftools. It collects the information about memory allocations and prints leak locations when the application exits. It can use the LLVM symbolizer or GCC libbacktrace to print human-readable locations instead of addresses. However, it has some issues:
  • It has an explicit check in the __lsan::DoLeakCheck() function which disallows it to be called twice. Therefore, we cannot use it to print leaks at runtime without shutting down the process
  • Leak detection cannot be turned off when it is not needed. Hooked malloc/memalign functions are always used, and the __lsan_enable/disable function pair only controls whether statistics should be ignored or not.
The first idea was to patch the PLT/GOT tables in ELF to dynamically choose between the functions from libc and lsan. It is a very dirty approach, though it will work. You can find a code example at

However, patching GOT we only divert the functions for a single binary, and we'd have to patch the GOT for each loaded shared library which is, well, boring. So, I decided to patch liblsan instead. I had to patch it either way, to remove the dreaded limitation in DoLeakCheck. I figured it should be safe to do. Though there is a potential deadlock while parsing ELF header (as indicated by a comment in lsan source), you can work around it by disabling leak checking in global variables.

What I did was to set up a number of function pointers to the hooked functions, initialized with lsan wrappers (to avoid false positives for memory allocation during libc constructors) and add two functions, __lsan_enable_interceptors and __lsan_disable_interceptors to switch between libc and lsan implementations. This should allow to use leak detection for both our code and third-party loadable shared libraries. Since lsan does not have extra dependencies on clang/gcc it was enough to stick a new CMakeLists.txt and it can now be built standalone. So now one can load the library with LD_PRELOAD and query the new functions with "dlsym". If they're present - it is possible to selectively enable/disable leak detection, if not - the application is probably using vanilla lsan from clang.
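The switching mechanism can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual liblsan patch: the "toy_" names are made up for illustration, and the "tracking" allocator only counts live blocks instead of recording stack traces the way lsan does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Blocks handed out by the tracking allocator and not yet freed. */
static long toy_live_blocks;

/* Stand-in for the lsan wrapper: allocate and remember the block. */
static void *toy_tracking_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        toy_live_blocks++;
    return p;
}

/* Stand-in for the lsan wrapper around free(). */
static void toy_tracking_free(void *p)
{
    if (p != NULL)
        toy_live_blocks--;
    free(p);
}

/*
 * The function pointers start out pointing at the tracking wrappers,
 * so allocations made before anyone calls the switch functions are
 * still recorded.
 */
static void *(*toy_malloc_impl)(size_t) = toy_tracking_malloc;
static void (*toy_free_impl)(void *) = toy_tracking_free;

/* Route allocations through the tracking wrappers. */
void toy_enable_interceptors(void)
{
    toy_malloc_impl = toy_tracking_malloc;
    toy_free_impl = toy_tracking_free;
}

/* Route allocations straight to libc. */
void toy_disable_interceptors(void)
{
    toy_malloc_impl = malloc;
    toy_free_impl = free;
}

/* The entry points the rest of the program calls. */
void *toy_malloc(size_t size)
{
    return toy_malloc_impl(size);
}

void toy_free(void *p)
{
    toy_free_impl(p);
}

long toy_live_block_count(void)
{
    return toy_live_blocks;
}
```

In this toy both paths end up in the libc allocator, so freeing a tracked block through the untracked path only skews the counter. In the real library the two paths use different allocators, which is exactly why mixing them needs the extra care described in the issues below.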

There are some issues, though
  • LSAN may have some considerable memory overhead. It looks like it doesn't make much sense to disable leak checking since the memory consumed by LSAN won't be reclaimed until the process exits. On the other hand, we can disable leak detection at application startup and only enable it when we need to trace a leak (for example, an app has been running continuously for a long time, and we don't want to stop it to relaunch in a debug configuration).
  • We need to ensure that calling a non-hooked free() on a hooked malloc() and vice-versa does not lead to memory corruption. This needs to be looked into, but it seems that both lsan and libc just print a warning in that case, and corruption does not happen (but a memory leak does, therefore it is impractical to repeatedly turn leak detection on and off)
We plan to release the patched library once we perform some evaluation and understand whether it is a viable approach.

Some ideas worth looking into may be:
  • Add a module to annotate and type-check inline assembly. Would be good for Linux kernel
  • Add a module to trace all pointer puns. For example, in Linux kernel and many other pieces of C code, casting to a void pointer and using the container_of macro is often used to emulate OOP. Now, using clang, one could possibly allow to check the types when some data is registered during initialization, casted to void and then used in some other function and casted back or even generate the intermediate code programmatically.
  • Automatically replace shared variables/function pointer calls with IPC messages. That is interesting if one would like to experiment with porting Linux code to other systems or turning Linux into a microkernel
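
For readers who have not met the pointer pun from the second idea, here is a minimal sketch of the container_of pattern. The macro is a simplified portable variant written for this example (the Linux kernel version adds compiler-specific type checks), and the "toy_" structures are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Recover a pointer to the enclosing structure from a pointer to one
 * of its members. Nothing verifies that ptr really points into an
 * object of the given type; that is the part a checker would trace.
 */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic list node that knows nothing about its owner. */
struct toy_list_node {
    struct toy_list_node *next;
};

/* A "derived" object that embeds the generic node. */
struct toy_driver {
    int irq;
    struct toy_list_node node;
};

/* Generic code hands us a node; we assume it lives inside a driver. */
struct toy_driver *toy_driver_from_node(struct toy_list_node *n)
{
    return toy_container_of(n, struct toy_driver, node);
}
```

If someone passes a node embedded in any other structure, toy_driver_from_node happily returns garbage, and neither "-Wall" nor "-Werror" will say a word. A Clang-based tool could follow each registered pointer from the place it is stored to the place it is cast back and flag the mismatch.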

Monday, January 27, 2014

porting XNU to ARM (OMAP5 CPU)

Two weeks ago I took some time to look at the XNU port to the ARM architecture done by the developer who goes by the handle "winocm". Today I've decided to summarize my experience.

Here is a brief checklist if you want to start porting XNU to your board:
  • Start from reading Wiki
  • Clone the DeviceTrees repository: . Basically, you can use slightly modified DTS files from Linux, but due to the fact that DTC compiler is unfinished, you'll have to rename and comment out some entries. For example, macros in included C headers are not expanded, so you'll have to hardcode values for stuff like A15 VGIC IRQs
  • Get image3maker which is a tool to make images supported both by GenericBooter and Apple's iBoot bootloaders
  • Use the DTC from to compile the abovementioned DTS files
  • Take a look at init/main.c . You may need to add a hackish entry the way it's done for the "HD2" board to limit the amount of RAM available.
  • I have built all the tools on OS X, but then found out that it's easier to use the prebuilt linux chroot image available at:

The most undocumented step is actually using the image3maker tool and building bootable images. You have to put the to the "images" directory in the GenericBooter-next source. As for the ramdisk, you may find some on github or unpack the iphone firmware, but I simply created an empty file, which is OK since I've not got that far in booting.

Building GenericBooter-next is straightforward, but you need to export the path to the toolchain, and possibly edit the Makefile to point to the correct CROSS_COMPILE prefix

rm images/Mach.*
../image3maker/image3maker  -f ../xnu/BUILD/obj/DEBUG_ARM_OMAP5432_UEVM/mach_kernel -t krnl -o images/Mach.img3
make clean

For the ramdisk, you should use the "rdsk" type, and "dtre" for the Device Tree (apparently there's also "xmdt" for an XML device tree, but I did not try that).

Booting on the omap5 via fastboot (without u-boot):
./usbboot -f &
fastboot -c "-no-cache serial=0 -v -x cpus=1 maxmem=256 mem=256M console=ttyO2" -b 0x83ff8040 boot vmlinux.raw

Some notes about the current organization of the XNU kernel and what can be improved:
  • All makefiles containing board identifiers should be sorted alphabetically, just for convenience when adding newer boards.
  • Look into the memory size limit. I had to limit the RAM to 256 MB in order to boot. If one enables the whole 2 GB available on the OMAP5432 uEVM board, the kernel fails to "steal pages" during the initialization (as can be seen in the screenshot).
  • I have encountered an issue 

OMAP5 is built around ARM Cortex-A15 cores. Currently the XNU port only has support for the A9 CPU, but if we're booting without SMP and L2 cache, the differences between these architectures are not a big problem. OMAP5 has a generic ARM GIC (Generic Interrupt Controller), which is actually compatible with the GIC in the MSM8xxx CPUs, namely with the APQ8060 in the HP TouchPad, the support for which was added by winocm. The UART is a generic 16550, compatible with the one in the OMAP3 chip. Given all this, I have managed to get the kernel basically booting and printing some messages to the UART.

Unfortunately, I have not managed to bring up the timer yet. The reason is that I was booting the kernel via USB directly after OMAP5 internal loader, skipping u-boot and hardware initialization. Somehow the internal eMMC chip in my board does not allow me to overwrite the u-boot partition (although I did manage to do it once when I received the board)

I plan to look into it once again, now with hardware pre-initialized by u-boot, and write a detailed post. Looks like the TODO is the following:

  • Mirror linux rootfs (relying on 3rd party github repos is dangerous)
  • Bringing up OMAP5 Timer
  • Building my own RAMDisk
  • Enabling eMMC support
  • Playing with IOKit

Friday, December 27, 2013

I don't even

Look, to some extent I like Mac OS X. It's a UNIX, it has some software (though, very little compared to linux). I like the objective-c language, and developing for iOS is a huge buzz with high salaries. Oh, and it has DTrace. Other than that I don't really have a reason to like it.

Some things about this OS are undocumented and badly broken. Take file system management for example. Tonight I looked at the free disk space and found out that some entity named "Backups" occupied 40GB. Turns out it's Time Machine's local snapshots. The proper way to get rid of them would be to disable automatic backups in Time Machine. One can also disable local snapshots from command line like:

tmutil disablelocal

And this is where I've effed up. This did not free any space. So I went ahead and removed the ".MobileBackups" and "Backups.backupdb" folders. NEVER EVER FUCKING DO IT. The thing is that Time Machine, according to some reports, creates hard links to directories (sic!), and now I have just lost those 40 gigs: they don't show up in "du -s", but they do show up as "Others" in the disk space info. So next time, use "tmutil delete" to delete those directories.
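To see why hard links make disk accounting confusing, here is a minimal sketch using ordinary file hard links in a temporary directory. It is only an analogy: directory hard links are an HFS+ peculiarity and I can't say this is exactly what Time Machine does internally. The helper name link_count and the demo paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: data reachable through several hard links is stored once, and
# removing one name does not free it. Plain file hard links are used here
# as a stand-in for Time Machine's directory hard links (assumption).

# Print the hard-link count of a path (second column of `ls -ld`).
link_count() {
    ls -ld "$1" | awk '{ print $2 }'
}

demo_dir=$(mktemp -d)
mkdir "$demo_dir/a" "$demo_dir/b"
dd if=/dev/zero of="$demo_dir/a/blob" bs=1024 count=1024 2>/dev/null
ln "$demo_dir/a/blob" "$demo_dir/b/blob"   # second name for the same inode

link_count "$demo_dir/a/blob"              # the link count is now 2
du -sk "$demo_dir/a" "$demo_dir/b"         # du typically counts the blob only once
rm -r "$demo_dir/a"                        # removes one name, not the data
link_count "$demo_dir/b/blob"              # the data is still there, count is 1
```

The space only comes back when the last name pointing at the data is gone, which is why deleting one of several linked trees by hand can leave usage that "du" no longer attributes to anything obvious.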

Ok, I've re-enabled the snapshots with "tmutil enablelocal" and disabled them with the GUI. After that, I opened Disk Utility and clicked "Verify Disk". It reported that the root FS was corrupted, so I had to reboot into the recovery image and run "Repair Disk". It's really confusing that OS X can perform a live fsck (it just freezes all IO operations until fsck is done) but can't repair a live FS.

And a bonus picture for you. In the process I've unplugged my USB disk several times without unmounting and the FS got corrupted. This is what I got for trying to repair it.

Tuesday, November 26, 2013

KVM on ARM Cortex A15 (OMAP5432 UEVM)

Hi! In this post I'll summarize the steps I needed to take in order to get KVM working on the OMAP5 ARM board using the virtualization extensions.

ARM A15 HYP mode.

With Cortex-A15, ARM introduced a new operating mode called HYP (hypervisor). It has lower privileges than TrustZone. In fact, HYP splits the "insecure" (non-secure) world into two parts, one for the hypervisor and the other one for the guests. By default on most boards the system boots into the insecure non-HYP mode. Entering HYP mode requires a platform-specific procedure. For OMAP5 this involves making a call to the TrustZone which will restart the insecure mode cores.
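A quick way to tell which mode the cores ended up in is to look at the kernel log. The sketch below assumes the kernel prints "started in HYP mode" or "started in SVC mode" during boot, which is what ARM kernels of this era do as far as I know; the function name boot_mode_from_log is my own.

```shell
#!/bin/sh
# Classify the CPU boot mode from kernel log text read on stdin.
# Assumption: the log contains "started in HYP mode" or "started in SVC mode".
boot_mode_from_log() {
    log=$(cat)
    case "$log" in
        *"started in HYP mode"*) echo hyp ;;
        *"started in SVC mode"*) echo svc ;;
        *) echo unknown ;;
    esac
}

# Usage on the board:
#   dmesg | boot_mode_from_log
```

If this reports svc, the bootloader did not switch the cores to HYP mode and KVM will not be able to initialize.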

A good overview of how virtualization support for ARM was added to Linux is available at LWN.

Ingo Molnar HYP patch

There was a patch for u-boot to enable entering HYP mode on OMAP5 by Ingo Molnar. Too bad, it was either written for an early revision of OMAP5 or poorly tested. It did not work on my board, so I had to learn about the OMAP5 TrustZone SCM commands from various sources and put together a patch (which is integrated into my u-boot branch).
If you're interested, you can take a look at the corresponding mailing list entry.

Preparing u-boot SD card

Get the Android images from TI or build them yourself. You can use the usbboot tool to boot images from the PC. Or even better, you can build u-boot (this is the preferred way) and then you won't need the Android images. But you may need the TI GLSDK for the x-loader (MLO). Creating an SD card with u-boot is the same as for OMAP3 and OMAP4, so I'll leave this out. There is some magic involved in creating a proper partition table, so I advise that you get some prebuilt image (like Ubuntu for PandaBoard) and then replace the files in the FAT partition.

Please consult the OMAP5432 manual on how to set up the DIP switches to boot from SD card.

Source code

For u-boot:

For linux kernel:

The Linux kernel is based on the TI omapzoom 3.8-y branch. I fixed a null pointer dereference in the DWC USB3 driver and some issues with the 64-bit DMA bitmasks (I hacked the drivers to work with ARM LPAE, but this probably broke them for anything else. Upstream has not yet decided how this should be handled).

Compiling stuff

First, let's build u-boot (U_BOARD must be set to the u-boot config name of your board):
export PATH=/home/alexander/handhelds/armv6/codesourcery/bin:$PATH
export ARCH=arm
export CROSS_COMPILE=arm-none-eabi-
make clean
make distclean
make ${U_BOARD}_config
make -j8

You'll get u-boot.bin and u-boot.img (which can be put on the SD card). This also builds the mkimage tool that we'll need later.

Now, we need to create the boot script for u-boot that will load the kernel and the device tree file to RAM.

i2c mw 0x48 0xd9 0x15
i2c mw 0x48 0xd4 0x05
setenv fdt_high 0xffffffff
fdt addr 0x80F80000
mmc rescan
mmc part
fatload mmc 0:1 0x80300000 uImage
fatload mmc 0:1 ${fdtaddr} omap5-uevm.dtb
setenv mmcargs setenv bootargs console=ttyO2,115200n8 root=/dev/sda1 rw rootdelay=5 earlyprintk nosmp
run mmcargs
bootm 0x80300000 - ${fdtaddr} 

Now, compile it to the u-boot binary format:
./tools/mkimage -A arm -T script -C none -n "omap5 boot.scr" -d boot.txt boot.scr
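One thing worth checking in the boot script above is that the kernel image, loaded at 0x80300000, does not run into the device tree. A rough sketch of the arithmetic follows, assuming ${fdtaddr} is the same 0x80F80000 that is passed to "fdt addr" (the script does not show its value). The function names are mine, and this only checks the file as loaded by fatload, not wherever bootm may relocate it.

```shell
#!/bin/sh
# Addresses taken from the boot script. FDT_ADDR is an assumption:
# ${fdtaddr} is presumed equal to the value given to "fdt addr".
KERNEL_ADDR=0x80300000
FDT_ADDR=0x80F80000

# Bytes available between the kernel load address and the device tree.
kernel_window() {
    echo $(( $FDT_ADDR - $KERNEL_ADDR ))
}

# Succeeds if an image of the given size (in bytes) fits below the device tree.
fits_below_fdt() {
    [ "$1" -le "$(kernel_window)" ]
}

kernel_window    # prints 13107200, i.e. 12.5 MiB
```

For example, "fits_below_fdt $(wc -c < uImage)" tells you whether the uImage you just built still fits in that window.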

Building linux:

export PATH=/home/alexander/handhelds/armv6/linaro-2012q2/bin:$PATH
export ARCH=arm
export CROSS_COMPILE=/home/alexander/handhelds/armv6/linaro-2012q2/bin/arm-none-eabi-
export OMAP_ROOT=/home/alexander/handhelds/omap

pushd .
cd ${OMAP_ROOT}/kernel_omapzoom
make $MAKE_OPTS omap5uevm_defconfig
make $MAKE_OPTS zImage

Now, we need to compile the DTS (device tree source) using the dtc tool. If you choose to use usbboot instead of u-boot, you can enable the following config option in the kernel and simply append the DTB to the end of zImage.
(Boot Options -> Use appended device tree blob to zImage)
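Before relying on the appended device tree, it may be worth confirming that the option actually made it into the kernel .config. A small sketch, assuming the menu entry above corresponds to CONFIG_ARM_APPENDED_DTB; the helper name config_enabled is hypothetical.

```shell
#!/bin/sh
# Succeeds if CONFIG_<name> is set to y in the given kernel .config file.
config_enabled() {
    grep -q "^CONFIG_$1=y\$" "$2"
}

# Usage from the kernel tree:
#   config_enabled ARM_APPENDED_DTB .config || echo "enable appended DTB support first"
```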

./scripts/dtc/dtc arch/arm/boot/dts/omap5-uevm.dts -o omap5-uevm.dtb -O dtb
cat kernel_omapzoom/arch/arm/boot/zImage omap5-uevm.dtb > kernel
./usbboot -f; fastboot -c "console=ttyO2 console=tty0 rootwait root=/dev/sda1" -b 0x83000000 boot kernel
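Since the kernel silently fails to find its device tree if the appended blob is wrong, a cheap sanity check is to verify that the dtc output starts with the flattened device tree magic number (d00dfeed) before concatenating. A minimal sketch; the function name is_dtb is my own.

```shell
#!/bin/sh
# Succeeds if the file starts with the flattened device tree magic d00dfeed.
is_dtb() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "d00dfeed" ]
}

# Usage:
#   is_dtb omap5-uevm.dtb && cat zImage omap5-uevm.dtb > kernel
```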


For the userspace part, I followed the manual from Virtual Open Systems for Versatile Express. The only tricky part was building QEMU for the Arch Linux ARM host; the guest binaries are available for download.

P.S. Please share your stories on how you're using or plan to use virtualization on ARM.