Monday, January 18, 2016

Abusing QEMU and mmap() or Poor Man's PCI Pass-Through

The problem.

Another tale from the endless firmware endeavours.
The other day I ended up in a weird situation. I was locked in a room with three firmwares - one of them was built by myself from source, while the other two came in binary form from the vendor. Naturally the one I built was not fully working, and one of the binaries worked fine. So I set out on a quest to trace all PCI MMIO accesses.

When one thinks about it, several solutions come to mind. The first idea is to run a QEMU/XEN VM and pass-through the PCI device. Unfortunately the board we had did not have IOMMU (or VT-D in Intel terminology). The board had the Intel BayTrail SoC which is basically a CPU for phones.
Another option would be to binary-patch the prebuilt firmware and introduce a set of wrapper functions around MMIO access routines and add corresponding calls to printf() for debugging but this approach is very tedious and error-prone, and I try to avoid patching binaries because I'm extremely lazy.

So I figured the peripherals of interest (namely, the I2C controller and a NIC card) were simple devices configured by simple register writes. The host did not pass any shared memory to the device, so I used a simple technique. I patched QEMU and introduced "fake" PCI devices with VID and PID of the devices I wanted to "pass-through". By default the MMIO read access return zero (or some other default value) while writes are ignored. Then I went on and added the real device access. Since I was running QEMU on Linux, I just blacklisted and rmmod'ed the corresponding drivers. Then, I used "lspci" to find the addresses of the PCI BARs and mmap() syscall on "/dev/mem" to access the device memory.

A picture is worth a thousand words


The code.

The code is quite funky, but thanks to the combination of many factors (X86 being a strongly ordered architecture and the code using only 4-byte transfers) the naive implementation worked fine.
1. I can now trace all register reads and writes
2. I have verified that the "good" firmware still boots and everything is working fine.
So, after that I simply ran all three firmwares and compared the register traces only to find out that the problems might be unrelated to driver code (which is a good result because it allowed to narrow down the amount of code to explore).

The branch with my customized QEMU is here. Also, I figured QEMU now has the ultra-useful option called "readconfig" which allows you to supply the board configuration in a text file and avoid messing with cryptic command line switches for enabling PCI devices or HDDs.
https://github.com/astarasikov/qemu/tree/our_board
https://github.com/astarasikov/qemu/commit/2e60cad67c22dca750852346916d6da3eb6674e7

The code for mmapping the PCI memory is here. It is a marvel of YOLO-based design, but hey, X11 does the same, so it is now industry standard and unix-way :)
https://github.com/astarasikov/qemu/commit/2e60cad67c22dca750852346916d6da3eb6674e7#diff-febf7a335ad7cd658784899e875b5cc7R31

Limitations.

There is one obvious limitation, which is also the reason QEMU generally does not support passthrough of MMIO devices on systems without IOMMU (Like VT-D or SMMU).

Imagine a case where the device is not only configured via simple register transfers (such as setting certain bits to enable IRQ masking or clock gating), but system memory is passed to the device via storing a pointer in a register.

The problem is that the address coming from the guest VM (GPA, guest physical address) will not usually match the corresponding HPA (host physical address). The device however operates the bus addresses and is not aware of the MMU, let alone VMs. IOMMU solves this problem by translating the addresses coming from the device into virtual addresses, and the hypervisor can supply a custom page table to translate bus HPAs into GPAs.

So if one would want to provide a pass-through solution for sharing memory in systems without IOMMU, one would either need to ensure that the VM and guest have 1:1 GPA/HPA mappings for the addresses used by the device or come up with an elaborate scheme which would detect register writes containing memory buffers and set up a corresponding buffer in the host memory space, but then one will need to deal with keeping the host and guest buffers consistent. Besides, that would require the knowledge of the device registers, so it would be non-trivial to build a completely generic, plug-and-play solution.