Monday, October 24, 2016

running without an ARM and a leg: Windows Mobile in QEMU


One project I had in mind long time ago was getting Windows Mobile to run in QEMU.
I think it's a lovely OS with a long history and the project seemed like a nice tecnhical challenge.

Initially I started working on it two years ago back in 2014 and the plan was to later run it in KVM on Cortex-A15 with Virtualization Extensions. However, I had to suspend it because I started working on two other challenges - running XNU on Xen (aka LLC) and later doing GSoC (running FreeBSD in ARM emulator).

Now since I've got some free time on my hands, I decided to finally get back to this project and cross it off my TODO list.

Choosing the emulation target.

In order to run Windows CE on QEMU (or any OS for that matter) it would be necessary to either develop a Board Support Package with all the drivers for a specific virtual machine or take the opposite approach and emulate some machine for which there already exists a ROM image.

For Windows, there is the emulator developed by Microsoft which is unsurprisingly called just Device Emulator. It emulates a real board - MINI2440 based on Samsung S3C2440 SoC which is an ancient ARMv4 CPU. Turns out, this is the same SoC that's used in OpenMoko so there is an old fork of QEMU with the support for most of the peripherals. So the choice of the platform seemed a no-brainer.

So I took the QEMU fork supporting MINI2440 and tried to adapt it to running the unmodified Windows Mobile images from Microsoft. Needless to say, I made sure the images are placed into memory at the correct addresses but the code seemed to crash spontaneously and never got past enabling MMU.

The first idea that comes to mind is of course to take the latest QEMU and see if it fixes anything. However, trying random changes until something works is actually quite a crappy approach.

So, I decided to single-step the execution and see what happens. QEMU provides the very useful GDB interface (which can be activated with the "-s -S" switches) for this purpose.

Caches are hard.

The first surprise came from the bootloader code. Before launching the kernel, Windows CE bootloader disables the MMU. At this moment QEMU crashes spectacularly. Initially I tried hacking around the issue by adding the code to translate the addresses into the "exec-all.h". However, it didn't solve the problem and the heuristic started looking too complex which suggested I'm on the wrong way.

I started thinking of and realized that disabling MMU is a tricky thing because on ARM Program Counter is usually ahead of the current instruction so the CPU has to fetch a couple instructions ahead. So we have a caching problem. In this Windows CE code, there is a NOP instruction between disabling the MMU and jumping to the new PA from a register. The hypothesis was that QEMU did not fetch the needed instructions correctly and it was necessary to add a translation cache for them. The reality was more funny.

As it turned out, QEMU contained a hack to cache a couple instructions because... because Linux kernel for PXA270 had the same problem in the opposite scenario - when the MMU was enabled. I decided to comment out the hack and it made the boot-up progress further. (PXA Cache Hack).

Stacks are not easy either.

Next thing I know is that soon after enabling the MMU the code crashes when trying to access the stack at a very peculiar virtual address of 0xffffce70. I examined the QEMU translation code and found out that it correctly locates the physical address but the permission bits are incorrect. I decided to force it to return RW access for the particular address range and voila - Windows Mobile boots to Home Screen successfully. (Hack to force RW stack).

Windows Mobile 5.0 Smartphone Edition on QEMU on Mac OS X.
Now that everything seemed to boot, I decided to take another look at the MMU issue and fix it properly. The first idea I had was to compare ARM920T (ARMv4) and ARM1136 (ARMv5) page table formats. It is worth noting that ARMv4 did not have TEX remap bits, and also last level page tables had different type (last two bits). It turned out that QEMU (probably as real ARM CPUs) supported all types of pages, even ARMv4 on ARMv6 target, and TEX/caching bits were simply ignored. After careful examination of the code and all the page table parsing I found a typo that was already fixed in the upstream QEMU (MMU Typo).

You can grab the cleaner tree without the ugly intermediate hacks: . If you need to have a look at the older WIP stuff, it's at .

Running Pocket PC version.

Preparing PocketPC Image.

Windows Mobile Standard aka Smartphone comes shipped as the NOR flash image and it is XIP (Execute-in-Place). Pocket PC Image comes in a different form. It comes as an image already relocated to the RAM so we need to launch it directly via the "-kernel" parameter. We need to create the NOR memory image for it. We can use the smartphone image as a base, but an empty 32MB file should work as well.

  • Grab the "romtools" scripts at github:
  • python ../ppc_50/_208PPC_USA_bin
  • dd if=_208PPC_USA_bin-80070000-81491ed0.bin of=SPHIDPI_BIN bs=$((0x30000)) seek=1 conv=notrunc
  • ./arm-softmmu/qemu-system-arm -show-cursor -m 256 -M mini2440 -serial stdio -kernel ../wm5/50_ppc/_208PPC_USA_bin-80070000-81491ed0.bin  -pflash ../wm5/50_ppc/SPHIDPI_BIN

This one did not boot and kept hanging at the boot splash screen. The host CPU usage was high and it indicated there was some busy activity inside the VM like an IRQ storm. After hooking up the debugger it turned out that the VM was hanging in two busy loops.

One of them was in the Audio driver - during the splash screen Windows plays a welcome sound. This was worked around by setting the "TX FIFO Ready" status in the sound codec. The second freeze was in the touchscreen driver but that looks like a MINI2440 emulation bug - once the touchscreen is disabled, the workqueue timer in qemu is permanently disabled and the status bits are not updated properly. I commented out the routine which disabled the touchscren controller since it's a VM anyway (not a real device where it would have a power-saving impact).
Obligatory screensot: Windows Mobile 5.0 Pocket PC

Retarded idea.

I was fantasizing about adding a logic to QEMU which would detect if emulation was stuck invoking MMIO device handler in a loop and fuzzing the returned register value until it was unstuck. While not very practical, one could reverse-engineer the register bits blindly this way.

Further ideas.

I don't think I'll invest more time into this project because there's little value but I'm considering developing an Android port of this QEMU fork just for the fun of it. Perhaps a better option would be to emulate a newer SoC such as Qualcomm 8250 and run an ARMv7 image from HTC HD2.

It's quite sad that most if not all Android phones ship with HYP mode (Hypervisor) disabled so KVM is a no-go. On the other hand, allowing to run custom hypervisors opens up a pandora box of security issues so it might be a good decision. Luckily for us who like to tinker with hardware, most TrustZone implementations also contain exploitable bugs so there are some possibilities to uncover the potential :)

Today, many companies working with embedded SoCs, seek GPU virtualization solutions. The demand is particularly high in the automotive sector where people are forced to use eclectic mixes of relatively old SoCs and retaining binary compatibility. So it would be interesting to prototype a solution similar to Intel's GVT.

I think it would be nice to use Qualcomm Adreno GPU as an emulation target. It is relatively well reverse-engineered - there exists Mesa FreeDreno driver for it, Qualcomm commits patches to the upstream KMS driver and most importantly the GPU ISA is similar to older AMD Radeon GPUs which is extensively documented by AMD. Besides, a similar GPU is used in Xbox 360 so one more place to learn is Xenia emulator which simulates the Xbox GPU.