Saturday, October 5, 2013

Sleeping considered harmful


Synchronization is a very controversial topic in computer science and software engineering. We're caught up in a vicious circle where everyone is told syncronization is difficult and therefore tries to stay away from the topic, and when one has to deal with threading, they prefer to stay away from reading about synchronization because reading is even more difficult.

One thing I have noticed while developing linux software for various devices is that drivers written by hardware vendors inevitably suck. Today I will tell you about one thing that freaks me out (sure there's a whole universe of things that freak me out but that's beyond the scope of this post).

Now, sometimes, when dealing with hardware, one has to wait (sleep) some time after writing to registers to let the device change its state. One prominent example are the drivers for the LED controllers. Yeah, your fancy new Android paperweight probably has a color LED to notify when your friends sign on to FaceBoob. LEDs can have various triggers, for example, one can set a LED to blink when memory is accessed, just like on old desktop boxes with HDDs. Somehow every time you try to set a trigger to a LED in Android kernel (as opposed to using the Android's proprietary ioctls), the device just freezes dead. What actually happens?

If we take a look at the LED class definition in the linux kernel, we'll see that the functions controlling the LED state (which are to be implemented by the hardware vendor) must not use sleeping primitives like msleep or usleep. Why is such a requirement posed? There are two reasons. One is that LEDs can be used in a context that does not allow sleeping - for example, inside the scheduler code to indicate the CPU load. Sleeping while scheduling could be handled by kernel developers, of course, but such an operation will cause time drifting and increase the complexity of the scheduler. Another reason is that LED control code may be called thousands of times per second, and any delay will cause the huge CPU load.

Anyway, even the performance critical code needs to be correct, therefore the need for the synchronization. The lightweight synchronization primitive used throughout the whole kernel is the spinlock. As you may guess from its name, it does not block the calling thread (like a semaphore or a mutex would do), but busy-waits till it can get into critical section. Polling (or busy-waiting) has always been discouraged because it consumes CPU time. However, it allows the algorithm to progress instead of blocking (therefore, most lock-free algorithms use memory barriers and busy waiting which is equivalent to using a spin lock in kernel). Either way, what you should need about spinlocks is that in most cases they are used to guard writing to an IO register which is a very fast operation, and while a spinlock is held, interrupts are disabled, therefore if you called sleep(), your thread would have no way to resume.

And yeah, most newbie driver developers don't know that some external busses (like I2C) require sleeping due to the nature of hardware. So, when one issues an I2C transfer inside the spinlock, they just want to see the kernel freeze.

Now that we've figured out what causes freezing, let's think of what we can do about it. On the one hand, if we have enough money, we can just assume customers will accept any crap we sell since they have no alternative. And that does work for Samsung, Qualcomm, Sony and many other vendors. On the other hand, we may want to fix it. We know we can't sleep in a lock, but we need a lock to keep data consistent. Linux has several solutions to the rescue and I'll give an overview of the two of them that are easy and suitable for a typical embedded/desktop system
(i.e., an SMP processor with a small number of cores).


First solution is using a workqueue. A workqueue is an abstraction (on top of kernel threads) which allows to schedule jobs (work) to be executed at some moment later. You can think of a workqueue as of a "future" or "async" concept used in userland.

I will use the example of a LED driver I have written for one smartphone (HTC Kovsky also known as Sony Ericsson Xperia X1) to illustrate the workqueue.

First, we need to include the relevant header. Don't trust me here, stuff changes all the time in linux, so you'd rather use grep to find the actual header.
  1. #include <linux/workqueue.h>

We need to declare a "work" that we want to schedule. A work is essentially a routine which would be called, that accepts the instance of a "struct work_struct" class. We can use the "DECLARE_WORK" macro to define a work.

  1. static DECLARE_WORK(colorled_wq, htckovsky_update_color_leds);

Now, let's look at how to implement the work function.

  1. static void htckovsky_update_button_light(struct work_struct* work) {
  2. char buffer[3] = {MICROP_KEYPAD_BRIGHTNESS_KOVS, 0, 0};
  3. char brightness = kovsky_leds[BUTTONS].brightness;
  4. if (brightness) {
  5. buffer[1] = 0x94;
  6. buffer[2] = brightness >> 2;
  7. }
  8. microp_ng_write(client, buffer, 3);
  9. }

As one may notice, we're using the microp_ng_write function which is actually initiating an I2C transfer. We could also add an msleep() call or even add a mutex and that would still work.

Now, how do we call the code? Simple, in your non-sleeping routine (which may be an interrupt handler or an implementation of a function from the LED class), call the "schedule_work" routine.

  1. static void htckovsky_set_brightness_color(struct led_classdev *led_cdev,
  2. enum led_brightness brightness)
  3. {
  4. schedule_work(&colorled_wq);
  5. }

Now, what if you wanted to pass some data to the work routine without using the global variables? The macro "container_of" comes to rescue! Container_of is essentially an implementation of the OOP-like classes in C based on routines and composition instead of inheritance. It works like the following: assume you have a routine which only accepts the "struct work_struct" pointer, and you want to pass arbitrary data. Modifying linux code and creating a copy of the workqueue implementation each time would be silly and boring (although that's what windows developers would do anyway). Instead, we create a structure to hold our data, and add the pointer to the workqueue as a member to it. The container_of macro calculates the offset of a workqueue in our structure, and substracts it from the workqueue pointer. Woops, we have a pointer to our structure and can unmarshal our data from it. This works like the following:

struct work_struct *our_work = kmalloc(sizeof(struct work_struct), GFP_KERNEL);
/* Init the work structure */
INIT_WORK(our_work, handler_routine);
struct my_work_data {
struct work_struct *cool_work;
} work = {
cool_work = our_work;

The full source code of the driver is available at the corresponding linux tree at gitorious. It does not synchronize the LED state, though. One should be careful when writing code without synchronization: even though for a LED which updates thousands of times per second, displaying a mixed state is not an issue (a human eye will not notice glitches), if you go this way, only keep your "virtual", driver state global and concurrently accessible, but make sure that the data written to the hardware registers can not be modified concurrently!

Kernel Threads
Another alternative is using the kernel threads. In fact, kernel threads are almost like a userspace process running in kernel mode. It can receive signals and do many other things, but programming kernel threads is more difficult and almost all the time you can get away with using workqueues. Here's some sample code showing how to spawn a thread and do your dirty business inside:

static int my_thread(void *v) {
 struct completion exit;
 siginfo_t info;


 while (1) {
  if (signal_pending(current)) {
   if (dequeue_signal_lock(current, &current->blocked, &info) == SIGKILL) {
    goto cleanup;
  else {
   //Do your very bad stuff here

 complete_and_exit(&exit, 0);
 return 0;

//should be somewhere in the driver data
struct task_struct *my_task = NULL;

static void start_thread(void) {
 void *arg = NULL; //Can be used to pass data to the handler
 my_task = kthread_create(my_thread, arg, "mydriver_baddaemond");

static void stop_thread(void) {
 send_sig(SIGKILL, my_task, 0);
 my_thread_task = NULL;

Now you know how to make your drivers more stable and less power-hungry. Almost every time you are thinking of an ad-hoc implementation of a common concept, be sure linux has a clean and working solution already.


  1. in stop_thread, I think you have confused my_thread_task for my_task as well, my_thread_task was never defined. I think it should have been my_thread in start_thread and my_task in stop_thread.

    As for the rest, It's good!

    1. Hi! Thank you, I've fixed the typo. I did not compile the code, just took it out of my old project and renamed variables, and mixed them up indeed.