Tuesday, November 12, 2013

Using a hybrid graphics laptop on linux

Introduction.

I have a laptop with so-called hybrid graphics. That is, it has two GPUs - one of them is part of the SoC (Intel HD3000 GPU), the other one is the "discrete" PCIe Radeon HD 6630M from AMD. Typicaly, older models of dual-GPU laptops (and new Apple Macbook Pro machines) have a multiplexer that switches the output from the GPU to display. As the majority of modern laptops, my features a "muxless" combination of the GPUs - that is, the more powerful GPU does not have any physical connection to the display panel. Instead, it can write to the framebuffer memory of the less powerful GPU (or it can perform only the computational part of OpenGL/DirectX and let the primary GPU handle 2D drawing and blitting).

Linux support for the hybrid graphics if far from perfect. To be honest, it just sucks even now that such kind of laptop have been dominating the market for some 4-5 years already. With the recent advent of the "DRI-PRIME" interface the linux kernel and userspace now supports using the secondary GPU for offloading intensive graphic workloads. Right now to utilize the separate GPU, an application has to be launched with a special environment variable (namely, "DRI_PRIME=1") and the system is supposed to dynamically power-up the GPU when it is needed and turn it off to reduce power consumption and heating at other times. Unfortunately, the support for power management is still deeply broken. In this post I summarize my findings and scripts which allow to turn off the external GPU permanently to save power and increase the battery life of the laptop. I am not using the secondary GPU because OpenGL and X11 drivers for the Intel GPU are more stable, and the open-source Radeon driver/mesa does not yet support OpenCL (the computational API for GPUs).

vga_switcheroo

The kernel interface for controlling the power for the external GPU is called "vga_switcheroo" and it allows to power-down the GPU. Unfortunately, I have found out that my laptop (and most others) enable the GPU after a suspend-resume cycle. I think this behaviour is intentional, because ACPI calls should be used to control the power of the GPU. However, it confuses the vga_switcheroo which thinks the card is still powered down. Meanwhile, the card drains some 10-30 watts of power, effectively reducing the laptop battery life from 7 hours to 2 hours or so.

My first stab at this problem was a patch that forced the vga_switcheroo to disable all the cards that were not held by the userspace at resume. That did solve the problem for a while, but was hackish and never made it into the mainline kernel. However, it is still useful for linux kernels up to 3.9.
http://lkml.indiana.edu/hypermail/linux/kernel/1204.3/02530.html

kernel 3.10+

As of linux version 3.10, several changes were made in regards to the hybrid graphics power management. First is the introduction of the dynamic power management (in the form of clock control code and parsing the predefined power states provided by the OEM BIOS) for the Radeon chips. Second one is the change to the vga_switcheroo which allowed it to power down the card when unused without the user interaction. It does work to some extent, but the problem with the card being powered on after a sleep-resume cycle remains.

The problem is that now, when I manually disable the card via the vga_switcheroo, the PCI device is gone - it is removed and never appears again. The same behaviour could be exhibited on pre-3.10 kernels if one did issue the "remove" command to the DRM node (/sys/class/drm/card1/). Besides, my hack to the vga_switcheroo stopped working since these upgrades. Now, this did not make me happy and I set out to figure out the solution.

acpi_call

Linux powers off the radeon GPU using the PCIe bridge and GPU PCIe registers. A "portable" and "official" way to control the power of a GPU is using an ACPI call. ACPI is an interface for the Intel-X86 based computers (although now also implemented for the ARM CPUs in an attempt to provide the support for UEFI, Windows RT and an unified linux kernel binary capable of running on any ARM SoC) intended to provide abstraction of hardware enumeration and power management. It contains tables with lists of PCI and other PNP (plug-and-play) peripherals which can be used by the OS kernel instead of the unsafe "probing" mechanism. Moreover, it contains a specification for the interpreted bytecode language. Some methods are implemented by the OEM inside the BIOS ACPI tables to perform certain functions - query battery status, power up- and down the devices etc.

Folks have since long figured out to use ACPI calls in linux to control the power of the discrete GPU. Although it could interfere with the vga_switcheroo interface, in case we either completely disable the external GPU or power it off via vga_switcheroo first, we're safe to use it. Moreover, I have found out that I can use an ACPI call to power on the device and make it visible to the system after it is removed as described in the previous paragraph!

There exist a module for the linux kernel that allows to perform arbitrary ACPI calls from the userspace. Turns out that a direct ACPI call can even work around the new vga_switcheroo. There's a good guide on installing the acpi_call module into the DKMS subsystem so that it is automatically packaged and built any time you upgrade the linux kernel on your machine.

The module contains the turn_off_gpu.sh script in the examples folder which can be used to power down the GPU. I ran it and took a notice of the method used for my laptop - it was the "\_SB.PCI0.PEG0.PEGP._OFF" (which means South Bridge -> PCI controller 0 -> Pci Express Graphics -> Pci Express Graphics Port).

Now, I did the following magical trick:
echo "\_SB.PCI0.PEG0.PEGP._ON" > /proc/acpi/call
And BANG! after being gone when turned off via the vga_switcheroo, the GPU was identified by linux and enabled again. Neato.

Putting it all together.
Now the real showstopper is that the card still drains power after resume. I figured I had to write a script which would power down the card. It turned out that sometimes if the laptop was woken up and then it immediately went to sleep due to some race condition, systemd did not execute the resume hook.

Instead, I used the udev. Udev is a daemon that listens for the events that the kernel sends when hardware state changes. Each time the laptop wakes up, the power supply and battery drivers are reinitialized. Besides, the primary GPU also sends wakeup events. Another two observations are that I'm not using the discrete GPU and it is safe to call the ACPI "OFF" method multiple times. So, it is not a problem if we call the script too many times. I decided to just call it on any event from the two aforementioned subsystems. Here is how it can be done (note that you need to have root permissions to edit the files in /etc. If you are a newcomer to linux, you can become root by issuing a "sudo su" command):

Create the file "/etc/udev/rules.d/10-battery-hook.rules" with the following contents:
ACTION=="change", SUBSYSTEM=="power_supply", RUN+="/etc/gpu_poweroff.sh"
ACTION=="change", SUBSYSTEM=="drm", RUN+="/etc/gpu_poweroff.sh"

Now, create the /etc/gpu_poweroff.sh script with the contents mentioned below. You can uncomment the "echo" calls to debug the script and verify it is getting called. By the way, the power_profile part is not necessary but is an example of how to put a radeon GPU into a low-power state without disabling it.

gpu_poweroff.sh

#!/bin/bash

#this is a script for lowering the power consumption
#of a radeon GPUs in laptops. comment out whatever portion
#of it you don't need

#echo "gpu script @`date`" >> /tmp/foo

if [ -e /sys/class/drm/card0 ] ;then
for i in /sys/class/drm/card*; do
if [ -e $i/device/power_method ]; then
echo profile > $i/device/power_method
fi
if [ -e $i/device/power_profile ]; then
echo low > $i/device/power_profile
fi
done
fi

if [ -d /sys/kernel/debug/vgaswitcheroo ]; then
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch 
fi

acpi_methods="
\_SB.PCI0.P0P1.VGA._OFF
\_SB.PCI0.P0P2.VGA._OFF
\_SB_.PCI0.OVGA.ATPX
\_SB_.PCI0.OVGA.XTPX
\_SB.PCI0.P0P3.PEGP._OFF
\_SB.PCI0.P0P2.PEGP._OFF
\_SB.PCI0.P0P1.PEGP._OFF
\_SB.PCI0.MXR0.MXM0._OFF
\_SB.PCI0.PEG1.GFX0._OFF
\_SB.PCI0.PEG0.GFX0.DOFF
\_SB.PCI0.PEG1.GFX0.DOFF
\_SB.PCI0.PEG0.PEGP._OFF
\_SB.PCI0.XVR0.Z01I.DGOF
\_SB.PCI0.PEGR.GFX0._OFF
\_SB.PCI0.PEG.VID._OFF
\_SB.PCI0.PEG0.VID._OFF
\_SB.PCI0.P0P2.DGPU._OFF
\_SB.PCI0.P0P4.DGPU.DOFF
\_SB.PCI0.IXVE.IGPU.DGOF
\_SB.PCI0.RP00.VGA._PS3
\_SB.PCI0.RP00.VGA.P3MO
\_SB.PCI0.GFX0.DSM._T_0
\_SB.PCI0.LPC.EC.PUBS._OFF
\_SB.PCI0.P0P2.NVID._OFF
\_SB.PCI0.P0P2.VGA.PX02
\_SB_.PCI0.PEGP.DGFX._OFF
\_SB_.PCI0.VGA.PX02
\_SB.PCI0.PEG0.PEGP.SGOF
\_SB.PCI0.AGP.VGA.PX02
"

# turn off the dGPU via an ACPI call
if [ -e /proc/acpi/call ]; then
for i in $acpi_methods; do
echo $i > /proc/acpi/call
done
#echo "turned gpu off @`date`" >> /tmp/foo
fi

exit 0

Make it executable by issuing a "chmod +x /etc/gpu_poweroff.sh" command.
Also, take a look at the "/etc/rc.local" script. If it does not exist, simply create it with the following contents and make it executable:

#!/bin/sh -e
bash /etc/gpu_poweroff.sh
exit 0

If it does exist, insert the call to the "/etc/gpu_poweroff.sh" before the "exit" line.

Bonus

To silence the fans on the Sony Vaio laptop, you can add the following to your rc.local script (I advise against putting it to the gpu_poweroff script because this command has a noticeable delay):

if [ -e /sys/devices/platform/sony-laptop/thermal_control ]; then
echo silent > /sys/devices/platform/sony-laptop/thermal_control
fi