Tuesday, March 18, 2014

A story about clang tools

I've always wanted to try writing a toy compiler, but have never made myself actually learn the theory of parsing (I plan to do that and post some notes on this blog soon, though). However, recently I've been playing with Clang and LLVM. I have not yet used them for compiling, but I want to share my experience of using and extending Clang's error detection tools.

LLVM is a specification of platform-independent bytecode and a set of tools aimed at making the development of JITed interpreters and portable native binaries easier. A significant portion of the work on LLVM was done by Apple, and nowadays it is widely used in industry. For example, NVIDIA uses it for compiling CUDA code, and AMD uses it to generate shaders in its open-source driver. Clang is a parser and compiler for a family of C-like languages, which includes C, C++ and Objective-C. This compiler has several properties that may be interesting for developers:
  • It allows you to traverse the parsed AST and transform it. For example, add or remove the curly brackets around if-else bodies to troll your colleagues.
  • It allows you to define custom annotations via the __attribute__ extension, which can be intercepted after the AST is generated but before it is compiled.
  • It supports nearly all the features of every revision of C and C++ and is compatible with the majority of GCC's compiler options, which allows you to use it as a drop-in replacement. By the way, FreeBSD has switched to Clang, and on Apple OS X gcc is actually a symlink to clang!
So, why should you care? Think of how many times you wanted to add some cool feature but realized macros were not enough. Or wanted to enforce some code convention? With clang one could easily write a static code analyzer that would catch the notorious Apple "double goto fail" bug (the duplicated "goto fail;" line that made the rest of the certificate check unreachable). Which makes me wonder why they did not use their own technology :)
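
As a point of reference, even clang's built-in diagnostics can flag that class of bug without any custom tooling. A hedged sketch (the file name and flag choice here are illustrative, not Apple's actual build):

# -Wunreachable-code (not part of -Wall/-Wextra) should flag the code
# that becomes dead after the duplicated "goto fail;" line
clang -c -Wall -Wextra -Wunreachable-code sslKeyExchange.c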

LLVM also provides several frameworks for finding bugs at runtime. For example, AddressSanitizer catches out-of-bounds and use-after-free accesses, while MemorySanitizer catches reads of uninitialized memory.
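
A quick sketch of how they are typically enabled (the source file name is just a placeholder):

# AddressSanitizer: catches out-of-bounds accesses and use-after-free
clang -g -fsanitize=address -o test_asan test.c && ./test_asan

# MemorySanitizer: catches reads of uninitialized memory
clang -g -fsanitize=memory -o test_msan test.c && ./test_msan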

I was given the following interesting problem at work: build a solution that would let us detect where the application is leaking memory. It sounds like a common problem, but there is no single satisfying answer.
  • Using Valgrind is prohibitively slow - a program running under it can easily be 50 times slower than without it.
  • Using named SLAB areas (like the linux kernel does) is not an option. First off, in the worst case using SLAB means only half of the memory is available for allocation. Secondly, such an approach tells you which classes of objects occupy the memory, but not where and why they were allocated.
  • Using TCMalloc, which hooks the malloc/free calls, also turned out to be slow enough to cause different behaviour in the release and debugging environments, so a more lightweight solution had to be designed.
Anyway, while thinking of a good way to do it, I found out that Clang 3.4 has something called LeakSanitizer (also known as lsan and liblsan), which has already been ported to GCC 4.9. In short, it is a lightweight version of the tcmalloc hooks used in Google Perftools. It collects information about memory allocations and prints the leak locations when the application exits. It can use the LLVM symbolizer or GCC's libbacktrace to print human-readable locations instead of raw addresses (a typical standalone invocation is sketched after the list below). However, it has some issues:
  • It has an explicit check in the __lsan::DoLeakCheck() function which prevents it from being called twice. Therefore, we cannot use it to print leaks at runtime without shutting down the process.
  • Leak detection cannot be turned off when it is not needed. The hooked malloc/memalign functions are always used, and the __lsan_enable/__lsan_disable function pair only controls whether the collected statistics are ignored or not.
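
For reference, this is roughly how stock LeakSanitizer is used in its standalone mode (a sketch; the binary name is made up):

# Build with the standalone leak checker (no full AddressSanitizer)
clang -g -fsanitize=leak -o myserver myserver.c

# Leaks are reported when the process exits; llvm-symbolizer in PATH
# turns raw addresses into file:line locations
./myserver
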
The first idea was to patch the PLT/GOT tables in the ELF binary to dynamically choose between the functions from libc and lsan. It is a very dirty approach, but it does work. You can find a code example at https://gist.github.com/astarasikov/9547918.

However, by patching the GOT we only divert the functions for a single binary, and we'd have to patch the GOT of each loaded shared library, which is, well, boring. So, I decided to patch liblsan instead. I had to patch it either way, to remove the dreaded limitation in DoLeakCheck, and I figured it should be safe to do. Though there is a potential deadlock while parsing the ELF header (as indicated by a comment in the lsan source), you can work around it by disabling leak checking for global variables.

What I did was set up a number of function pointers for the hooked functions, initialized with the lsan wrappers (to avoid false positives for memory allocated during libc constructors), and add two functions, __lsan_enable_interceptors and __lsan_disable_interceptors, to switch between the libc and lsan implementations. This allows leak detection to be used both for our code and for third-party loadable shared libraries. Since lsan does not have extra dependencies on clang/gcc, it was enough to add a new CMakeLists.txt, and it can now be built standalone. One can now load the library with LD_PRELOAD and query the new functions with dlsym: if they're present, it is possible to selectively enable and disable leak detection; if not, the application is probably using the vanilla lsan from clang. A rough usage sketch follows.
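
A minimal sketch of how the patched library would be built and used (the library and binary names are made up; the interceptor-toggle functions are the ones added by our patch):

# Build the standalone library out of the llvm/gcc tree
mkdir build && cd build
cmake /path/to/patched-lsan && make

# Preload it into an unmodified application; the app (or a debugging
# helper injected into it) can then look up __lsan_enable_interceptors
# and __lsan_disable_interceptors with dlsym() to toggle leak tracking
LD_PRELOAD=$PWD/liblsan_standalone.so ./myapp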

There are some issues, though:
  • LSAN may have considerable memory overhead. It looks like it doesn't make much sense to disable leak checking at runtime, since the memory consumed by LSAN itself won't be reclaimed until the process exits. On the other hand, we can disable leak detection at application startup and only enable it when we need to trace a leak (for example, when an app has been running continuously for a long time and we don't want to stop it and relaunch it in a debug configuration).
  • We need to ensure that calling a non-hooked free() on a hooked malloc() and vice versa does not lead to memory corruption. This needs to be looked into, but it seems that both lsan and libc just print a warning in that case and corruption does not happen (a memory leak does, though, so it is impractical to repeatedly turn leak detection on and off).
We plan to release the patched library once we perform some evaluation and understand whether it is a viable approach.

Some ideas worth looking into may be:
  • Add a module to annotate and type-check inline assembly. This would be useful for the Linux kernel.
  • Add a module to trace all pointer puns. For example, in the Linux kernel and many other pieces of C code, casting to a void pointer and using the container_of macro is a common way to emulate OOP. Using clang, one could check the types when some data is registered during initialization, cast to void and later cast back in some other function, or even generate the intermediate glue code programmatically.
  • Automatically replace shared variables and function pointer calls with IPC messages. That is interesting if one would like to experiment with porting Linux code to other systems or turning Linux into a microkernel.

Monday, January 27, 2014

porting XNU to ARM (OMAP5 CPU)

Two weeks ago I took some time to look at the XNU port to the ARM architecture done by the developer who goes by the handle "winocm". Today I've decided to summarize my experience.

Here is a brief checklist if you want to start porting XNU to your board:
  • Start by reading the wiki: https://github.com/darwin-on-arm/wiki/wiki/Building-a-bootable-system
  • Clone the DeviceTrees repository: https://github.com/darwin-on-arm/DeviceTrees . Basically, you can use slightly modified DTS files from Linux, but because the DTC compiler is unfinished, you'll have to rename and comment out some entries. For example, macros in included C headers are not expanded, so you'll have to hardcode values for things like the A15 VGIC IRQs.
  • Get image3maker, a tool to make images supported by both the GenericBooter and Apple's iBoot bootloaders: https://github.com/darwin-on-arm/image3maker
  • Use the DTC from https://github.com/winocm/dtc-AppleDeviceTree to compile the aforementioned DTS files.
  • Take a look at init/main.c. You may need to add a hackish entry the way it's done for the "HD2" board to limit the amount of RAM available.
  • I built all the tools on OS X, but then found out that it's easier to use the prebuilt linux chroot image available at https://github.com/stqism/xnu-chroot-x86_64

The most undocumented step is actually using the image3maker tool and building bootable images. You have to put the resulting images into the "images" directory in the GenericBooter-next source. As for the ramdisk, you may find one on github or unpack an iPhone firmware, but I simply created an empty file, which is OK since I have not gotten that far in the boot process yet.

Building GenericBooter-next is straightforward, but you need to export the path to the toolchain, and possibly edit the Makefile to point to the correct CROSS_COMPILE prefix:

rm images/Mach.*
../image3maker/image3maker  -f ../xnu/BUILD/obj/DEBUG_ARM_OMAP5432_UEVM/mach_kernel -t krnl -o images/Mach.img3
make clean
make

For the ramdisk, you should use the "rdsk" image type, and "dtre" for the device tree (apparently there's also an xmdt type for an XML device tree, but I did not try that). For example:
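
The output file names below are placeholders; the flags mirror the kernel command shown above:

../image3maker/image3maker -f ramdisk.img -t rdsk -o images/Ramdisk.img3
../image3maker/image3maker -f omap5-uevm.dtb -t dtre -o images/DeviceTree.img3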

Booting on the omap5 via fastboot (without u-boot):
./usbboot -f &
fastboot -c "-no-cache serial=0 -v -x cpus=1 maxmem=256 mem=256M console=ttyO2" -b 0x83ff8040 boot vmlinux.raw

Some notes about the current organization of the XNU kernel and what can be improved:
  • All makefiles containing board identifiers should be sorted alphabetically, just for convenience when adding newer boards.
  • Look into the memory size limit. I had to limit the RAM to 256MB in order to boot. If one enables the whole 2GB available on the OMAP5432 uEVM board, the kernel fails to "steal pages" during initialization (as can be seen in the screenshot).
  • I have encountered an issue 

The OMAP5 has ARM Cortex-A15 cores. Currently the XNU port only supports the Cortex-A9, but since we're booting without SMP and the L2 cache, the differences between these cores are not a big problem. OMAP5 has a generic ARM GIC (Generic Interrupt Controller), which is actually compatible with the GIC in the MSM8xxx CPUs, namely the APQ8060 in the HP TouchPad, support for which was added by winocm. The UART is a generic 16550, compatible with the one in the OMAP3 chip. Given all this, I have managed to get the kernel basically booting and printing some messages to the UART.

Unfortunately, I have not managed to bring up the timer yet. The reason is that I was booting the kernel over USB directly after the OMAP5 internal loader, skipping u-boot and hardware initialization. Somehow the internal eMMC chip on my board does not let me overwrite the u-boot partition (although I did manage to do it once when I received the board).

I plan to look into it once again, now with hardware pre-initialized by u-boot, and write a detailed post. Looks like the TODO is the following:

  • Mirror the linux rootfs (relying on 3rd-party github repos is dangerous)
  • Bring up the OMAP5 timer
  • Build my own ramdisk
  • Enable eMMC support
  • Play with IOKit

Friday, December 27, 2013

I don't even

Look, to some extent I like Mac OS X. It's a UNIX, it has some software (though very little compared to linux). I like the Objective-C language, and developing for iOS is a huge buzz with high salaries. Oh, and it has DTrace. Other than that, I don't really have a reason to like it.

Some things about this OS are undocumented and badly broken. Take file system management, for example. Tonight I looked at the free disk space and found out that some entity named "Backups" occupied 40GB. It turns out it's Time Machine's local snapshots. The proper way to get rid of them would be to disable automatic backups in Time Machine. One can also disable local snapshots from the command line:

tmutil disablelocal
And this is where I've effed up. This did not free any space. So, I went ahead and removed the ".MobileBackups" and "Backups.backupdb" folders. NEVER EVER FUCKING DO IT. The thing is that Time Machine, according to some reports, creates hard links to directories (sic!), and now I've just lost those 40 gigs - they aren't showing up in "du -s", but they show up as "Others" in the disk space info. Sooo, next time, use "tmutil delete" to delete those directories, something like the sketch below.
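
The backup path here is just an illustration - point tmutil at whatever directory you actually want gone:

# turn local snapshots off instead of deleting the folders by hand
sudo tmutil disablelocal

# let Time Machine itself unwind the hard-linked directory trees
sudo tmutil delete "/Volumes/Backups/Backups.backupdb/MyMac/2013-12-20-123456"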

OK, I've re-enabled the snapshots with "tmutil enablelocal" and then disabled them via the GUI. After that, I opened Disk Utility and clicked "Verify Disk". It reported that the root FS was corrupted, and I had to reboot into the recovery image and run "Repair Disk". It's really confusing that OS X can perform a live fsck (it just freezes all IO operations until the fsck is done) but can't repair a live FS.

And a bonus picture for you. In the process I unplugged my USB disk several times without unmounting it, and the FS got corrupted. This is what I got for trying to repair it.

Tuesday, November 26, 2013

KVM on ARM Cortex A15 (OMAP5432 UEVM)

Hi! In this post I'll summarize the steps I needed to do in order to get KVM working on the OMAP5 ARM board using the virtualization extensions.

ARM A15 HYP mode.

In the Cortex-A15, ARM introduced a new operating mode called HYP (hypervisor). It has lower privileges than TrustZone. In fact, HYP splits the non-secure world into two parts, one for the hypervisor and the other for the guests. By default, most boards boot into the non-secure, non-HYP mode. To enter HYP mode, one needs to use platform-specific means; for OMAP5 this involves making a TrustZone call which restarts the non-secure cores.

A good overview of how virtualization support for ARM was added to Linux is available at LWN.
http://lwn.net/Articles/557132/

Ingo Molnar HYP patch

There was a patch for u-boot by Ingo Molnar to enable entering HYP mode on OMAP5. Too bad it was either written for an early revision of OMAP5 or poorly tested. It did not work for my board, so I had to learn about the OMAP5 TrustZone SCM commands from various sources and put together a patch (which is integrated into my u-boot branch).
If you're interested, you can take a look at the corresponding mailing list entry.

http://u-boot.10912.n7.nabble.com/RFD-OMAP5-Working-HYP-mode-td163302.html

Preparing u-boot SD card

Get the android images from TI or build them yourself. You can use the usbboot tool to boot images from the PC. Or even better, you can build u-boot (this is the preferred way) and then you won't need the android images, although you may still need the TI GLSDK for the x-loader (MLO). Creating an SD card with u-boot is the same as for OMAP3 and OMAP4, so I'll leave the details out. There is some magic involved in creating a proper partition table, so I advise that you get a prebuilt image (like ubuntu for pandaboard) and simply replace the files in the FAT partition; a rough sketch of the manual route is given after the links below.

http://software-dl.ti.com/omap/omap5/omap5_public_sw/OMAP5432-EVM/5AJ_1_5_Release/index_FDS.html
http://www.omappedia.com/wiki/6AJ.1.1_Release_Notes
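
If you do want to create the card by hand, the rough idea is a small bootable FAT partition for MLO/u-boot plus an ext4 rootfs. A sketch, assuming the card is /dev/sdX and your board's ROM is happy with a standard MBR layout:

# small bootable FAT32 partition for MLO/u-boot, the rest for the rootfs
sudo parted --script /dev/sdX \
    mklabel msdos \
    mkpart primary fat32 1MiB 65MiB \
    mkpart primary ext4 65MiB 100% \
    set 1 boot on

sudo mkfs.vfat -F 32 -n boot /dev/sdX1
sudo mkfs.ext4 -L rootfs /dev/sdX2

# convention: copy MLO first so the ROM code finds it
sudo mount /dev/sdX1 /mnt
sudo cp MLO u-boot.img boot.scr /mnt/
sudo umount /mnt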

Please consult the OMAP5432 manual on how to set up the DIP switches to boot from SD card.

Source code

For u-boot:
https://github.com/astarasikov/uboot-tegra/tree/omap5_hyp_test

For linux kernel:
https://github.com/astarasikov/linux/tree/omap5_kvm_hacks

The Linux kernel is based on the TI omapzoom 3.8-y branch. I fixed a null pointer dereference in the DWC USB3 driver and some issues with the 64-bit DMA bitmasks (I hacked the drivers to work with ARM LPAE, but this probably broke them for everything else; upstream has not yet decided how this should be handled).

Compiling stuff

First, let's build u-boot:
#!/bin/bash
export PATH=/home/alexander/handhelds/armv6/codesourcery/bin:$PATH
export ARCH=arm
export CROSS_COMPILE=arm-none-eabi-
U_BOARD=omap5_uevm
make clean
make distclean
make ${U_BOARD}_config
make -j8

You'll get u-boot.bin and u-boot.img (the latter can be put on the SD card). Besides, that will build the mkimage tool that we'll need later.

Now, we need to create the boot script for u-boot that will load the kernel and the device tree file into RAM:

i2c mw 0x48 0xd9 0x15
i2c mw 0x48 0xd4 0x05
setenv fdt_high 0xffffffff
setenv fdtaddr 0x80F80000
fdt addr ${fdtaddr}
mmc rescan
mmc part
fatload mmc 0:1 0x80300000 uImage
fatload mmc 0:1 ${fdtaddr} omap5-uevm.dtb
setenv mmcargs setenv bootargs console=ttyO2,115200n8 root=/dev/sda1 rw rootdelay=5 earlyprintk nosmp
printenv
run mmcargs
bootm 0x80300000 - ${fdtaddr} 

Now, compile it to the u-boot binary format:
./tools/mkimage -A arm -T script -C none -n "omap5 boot.scr" -d boot.txt boot.scr


Building linux:

export PATH=/home/alexander/handhelds/armv6/linaro-2012q2/bin:$PATH
export ARCH=arm
export CROSS_COMPILE=/home/alexander/handhelds/armv6/linaro-2012q2/bin/arm-none-eabi-
export OMAP_ROOT=/home/alexander/handhelds/omap
export MAKE_OPTS="-j4 ARCH=$ARCH CROSS_COMPILE=$CROSS_COMPILE"

pushd .
cd ${OMAP_ROOT}/kernel_omapzoom
make $MAKE_OPTS omap5uevm_defconfig
make $MAKE_OPTS zImage
popd
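
One thing to note: the boot script above loads a uImage, while this builds a zImage. If you take the u-boot/bootm route, you can wrap the zImage with the mkimage tool built earlier. The load/entry address 0x80008000 below is the conventional OMAP value and the u-boot path is an assumption; adjust both to your setup:

${OMAP_ROOT}/u-boot/tools/mkimage -A arm -O linux -T kernel -C none \
    -a 0x80008000 -e 0x80008000 -n "omap5 linux" \
    -d ${OMAP_ROOT}/kernel_omapzoom/arch/arm/boot/zImage uImage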

Now, we need to compile the DTS (device tree source) using the dtc tool. If you choose to use usbboot instead of u-boot, you can enable the corresponding kernel config option (Boot Options -> Use appended device tree blob to zImage) and simply append the DTB blob to the end of the zImage:

./scripts/dtc/dtc arch/arm/boot/dts/omap5-uevm.dts -o omap5-uevm.dtb -O dtb
cat kernel_omapzoom/arch/arm/boot/zImage omap5-uevm.dtb > kernel
./usbboot -f; fastboot -c "console=ttyO2 console=tty0 rootwait root=/dev/sda1" -b 0x83000000 boot kernel

VirtualOpenSystems

For the userspace part, I followed the manual from VirtualOpenSystems for the Versatile Express board. The only tricky part was building qemu for the ArchLinux ARM host; the guest binaries are available for download.
http://www.virtualopensystems.com/media/kvm-resources/kvm-arm-guide.pdf


P.S. Please share your stories on how you're using, or plan to use, virtualization on ARM.

Monday, November 25, 2013

An update on OSX Touchscreen driver

After playing with the HID interface in OS X, I found out that there exists an API for simulating input events from user space, so I've implemented the touchscreen driver using it.

One unexpected caveat was that you need to set an increasing event number on the events to be able to click the menu bar. Even worse, when launched from Xcode, the app would hang if you clicked the Xcode menu bar; if you click any other app, or launch it outside Xcode, everything's fine. That caused some pain while debugging.

I still want to implement the driver as a kernel module, though the userspace version is also fine. I also want to add support for multitouch gestures, though I'm not sure it is possible to provide multiple pointers to applications.

Video demo: https://www.youtube.com/watch?v=ccMOPNel-2k

The code is on github: https://github.com/astarasikov/osxhidtouch

You can get the binary OS X touchscreen driver for the Dell S2340T screen from the link below. Basically, you can replace the HidTouch binary with your own, and if you put it at "HidTouch.app/Contents/MacOS/HidTouch", you will get an app bundle that can be installed to Applications and added to auto-start.
https://drive.google.com/file/d/0B7wcN-tOkdeRRmdYQmhJSWsta1U/edit?usp=sharing

Friday, November 22, 2013

Multitouch touchscreen support in OS X

Hi there!

I happen to have a Dell S2340T multitouch monitor (quite an expensive toy btw) which has a touch controller from 3M. It works fine in Windows (which I don't have any plans to use), sort of works in linux (which is my primary work environment) and does not work at all in OS X (not a big deal but it would be cool to have it).

So I set out in search of a driver and figured out the following:

  • There is an old driver from TouchBase that sucks. I mean, it uses some stinky installer that pollutes the OS X root file system, and it has a UI from the 90s. More than that, it's not signed and does not support my touchscreen. Even adding the USB Vendor ID to its Info.plist did not fix a thing.
  • There is a newer trial driver from TouchBase that should work for most devices but is limited to 100 touches (a demo version). Well, I didn't want to grovel before them and sign up at their stupid site. And still, even if I patched the driver to remove the trial limitations, I would have to sign it with my certificate and there would be no legal way for me to distribute it on the internetz.
  • There is a full version of the TouchBase driver that costs $100. Are they fucking crazy? On the other hand, I would do the same. See, home users don't care, multitouch displays are very rare, and people are used to pirating software. But one could raise quite some money selling the driver to the workshops that build customized Apple computers (like ModBook) or car PCs.
  • There are some drivers for other touchscreens, but they're for old single-touch devices.

Doesn't look promising. Now, one may wonder "WTF ain't it working out of the box? It's a HID device, should work everywhere". Well, there are two problems:

  • Multitouch devices use new HID usages from the Digitizer usage page, which OS X does not support in IOKit. Actually, since such a device uses a new HID class (different from single-touch monitors), it is not even classified as a touchscreen by the OS (rather as a multitouch trackpad).
  • The touchscreen gets recognized by the generic HID driver as a mouse. And here comes the problem - when a finger is released, the device reports the pointer to be at the origin (point <0, 0>), which causes the cursor to get stuck at the top left corner.
I decided to learn about the HID system in OS X and write a driver. I started with the sample code from the HID documentation and added some event debugging. I found out that the device reports events both as a digitizer and as a pointing device. So for now I've prepared a demo app in Qt5 that recognizes multiple touches and draws points in different colors when the screen is touched. Have a look at it in action:


The source code is at https://github.com/astarasikov/osx-multitouch-demo .


Well, looks like I should figure out how to turn this into a device driver. I will probably need to figure out how to enumerate HID devices based on their class, make a kext (kernel extension) for XNU (the kernel used in OS X), and then... sell my driver for some $20, right?

Tuesday, November 12, 2013

Using a hybrid graphics laptop on linux

Introduction.

I have a laptop with so-called hybrid graphics. That is, it has two GPUs - one of them is part of the SoC (the Intel HD3000), the other is a "discrete" PCIe Radeon HD 6630M from AMD. Typically, older models of dual-GPU laptops (and the new Apple Macbook Pro machines) have a multiplexer that switches the display output between the GPUs. Like the majority of modern laptops, mine features a "muxless" combination of GPUs - that is, the more powerful GPU does not have any physical connection to the display panel. Instead, it can write to the framebuffer memory of the less powerful GPU (or it can perform only the computational part of OpenGL/DirectX and let the primary GPU handle 2D drawing and blitting).

Linux support for hybrid graphics is far from perfect. To be honest, it just sucks, even now that this kind of laptop has been dominating the market for some 4-5 years. With the recent advent of the "DRI PRIME" interface, the linux kernel and userspace now support using the secondary GPU for offloading intensive graphic workloads. Right now, to utilize the separate GPU an application has to be launched with a special environment variable (namely DRI_PRIME=1, as shown below), and the system is supposed to dynamically power up the GPU when it is needed and turn it off at other times to reduce power consumption and heating. Unfortunately, the support for power management is still deeply broken. In this post I summarize my findings and the scripts which allow one to turn off the discrete GPU permanently to save power and increase the battery life of the laptop. I am not using the secondary GPU because the OpenGL and X11 drivers for the Intel GPU are more stable, and the open-source Radeon driver/mesa does not yet support OpenCL (the computational API for GPUs).
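
For example, checking which GPU renders and offloading a single application:

# default: rendered by the integrated Intel GPU
glxinfo | grep "OpenGL renderer"

# offload just this application to the discrete Radeon
DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
DRI_PRIME=1 glxgears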

vga_switcheroo

The kernel interface for controlling the power of the discrete GPU is called "vga_switcheroo", and it allows powering the GPU down (see the example below). Unfortunately, I have found out that my laptop (and most others) re-enable the GPU after a suspend-resume cycle. I think this behaviour is intentional, because ACPI calls should be used to control the power of the GPU. However, it confuses vga_switcheroo, which thinks the card is still powered down. Meanwhile, the card drains some 10-30 watts, effectively reducing the laptop's battery life from 7 hours to 2 hours or so.
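
For reference, the interface lives in debugfs and can be poked manually (as root):

# list the GPUs and their power state (IGD = integrated, DIS = discrete)
cat /sys/kernel/debug/vgaswitcheroo/switch

# power off the unused discrete GPU
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch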

My first stab at this problem was a patch that forced vga_switcheroo to disable, at resume, all the cards that were not held by userspace. That did solve the problem for a while, but it was hackish and never made it into the mainline kernel. However, it is still useful for linux kernels up to 3.9.
http://lkml.indiana.edu/hypermail/linux/kernel/1204.3/02530.html

kernel 3.10+

As of linux 3.10, several changes were made with regard to hybrid graphics power management. The first is the introduction of dynamic power management for the Radeon chips (in the form of clock control code and parsing of the predefined power states provided by the OEM BIOS). The second is a change to vga_switcheroo which allows it to power down the card when unused, without user interaction. It does work to some extent, but the problem of the card being powered on after a sleep-resume cycle remains.

The problem is that now, when I manually disable the card via vga_switcheroo, the PCI device is gone - it is removed and never appears again. The same behaviour could be observed on pre-3.10 kernels if one issued the "remove" command to the DRM node (/sys/class/drm/card1/). Besides, my vga_switcheroo hack stopped working after these upgrades. This did not make me happy, so I set out to find a solution.

acpi_call

Linux powers off the radeon GPU using the PCIe bridge and GPU PCIe registers. A "portable" and "official" way to control the power of a GPU is an ACPI call. ACPI is an interface for Intel x86-based computers (although it is now also being implemented for ARM CPUs in an attempt to support UEFI, Windows RT and a unified linux kernel binary capable of running on any ARM SoC) intended to abstract hardware enumeration and power management. It contains tables listing PCI and other PNP (plug-and-play) peripherals, which can be used by the OS kernel instead of the unsafe "probing" mechanism. Moreover, it contains a specification for an interpreted bytecode language. Some methods are implemented by the OEM inside the BIOS ACPI tables to perform certain functions - querying battery status, powering devices up and down, etc.

Folks have long since figured out how to use ACPI calls in linux to control the power of the discrete GPU. Although this could interfere with the vga_switcheroo interface, as long as we either completely disable the discrete GPU or power it off via vga_switcheroo first, we're safe to use it. Moreover, I have found out that I can use an ACPI call to power the device back on and make it visible to the system after it has been removed as described in the previous paragraph!

There exists a module for the linux kernel that allows performing arbitrary ACPI calls from userspace. It turns out that a direct ACPI call can even work around the new vga_switcheroo. There's a good guide on installing the acpi_call module into the DKMS subsystem so that it is automatically packaged and built any time you upgrade the linux kernel on your machine.

The module contains a turn_off_gpu.sh script in its examples folder which can be used to power down the GPU. I ran it and took note of the method that worked for my laptop - it was "\_SB.PCI0.PEG0.PEGP._OFF" (System Bus -> PCI root bridge 0 -> PCI Express Graphics port -> the PEG port device, i.e. the discrete GPU).

Now, I did the following magical trick:
echo "\_SB.PCI0.PEG0.PEGP._ON" > /proc/acpi/call
And BANG! after being gone when turned off via the vga_switcheroo, the GPU was identified by linux and enabled again. Neato.

Putting it all together.
Now, the real showstopper is that the card still drains power after resume. I figured I had to write a script that would power down the card. It turned out that sometimes, if the laptop was woken up and then immediately went back to sleep, due to some race condition systemd did not execute the resume hook.

Instead, I used udev. Udev is a daemon that listens for the events the kernel sends when hardware state changes. Each time the laptop wakes up, the power supply and battery drivers are reinitialized. Besides, the primary GPU also sends wakeup events. Two other observations are that I'm not using the discrete GPU and that it is safe to call the ACPI "OFF" method multiple times, so it is not a problem if we call the script too often. I decided to just call it on any event from the two aforementioned subsystems. Here is how it can be done (note that you need root permissions to edit the files in /etc; if you are a newcomer to linux, you can become root by issuing a "sudo su" command):

Create the file "/etc/udev/rules.d/10-battery-hook.rules" with the following contents:
ACTION=="change", SUBSYSTEM=="power_supply", RUN+="/etc/gpu_poweroff.sh"
ACTION=="change", SUBSYSTEM=="drm", RUN+="/etc/gpu_poweroff.sh"

Now, create the /etc/gpu_poweroff.sh script with the contents below. You can uncomment the "echo" calls to debug the script and verify it is getting called. By the way, the power_profile part is not strictly necessary, but it is an example of how to put a radeon GPU into a low-power state without disabling it.

gpu_poweroff.sh

#!/bin/bash

#this is a script for lowering the power consumption
#of a radeon GPUs in laptops. comment out whatever portion
#of it you don't need

#echo "gpu script @`date`" >> /tmp/foo

if [ -e /sys/class/drm/card0 ]; then
    for i in /sys/class/drm/card*; do
        if [ -e $i/device/power_method ]; then
            echo profile > $i/device/power_method
        fi
        if [ -e $i/device/power_profile ]; then
            echo low > $i/device/power_profile
        fi
    done
fi

if [ -d /sys/kernel/debug/vgaswitcheroo ]; then
    echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
fi

acpi_methods="
\_SB.PCI0.P0P1.VGA._OFF
\_SB.PCI0.P0P2.VGA._OFF
\_SB_.PCI0.OVGA.ATPX
\_SB_.PCI0.OVGA.XTPX
\_SB.PCI0.P0P3.PEGP._OFF
\_SB.PCI0.P0P2.PEGP._OFF
\_SB.PCI0.P0P1.PEGP._OFF
\_SB.PCI0.MXR0.MXM0._OFF
\_SB.PCI0.PEG1.GFX0._OFF
\_SB.PCI0.PEG0.GFX0.DOFF
\_SB.PCI0.PEG1.GFX0.DOFF
\_SB.PCI0.PEG0.PEGP._OFF
\_SB.PCI0.XVR0.Z01I.DGOF
\_SB.PCI0.PEGR.GFX0._OFF
\_SB.PCI0.PEG.VID._OFF
\_SB.PCI0.PEG0.VID._OFF
\_SB.PCI0.P0P2.DGPU._OFF
\_SB.PCI0.P0P4.DGPU.DOFF
\_SB.PCI0.IXVE.IGPU.DGOF
\_SB.PCI0.RP00.VGA._PS3
\_SB.PCI0.RP00.VGA.P3MO
\_SB.PCI0.GFX0.DSM._T_0
\_SB.PCI0.LPC.EC.PUBS._OFF
\_SB.PCI0.P0P2.NVID._OFF
\_SB.PCI0.P0P2.VGA.PX02
\_SB_.PCI0.PEGP.DGFX._OFF
\_SB_.PCI0.VGA.PX02
\_SB.PCI0.PEG0.PEGP.SGOF
\_SB.PCI0.AGP.VGA.PX02
"

# turn off the dGPU via an ACPI call
if [ -e /proc/acpi/call ]; then
    for i in $acpi_methods; do
        echo $i > /proc/acpi/call
    done
    #echo "turned gpu off @`date`" >> /tmp/foo
fi

exit 0

Make it executable by issuing a "chmod +x /etc/gpu_poweroff.sh" command.
Also, take a look at the "/etc/rc.local" script. If it does not exist, simply create it with the following contents and make it executable:

#!/bin/sh -e
bash /etc/gpu_poweroff.sh
exit 0

If it does exist, insert the call to the "/etc/gpu_poweroff.sh" before the "exit" line.

Bonus

To silence the fans on a Sony Vaio laptop, you can add the following to your rc.local script (I advise against putting it into the gpu_poweroff script because this command has a noticeable delay):

if [ -e /sys/devices/platform/sony-laptop/thermal_control ]; then
echo silent > /sys/devices/platform/sony-laptop/thermal_control
fi