Friday, August 13, 2021

building CLVK OpenCL support for Android phones + OpenCV notes

Compiling CLVK for Android.

Many Android devices, especially Google Pixel, ship without the OpenCL library.
At some point I needed OpenCL for my OpenCV prototyping, and I was also interested in using either CU2CL or a similar project to run CUDA code.
Needless to say, as soon as I saw a project that promised to implement OpenCL on top of Vulkan, I decided to see if I could run it on Android.

It worked fine, although my approach was kind of nasty: integrating the project along with its LLVM library into the app code. That was good enough for prototyping, although the debug binaries took a few hundred megabytes.
I've added instructions to cross-compile this project for Android.
Additionally, I wrote a simple Android app to demonstrate how to integrate the pre-built
OpenCL library and how to deploy the OpenCL compiler ("clspv" binary) to the device.

Two example OpenCL apps are compiled: "clinfo" which prints some basic information and "BitonicSort" from Intel OpenCL demos.
FWIW both of them work so it's a good start.

Also, it is interesting to compare the run times for this app on the phone with CLVK and on the desktop with the native OpenCL driver and with CLVK + RADV.
It seems that on the desktop CLVK with RADV is roughly 10 times slower than the native driver.

However, since RADV or any other Vulkan driver uses most of the same LLVM-based codegen as the native OpenCL drivers, this difference is very likely caused by some hardcoded allocation size or another similar parameter rather than some major architectural issue.
I have not looked into it yet for lack of time though.

  • ARM Mali G72 MP18: 96.818924 ms
  • Qualcomm Adreno 630: 33.18 ms
  • AMD RX480 with CLVK and RADV: 7.491112 ms
  • AMD RX480 with ROCm OpenCL driver: 0.651121 ms
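
To put the numbers above side by side, here is a small sketch that divides each timing by the ROCm result. The labels and values are copied from the list; the function name and the choice of ROCm as the baseline are my own assumptions for illustration.

```python
# Timings copied from the list above, in milliseconds.
timings_ms = {
    "ARM Mali G72 MP18": 96.818924,
    "Qualcomm Adreno 630": 33.18,
    "AMD RX480 with CLVK and RADV": 7.491112,
    "AMD RX480 with ROCm OpenCL driver": 0.651121,
}

# Assumption: the native desktop driver is the reference point.
BASELINE = "AMD RX480 with ROCm OpenCL driver"


def slowdown_vs_baseline(timings, baseline):
    """Return how many times slower each entry is than the baseline entry."""
    base = timings[baseline]
    return {name: t / base for name, t in timings.items()}


slowdowns = slowdown_vs_baseline(timings_ms, BASELINE)
for name, factor in slowdowns.items():
    print(f"{name}: {factor:.1f}x")
```

For the RX480 the CLVK + RADV factor comes out at about 11.5, which matches the "10 times slower" estimate above.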

OpenCL with CLVK on top of Android GL driver on a device with no OpenCL

Not a real fix, but that's enough to make most OpenCL samples run.

Building OpenCV with OpenCL support for Android.

For my personal project in 2016 I needed to check if it is possible to use the GPGPU accelerated version of OpenCV that is implemented using OpenCL on Android phones.

OpenCL SDK for Android.

Where to get the SDK? Welp. Build one yourself!
To make OpenCV recognize our SDK and build successfully, we need the following things:
  • - can be empty, but the OpenCV build system needs it to be present
  • Khronos OpenCL headers - can be gathered using OpenCL headers and CL-CPP SDK.
  • The loadable dynamic libraries - can be pulled from the device. Generally you can use the ones from any other device with the same architecture because the ABI and API are the same, as they are defined by the OpenCL standard.
Here is what our "SDK" tree should look like. For the time being I have only used the "armeabi-v7a" architecture, but one can also add the 64-bit binaries to the "arm64-v8a" directory. I've put the "SDK" up on GitHub, but only the headers (which are publicly available from Khronos). You will need to find the "" yourself. (If you're using a Google Pixel with MSM8996, you can take the proprietary binaries from a Xiaomi Mi5 or Zuk Z2 Pro.)

├── include
│   └── CL
│       ├── cl.h
│       ├── cl.hpp
│       ├── cl_d3d10.h
│       ├── cl_d3d11.h
│       ├── cl_dx9_media_sharing.h
│       ├── cl_dx9_media_sharing_intel.h
│       ├── cl_egl.h
│       ├── cl_ext.h
│       ├── cl_ext_intel.h
│       ├── cl_gl.h
│       ├── cl_gl_ext.h
│       ├── cl_platform.h
│       ├── cl_va_api_media_sharing_intel.h
│       └── opencl.h
└── lib
    └── armeabi-v7a
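
Since the binaries come from some other device, it is easy to drop a 64-bit library into the 32-bit directory by mistake. Below is a minimal sketch that reads the ELF header of each ".so" under "lib/<abi>/" and reports the ones that do not match their directory. The function names are hypothetical; the e_machine constants are the standard ELF values. Note that the header alone cannot tell "armeabi" from "armeabi-v7a", so 32-bit ARM is simply assumed to be "armeabi-v7a" here.

```python
import struct
from pathlib import Path

# e_machine values from the ELF specification.
EM_386, EM_ARM, EM_X86_64, EM_AARCH64 = 3, 40, 62, 183

# (ELF class, e_machine) -> Android ABI directory name.
# ELF class 1 means 32-bit, 2 means 64-bit.
ABI_BY_MACHINE = {
    (1, EM_ARM): "armeabi-v7a",  # assumption: 32-bit ARM is treated as v7a
    (2, EM_AARCH64): "arm64-v8a",
    (1, EM_386): "x86",
    (2, EM_X86_64): "x86_64",
}


def android_abi_of(elf_bytes):
    """Return the ABI directory name for an ELF image, or None if unknown."""
    if len(elf_bytes) < 20 or elf_bytes[:4] != b"\x7fELF":
        return None
    ei_class = elf_bytes[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = elf_bytes[5]   # 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (e_machine,) = struct.unpack_from(endian + "H", elf_bytes, 18)
    return ABI_BY_MACHINE.get((ei_class, e_machine))


def misplaced_libraries(sdk_root):
    """List .so files under <sdk_root>/lib/<abi>/ whose header disagrees with <abi>."""
    bad = []
    for so in sorted(Path(sdk_root).glob("lib/*/*.so")):
        with open(so, "rb") as f:
            header = f.read(20)
        if android_abi_of(header) != so.parent.name:
            bad.append(str(so))
    return bad
```

Calling misplaced_libraries() on the root of the tree above should return an empty list when every library sits in the directory matching its architecture.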


Building OpenCV with OpenCL support.

The most important change is to actually enable the OpenCL on Android in CMakeLists.txt:
+OCV_OPTION(WITH_OPENCL "Include OpenCL Runtime support" (NOT IOS AND NOT WINRT) )
I also had to disable some warning flags and unsupported compiler options in "OpenCVCompilerOptions.cmake"

Compiler Options.
Extra debugging.
CMake options.

Please also see the following blog which describes building OpenCV with OpenCL, although not the case when you don't have a ready-made SDK:

OpenCL without root.

Many vendors who ship phones with the official Android (the one that passes the CTS, the Compatibility Test Suite) do not ship OpenCL drivers. Some vendors (most Chinese ones, such as Xiaomi and Zuk, and also Sony), however, do ship the drivers. If you have root on your phone or are building a custom firmware, you can just take the binaries from another device's ROM and that's it.

Dynamic linking banned by Google?

Currently it seems that one can still use mmap() and mprotect() to write their own dynamic library loader, but this might get patched in the future because Google is looking into both security and control.

What could we do then?
In principle, we could develop an application that would take a bunch of ".so" libraries and produce a single object file (.o) containing the data and code from all the libraries, with all dynamic symbols resolved. In other words, write a static compile-time linker.

The ultimate obstacle to this approach would be if device vendors reworked the driver architecture in such a way that the OpenCL frontend would be loaded not into the application's address space but into a separate server process. If that happens, the only way forward would be using a device that ships the OpenCL driver or building a custom firmware.

Package all the relevant ".so" objects directly into your application APK.
Paths. For this prototype, I manually edited the paths to the libraries using a hex editor. This part can be automated, but it was good enough for the proof of concept stage.
Essentially packaging a good half of the other phone's firmware into your app :)
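
The hex-editor step for the paths could be automated roughly as follows. This is only a sketch of the idea, not the procedure I actually used: it rewrites a path prefix inside every NUL-terminated string that contains it and pads the freed bytes with NULs so that no file offsets move. The function name and the example paths are made up for illustration, and the replacement must not be longer than the original prefix. It only makes sense for plain path strings; names that are looked up through ELF hash tables must not be touched this way.

```python
def patch_path_prefix(blob, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in every C string of blob, keeping its size."""
    if not old_prefix or b"\0" in old_prefix or b"\0" in new_prefix:
        raise ValueError("prefixes must be non-empty and contain no NUL bytes")
    if len(new_prefix) > len(old_prefix):
        raise ValueError("new prefix must not be longer than the old one")

    out = bytearray(blob)
    pos = 0
    while True:
        start = out.find(old_prefix, pos)
        if start < 0:
            break
        # The C string runs up to the next NUL byte (or the end of the blob).
        end = out.find(b"\0", start)
        if end < 0:
            end = len(out)
        tail = bytes(out[start + len(old_prefix):end])
        # Shift the rest of the string left and pad the gap with NULs.
        out[start:end] = (new_prefix + tail).ljust(end - start, b"\0")
        pos = end
    return bytes(out)


# Hypothetical example: both paths below are placeholders, not real device paths.
original = b"AB\0/system/vendor/lib/libfoo.so\0CD"
patched = patch_path_prefix(original, b"/system/vendor/lib/", b"/data/app/lib/")
print(len(original) == len(patched))
```

Because the output has exactly the same length as the input, section offsets and sizes recorded in the ELF headers stay valid.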

Tweaking the application makefiles.

I edited to specify the STL version and disable exceptions.

The fail.

OpenCL library loaded fine, but now we had two independent EGL/OpenGL contexts - one from GL and one from CL with no standard way of sharing textures between them.
(2020 edit) In retrospect, I could have hooked the "open" routine to steal the KGSL file descriptor from GL for CL, but there still might have been some globals shared between the libraries.

Zero-copy memory sharing.

It should be observed that all modern SoCs have shared RAM for the CPU, GPU and other units (such as the camera frontend). Moreover, on more recent models, cache coherency between the units is guaranteed (typically maintained by the AXI bus). Android used a kernel-side API called ION to manage memory buffers shared between the devices.

It should be possible to map the GPU texture into the CPU memory using either the OpenGL extension or the underlying GraphicBuffer API, which is a higher-level interface for ION. I may look into this option in the future. For now, I've decided to use a phone that ships with the OpenCL driver.

One other direction to explore might be using the Java API. I was able to create a GraphicBuffer from Java using reflection.
Apparently John Carmack has managed to do it the other way round, using SurfaceTexture API ("SurfaceTexture -> Surface, pass the surface in an intent to the other process, turn that into an ANativeWindow -> EGLSurface").

EDIT 2021:
You really should just use ImageReader class, that's it.

Further thoughts.

Sadly, it looks like there's not much we (the users/independent developers) can do to change the situation. Some projects like OpenCV indeed contain a lot of code that would be difficult to port to other languages, especially if you don't want to introduce additional bugs in the process.

Perhaps as a developer the best plan is to limit your app to the phones that support OpenCL out of the box or build a custom firmware for the phone to add the drivers. While this limits your app to a very narrow set of devices, it will allow you to reuse existing OpenCL code and build a working prototype quickly, which is crucial for many projects at an early stage.

This line is from the blog draft in 2017 before CLVK came out :)

I think an interesting direction to explore would be to create a custom OpenCL/CUDA driver and runtime that would generate code in GLSL (for OpenGL ES with compute shaders) or Metal.