OpenCL

Erik Schnetter <eschnetter@perimeterinstitute.ca>

May 9 2012

Abstract

OpenCL is a programming standard for heterogeneous systems, i.e. for programming CPUs, GPUs, and other types of accelerators. OpenCL is implemented as a library, and OpenCL code is compiled at run time by passing OpenCL routines, as strings, to the OpenCL library. This differs from, e.g., CUDA, which is implemented as a language in its own right, like C or C++.

This thorn OpenCL provides the configuration bits that ensure that Cactus applications can use OpenCL libraries.

1 Introduction

OpenCL describes itself as:

OpenCL is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software.

More information is available at http://www.khronos.org/opencl/.

2 Availability

There seem to be five OpenCL implementations available at this time. Unfortunately, each has its drawbacks:

AMD
Available at http://developer.amd.com/zones/openclzone/pages/default.aspx. This supports both CPUs and ATI GPUs. Unfortunately, the OpenCL compiler seems to produce low-quality code.
Apple
Included with the operating system, available by default. This supports both CPU and GPU. The compiler is based on LLVM. Unfortunately, there seem to be serious bugs – for example, I can’t get the cos function to provide correct results.
Intel
Available at http://software.intel.com/en-us/articles/opencl-sdk/. This supports only (Intel?) CPUs. The compiler is based on LLVM, and the implementation is also based on Intel’s TBB (Threading Building Blocks).
Nvidia
Available at http://developer.nvidia.com/opencl, included in their CUDA distribution. This supports only GPUs.
pocl
Open source, available at https://launchpad.net/pocl. This OpenCL implementation has not yet been released (current version is 0.6), and is based on LLVM.

In addition, Wikipedia (http://en.wikipedia.org/wiki/OpenCL) lists two IBM implementations, one for their Power processors and one for Intel-compatible CPUs. The latter may be identical or similar to AMD’s implementation.

Since OpenCL can run on CPUs, good OpenCL implementations are available at no cost for virtually all platforms.

It is possible to install several OpenCL implementations (platforms) at the same time, to build against any one of them, and then to choose at run time which devices from which platforms to use. For example, it is possible to build an application using the Intel implementation, and then at run time use the Nvidia platform to access a GPU (assuming that both Intel and Nvidia implementations are installed). On Unix, this is implemented via a system-wide configuration directory /etc/OpenCL/vendors that lists all OpenCL platforms that will be available at run time.
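To see which platforms a given machine would offer at run time, one can inspect the configuration directory mentioned above. The following is a minimal sketch, not part of this thorn. It assumes the convention of the Khronos ICD loader, where each installed platform registers itself with a file ending in .icd in that directory; the function names has_icd_suffix and list_icd_files are hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` ends in ".icd" and has a non-empty stem, else 0.
   Assumption: platforms register themselves via *.icd files. */
int has_icd_suffix(const char *name)
{
    const char *suffix = ".icd";
    size_t n = strlen(name);
    size_t m = strlen(suffix);
    return n > m && strcmp(name + n - m, suffix) == 0;
}

/* Print every *.icd file found in `dir`.
   Returns the number of files found, or -1 if `dir` cannot be opened. */
int list_icd_files(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (has_icd_suffix(entry->d_name)) {
            printf("%s/%s\n", dir, entry->d_name);
            ++count;
        }
    }
    closedir(d);
    return count;
}
```

Calling list_icd_files("/etc/OpenCL/vendors") would then print one line per registered platform, or return -1 on a system without that directory.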

3 OpenCL Programming

OpenCL is very similar to C. However, it differs from C in several key aspects:

Given this, it is not possible to write a whole application in OpenCL. Instead, only the expensive parts (so-called compute kernels) are written in OpenCL, and are launched e.g. from C or C++.
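As an illustration of this split, the sketch below shows how a compute kernel might be kept inside a C source file. The kernel name axpy and both identifiers are hypothetical and not taken from this thorn. The OpenCL source is an ordinary C string; a host program would typically hand such a string to the OpenCL library (e.g. via clCreateProgramWithSource followed by clBuildProgram) to have it compiled at run time. A plain C loop with the same arithmetic is given next to it for comparison.

```c
#include <stddef.h>

/* OpenCL C source for a hypothetical kernel `axpy`, stored as a C string.
   Each work item updates exactly one array element. */
const char *const axpy_kernel_source =
    "__kernel void axpy(const float a,\n"
    "                   __global const float *x,\n"
    "                   __global float *y)\n"
    "{\n"
    "    const size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

/* Plain C reference with the same arithmetic: the explicit loop over i
   takes the place of the work items that OpenCL would launch. */
void axpy_reference(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The kernel body reads almost like the loop body of the C version; the loop itself disappears because the OpenCL runtime supplies the index through get_global_id.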

In addition, the hardware architecture of GPUs and other accelerators differs from CPUs in one key aspect: the device typically has its own memory, separate from the host memory.

That means that one has to explicitly copy data between the host memory and the device memory before and/or after calling compute kernels.
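The resulting copy-in, compute, copy-out pattern can be sketched in plain C. This is only a simulation under stated assumptions: a malloc'ed array stands in for device memory, memcpy stands in for the transfer calls, and a loop stands in for the kernel. In a real OpenCL program these steps would correspond to calls such as clCreateBuffer, clEnqueueWriteBuffer, clEnqueueNDRangeKernel, and clEnqueueReadBuffer. The names mock_device_buffer and scale_on_mock_device are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a device buffer. In real OpenCL this would be an opaque
   cl_mem handle that the host cannot dereference directly. */
typedef struct {
    float *data;
    size_t count;
} mock_device_buffer;

/* Scale `n` host values by `factor`, going through the mock device.
   Returns 0 on success, -1 if the mock device allocation fails. */
int scale_on_mock_device(float factor, float *host, size_t n)
{
    if (n == 0)
        return 0;

    mock_device_buffer dev;
    dev.data = malloc(n * sizeof *dev.data);
    dev.count = n;
    if (dev.data == NULL)
        return -1;

    /* Step 1: copy host -> device before the kernel runs. */
    memcpy(dev.data, host, n * sizeof *host);

    /* Step 2: the "kernel" only ever touches device memory. */
    for (size_t i = 0; i < dev.count; ++i)
        dev.data[i] *= factor;

    /* Step 3: copy device -> host so the host can see the result. */
    memcpy(host, dev.data, n * sizeof *host);

    free(dev.data);
    return 0;
}
```

Omitting step 1 would let the kernel operate on uninitialised device data, and omitting step 3 would leave the host array unchanged, which is why both copies have to be written out explicitly.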

4 OpenCL Programming in Cactus

Cactus supports OpenCL programming at several levels. At the lowest level, one can use this thorn OpenCL directly. While this works fine, it is somewhat tedious because one has to write a certain amount of boilerplate code to detect and initialise the device, to copy data between host and device, and to build and run compute kernels.

Since OpenCL is implemented as a library, the flesh knows little about OpenCL. For example, there are no configuration options to specify an OpenCL compiler, since code is compiled at run time via a library call to