## OpenCL

May 9 2012

Abstract

OpenCL is a programming standard for heterogeneous systems, i.e. for programming CPUs, GPUs, and other types of accelerators. OpenCL is implemented as a library, and OpenCL code is compiled at run time by passing the source code of OpenCL routines, as strings, to the OpenCL library. This differs from, e.g., CUDA, which is implemented as a language, like C or C++, and requires its own compiler.

The thorn OpenCL provides the configuration settings that allow Cactus applications to use OpenCL libraries.

### 1 Introduction

OpenCL describes itself as:

> OpenCL is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software.

### 2 Availability

There appear to be five OpenCL implementations available at this time. Unfortunately, each of them has its drawbacks:

• AMD: Available at http://developer.amd.com/zones/openclzone/pages/default.aspx. This supports both CPUs and ATI GPUs. Unfortunately, the OpenCL compiler seems to produce low-quality code.
• Apple: Included with the operating system and available by default. This supports both CPUs and GPUs. The compiler is based on LLVM. Unfortunately, there seem to be serious bugs; for example, I cannot get the cos function to return correct results.
• Intel: Available at http://software.intel.com/en-us/articles/opencl-sdk/. This supports only CPUs (possibly only Intel CPUs). The compiler is based on LLVM, and the implementation also builds on Intel’s TBB (Threading Building Blocks).
• Nvidia: Available at http://developer.nvidia.com/opencl, included in the CUDA distribution. This supports only GPUs.
• pocl: Open source, available at https://launchpad.net/pocl. This implementation is based on LLVM and is still under development; it has not yet had a stable release (the current version is 0.6).

In addition, Wikipedia (http://en.wikipedia.org/wiki/OpenCL) lists two IBM implementations, one for IBM’s Power processors and one for Intel-compatible CPUs. The latter may be identical with or similar to AMD’s implementation.

Since OpenCL can also run on CPUs, good OpenCL implementations are available at no cost for virtually all platforms.

It is possible to install several OpenCL implementations (platforms) at the same time, to build against any one of them, and then to choose at run time which devices from which platforms to use. For example, one can build an application using the Intel implementation and then, at run time, use the Nvidia platform to access a GPU (assuming that both the Intel and the Nvidia implementations are installed). On Unix, this is implemented via a system-wide configuration directory /etc/OpenCL/vendors, which contains one entry for each OpenCL platform that will be available at run time.
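On Linux, the Khronos ICD loader treats each file ending in `.icd` in /etc/OpenCL/vendors as one platform, and each such file names the vendor library to load. A minimal sketch in plain POSIX C that lists these entries might look as follows. It is only a diagnostic aid (applications normally ask the loader for platforms via `clGetPlatformIDs`), and the helper names `has_icd_suffix` and `list_icd_files` are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if `name` ends in ".icd", the suffix the ICD loader
   looks for in the vendors directory. */
static int has_icd_suffix(const char *name)
{
    size_t len = strlen(name);
    return len >= 4 && strcmp(name + len - 4, ".icd") == 0;
}

/* Print every *.icd file in `dir` together with its first line (the
   vendor library providing that platform). Returns the number of files
   found, or -1 if the directory cannot be opened. */
static int list_icd_files(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (!has_icd_suffix(entry->d_name))
            continue;

        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);

        char line[1024] = "";
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            if (fgets(line, sizeof line, f) == NULL)
                line[0] = '\0';
            fclose(f);
        }
        line[strcspn(line, "\n")] = '\0'; /* strip trailing newline */
        printf("%s -> %s\n", entry->d_name, line);
        ++count;
    }
    closedir(d);
    return count;
}
```

Calling `list_icd_files("/etc/OpenCL/vendors")` should print one line per installed ICD-based platform; a return value of -1 means the directory does not exist, which usually indicates that no ICD-based implementation is installed.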

### 3 OpenCL Programming

OpenCL is very similar to C. However, it differs from C in several key aspects:

• much smaller run-time library, consisting mostly of mathematical functions (such as sqrt) and printf;
• built-in support for fine-grained and coarse-grained multi-threading;
• built-in support for vectorisation.
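In OpenCL terms, one way to read these two levels of multi-threading is that individual work-items provide the fine-grained parallelism and work-groups the coarse-grained parallelism. The host chooses a global size (the total number of work-items) and a local size (work-items per group); in OpenCL 1.x the global size must be a multiple of the local size. The plain-C helpers below, whose names are hypothetical, sketch the usual rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Round the global work size up to a multiple of the work-group size,
   since OpenCL 1.x requires global_size % local_size == 0. */
static size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size * local_size;
}

/* Number of work-groups (the coarse-grained units) needed to cover n items. */
static size_t num_work_groups(size_t n, size_t local_size)
{
    return round_up_global(n, local_size) / local_size;
}
```

For 1000 grid points and groups of 64, this launches 1024 work-items in 16 groups. The 24 surplus work-items must do nothing, which is why kernels typically compare their index against the problem size.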

Given this, it is not possible to write a whole application in OpenCL. Instead, only the expensive parts (so-called compute kernels) are written in OpenCL, and are launched e.g. from C or C++.
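As an illustration, the compute kernel below is written in OpenCL C and stored as a C string, which is the form in which it is handed to the OpenCL library (e.g. via `clCreateProgramWithSource`). The plain-C function next to it computes the same result: the loop over `i` disappears in OpenCL, where each work-item executes the loop body for one value of `i`. The kernel name `vadd` and both C identifiers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* An OpenCL C compute kernel, kept as a string as the OpenCL library
   expects. Each work-item handles one index i; the bounds check covers
   work-items added by rounding up the global size. */
static const char *vadd_source =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c,\n"
    "                   const unsigned int n)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i < n)\n"
    "        c[i] = a[i] + b[i];\n"
    "}\n";

/* Plain-C equivalent: OpenCL replaces this loop by launching n work-items,
   each of which runs the loop body once. */
static void vadd_reference(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```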

In addition, the hardware architecture of GPUs and other accelerators differs from CPUs in one key aspect:

• memory is separate from the host (regular CPU) memory.

That means that one has to explicitly copy data between the host memory and the device memory before and/or after calling compute kernels.
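The resulting pattern (allocate device memory, copy in, compute, copy out) can be modelled in plain C, with a second `malloc`'d buffer standing in for device memory. This is only a model: in real OpenCL host code, the allocation corresponds to `clCreateBuffer`, the two `memcpy` calls to `clEnqueueWriteBuffer` and `clEnqueueReadBuffer`, and `free` to `clReleaseMemObject`. The function name `scale_on_device` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of the host/device split: `device` stands in for device memory. */
static void scale_on_device(float *host, size_t n, float factor)
{
    if (n == 0)
        return;
    assert(host != NULL);

    float *device = malloc(n * sizeof *device);  /* ~ clCreateBuffer */
    if (device == NULL)
        abort();

    memcpy(device, host, n * sizeof *device);    /* host -> device */

    for (size_t i = 0; i < n; ++i)               /* the "kernel" only sees device data */
        device[i] *= factor;

    memcpy(host, device, n * sizeof *device);    /* device -> host */
    free(device);                                /* ~ clReleaseMemObject */
}
```

Omitting the final copy would leave the host with the old values, which is exactly the kind of bookkeeping error that the higher-level thorns in Section 5 aim to prevent.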

### 4 OpenCL Programming in Cactus

Cactus supports OpenCL programming at several levels. At the lowest level, one can use this thorn OpenCL directly. While this works fine, it is somewhat tedious because one has to write a certain amount of boilerplate code to detect and initialise the device, to copy data between host and device, and to build and run compute kernels.

Since OpenCL is implemented as a library, the flesh knows little about OpenCL. For example, there are no configuration options to specify an OpenCL compiler, since code is compiled at run time via a library call to which the source code is passed as a string. There is, however, one way in which the flesh supports OpenCL: files with a .cl suffix are converted into strings and embedded in the executable. These strings have the type char const * in C, and can be accessed at run time under a (globally visible) name OpenCL_source_THORN_FILE, where THORN and FILE are the thorn name and file name, respectively. (This is also explained in the users’ guide.)
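A minimal sketch of how a thorn might use such a string, assuming a hypothetical thorn `MyThorn` with a file `vadd.cl`, and assuming the generated name is `OpenCL_source_MyThorn_vadd` (the users’ guide gives the exact spelling, e.g. whether the file suffix is kept). So that the example compiles on its own, the string is defined by hand here; in a real build the flesh provides it and the thorn only declares it `extern`.

```c
#include <assert.h>
#include <string.h>

/* In a real Cactus build this definition is generated by the flesh from
   MyThorn/.../vadd.cl, and the thorn would only write
       extern char const *OpenCL_source_MyThorn_vadd;
   Thorn and file names here are hypothetical. */
char const *OpenCL_source_MyThorn_vadd =
    "__kernel void vadd(__global float *x) { x[get_global_id(0)] += 1.0f; }\n";

/* Return the length of the embedded kernel source; a real thorn would pass
   the string on to clCreateProgramWithSource or to a higher-level helper. */
static size_t embedded_source_length(void)
{
    char const *src = OpenCL_source_MyThorn_vadd;
    assert(src != NULL);
    return strlen(src);
}
```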

### 5 High-Level OpenCL Programming in Cactus

Cactus also offers a higher-level way of OpenCL programming, implemented in the thorns OpenCLRunTime and Accelerator.

Thorn OpenCLRunTime provides a convenient function for executing OpenCL code. This function expects, as input, a string containing the OpenCL kernel code, and then calls this code. Lower-level tasks such as identifying available compute devices, initialising them, compiling the kernel (once, and then remembering it), and handling arguments and parameters are taken care of automatically. Details are described in this thorn’s documentation.
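The "compile once, then remember" behaviour can be pictured as a small cache keyed by kernel source. The sketch below is a hypothetical model, not OpenCLRunTime’s actual code: `build_program` stands in for the expensive `clCreateProgramWithSource`/`clBuildProgram` step and returns an integer in place of a `cl_program` handle.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the (simulated) expensive build step runs. */
static int build_count = 0;

typedef struct {
    const char *source;  /* kernel source this entry was built from */
    int program_id;      /* placeholder for a cl_program handle */
} cache_entry;

#define CACHE_SIZE 16
static cache_entry cache[CACHE_SIZE];
static int cache_used = 0;

/* Hypothetical stand-in for compiling an OpenCL program. */
static int build_program(const char *source)
{
    (void)source;
    return ++build_count;
}

/* Return the built program for `source`, compiling it only on first use.
   Entries are keyed by the source pointer, which suits long-lived strings
   such as the flesh-generated OpenCL_source_* strings. */
static int get_program(const char *source)
{
    for (int i = 0; i < cache_used; ++i)
        if (cache[i].source == source)
            return cache[i].program_id;

    assert(cache_used < CACHE_SIZE);
    int id = build_program(source);
    cache[cache_used].source = source;
    cache[cache_used].program_id = id;
    ++cache_used;
    return id;
}
```

Keying by pointer rather than by string contents keeps lookups cheap and is adequate when kernel sources live for the whole run; a cache that accepts temporary strings would need to compare or hash the contents instead.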

Thorn Accelerator simplifies memory management for GPUs and other types of devices. One declares in the thorn’s schedule which routines read and write which variables, and Accelerator then keeps track of which variables need to be copied at what time. It records where (on the host and/or on the device) a variable has valid values, and copies data only when necessary, taking time level cycling, synchronisation, and I/O into account. Details are described in that thorn’s documentation.
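The bookkeeping described above can be illustrated with two validity flags per variable. The following plain-C model is hypothetical and deliberately ignores time levels, synchronisation, and multiple devices; it shows only the core rule that data are copied when a routine needs them in a location without a valid copy, and that a write invalidates the copy in the other location.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-variable state: where does a valid copy exist? */
typedef struct {
    bool valid_on_host;
    bool valid_on_device;
    int copies;  /* number of transfers performed, for illustration */
} var_state;

/* Before a routine reads the variable on the device: copy only if needed. */
static void prepare_device_read(var_state *v)
{
    assert(v->valid_on_host || v->valid_on_device);
    if (!v->valid_on_device) {
        /* host -> device transfer would happen here */
        v->valid_on_device = true;
        ++v->copies;
    }
}

/* Before a routine (e.g. I/O) reads the variable on the host. */
static void prepare_host_read(var_state *v)
{
    assert(v->valid_on_host || v->valid_on_device);
    if (!v->valid_on_host) {
        /* device -> host transfer would happen here */
        v->valid_on_host = true;
        ++v->copies;
    }
}

/* After a device routine writes the variable, the host copy is stale. */
static void mark_device_write(var_state *v)
{
    v->valid_on_device = true;
    v->valid_on_host = false;
}

/* After a host routine writes the variable, the device copy is stale. */
static void mark_host_write(var_state *v)
{
    v->valid_on_host = true;
    v->valid_on_device = false;
}
```

With this rule, a sequence of device routines that read and write the same variable triggers a single initial host-to-device copy, and subsequent output triggers a single copy back.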

### 7 Interfaces

Implements:

opencl

### 8 Schedule

This section lists all the variables which are assigned storage by thorn ExternalLibraries/OpenCL. Storage can either last for the duration of the run (Always means that if this thorn is activated storage will be assigned, Conditional means that if this thorn is activated storage will be assigned for the duration of the run if some condition is met), or can be turned on for the duration of a schedule function.

NONE

#### Scheduled Functions

CCTK_WRAGH

opencl_printinfo

print opencl system information

 Language: c Type: function