WaveToyOpenCL

Erik Schnetter <eschnetter@perimeterinstitute.ca>

May 15, 2012

Abstract

This thorn implements WaveToy, solving the scalar wave equation (in a Euclidean, i.e. trivial, geometry). The thorn is implemented in OpenCL, with some wrapper code in C++.

1 Introduction

This thorn WaveToyOpenCL solves the scalar wave equation, the same equation solved in thorn WaveToy and its companions written in other languages. Its major purpose is to serve as a high-level example of using OpenCL in Cactus. It is purposefully written to be simple and easy to understand; for example, there are no parameters to choose different types of initial or boundary conditions.

2 Thorn Structure

We assume the reader is familiar with the structure of a Cactus thorn written e.g. in C or C++. An OpenCL thorn is slightly more complex because (1) it has to describe when and what data are moved between host and device, and (2) Cactus does not (yet?) support calling OpenCL code directly, so some boilerplate code is necessary.

2.1 Schedule Declarations

Thorn WaveToyOpenCL relies on thorn Accelerator to handle data movement between host and device. This does not need to be managed explicitly; instead, the file schedule.ccl describes which routines are executed where (host or device), and which variables or groups are read or written.

The location where a scheduled routine is ultimately executed needs to be described in a Device= schedule tag. The set of variables that are read and/or written needs to be declared in READS and WRITES schedule statements. For example, this is the schedule item for the evolution routine of thorn WaveToyOpenCL:

SCHEDULE WaveToyOpenCL_Evol AT evol  
{  
  LANG:   C  
  TAGS:   Device=1  
  WRITES: WaveToyOpenCL::Scalar  
} "Evolve scalar wave"

This indicates that this routine executes on the device, i.e. its kernel is implemented in OpenCL. Note that, in OpenCL, both CPU and GPU count as devices (thus every routine written in OpenCL counts as executing on a device, even if the device happens to be the CPU).

This also indicates that this routine writes (i.e. defines) the grid function group Scalar, without looking at (the current timelevel of) this group.
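For comparison, a routine that both reads and writes a group declares both statements. The following schedule item is a sketch reconstructed from the schedule listing in Section 5 for the boundary routine; it is not copied from the thorn's schedule.ccl, and the capitalisation of the routine name is assumed by analogy with WaveToyOpenCL_Evol:

```
SCHEDULE WaveToyOpenCL_Boundary AT evol AFTER WaveToyOpenCL_Evol
{
  LANG:   C
  TAGS:   Device=1
  SYNC:   Scalar
  READS:  WaveToyOpenCL::Scalar
  WRITES: WaveToyOpenCL::Scalar
} "Boundary conditions for scalar wave"
```

Here READS tells the scheduler that the routine looks at the current values of the group, presumably because it only modifies the boundary points and leaves the interior values in place.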

2.2 Schedule Routines

Executing OpenCL code requires some boilerplate: one must choose an OpenCL platform and device, compile the code (from a C string), pass in arguments, and finally execute the actual kernel code. Thorn OpenCLRunTime provides a simple helper routine for these tasks, which can be used e.g. as follows:

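  /* Grid function groups passed to the kernel (NULL-terminated) */
  char const *const groups[] = {
    "WaveToyOpenCL::Scalar",
    NULL};

  /* Iteration bounds: skip the ghost zones on each side */
  int const imin[] = {cctk_nghostzones[0],
                      cctk_nghostzones[1],
                      cctk_nghostzones[2]};
  int const imax[] = {cctk_lsh[0] - cctk_nghostzones[0],
                      cctk_lsh[1] - cctk_nghostzones[1],
                      cctk_lsh[2] - cctk_nghostzones[2]};

  /* Memoised kernel; static so that it is remembered for the next call */
  static struct OpenCLKernel *kernel = NULL;
  /* sources[0]: declarations (empty here); sources[1]: kernel code */
  char const *const sources[] = {"", OpenCL_source_WaveToyOpenCL_evol, NULL};
  OpenCLRunTime_CallKernel(cctkGH, CCTK_THORNSTRING, "evol",
                           sources, groups, NULL, NULL, NULL, -1,
                           imin, imax, &kernel);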

The function OpenCLRunTime_CallKernel performs the following steps:

  1. Choose a platform and device, compile the kernel code, and memoise (remember) the kernel for the next call
  2. Pass a set of grid functions to the kernel routine
  3. Parallelise the kernel over a certain set of grid points (this is similar e.g. to an OpenMP parallelisation, except that OpenCL devices may offer much more parallelism)
  4. Call the kernel
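The memoisation in step 1 can be illustrated with a minimal, self-contained C sketch. It does not use OpenCL or thorn OpenCLRunTime; the type DemoKernel and the functions demo_compile, demo_call_kernel and demo_compile_count are hypothetical names introduced only for this illustration. The point is the pattern: the caller owns a pointer that starts out as NULL, and the expensive set-up runs only while that pointer is still NULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compiled kernel (NOT the OpenCLRunTime type). */
struct DemoKernel {
  const char *name;
};

/* Counts how often the "expensive" set-up step has run. */
static int demo_total_compiles = 0;

/* Hypothetical expensive step: stands in for choosing a platform and
   device and compiling the kernel source. */
static struct DemoKernel *demo_compile(const char *name) {
  struct DemoKernel *k = malloc(sizeof *k);
  if (k == NULL) return NULL;
  k->name = name;
  ++demo_total_compiles;
  return k;
}

/* Set up the kernel on the first call only; later calls reuse *pkernel.
   Returns 0 on success, -1 if the set-up failed. */
int demo_call_kernel(const char *name, struct DemoKernel **pkernel) {
  if (*pkernel == NULL) {
    *pkernel = demo_compile(name);
    if (*pkernel == NULL) return -1;
  }
  /* A real implementation would now pass arguments and launch the kernel. */
  return 0;
}

/* Report how many times the set-up step ran. */
int demo_compile_count(void) { return demo_total_compiles; }
```

Calling demo_call_kernel twice with the same pointer runs the set-up step once. This is presumably why the kernel pointer in the listing above is declared static: it has to survive from one invocation of the scheduled routine to the next.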

Consequently, one needs to define the set of grid functions to be passed to the OpenCL kernel (groups, a C array terminated by NULL), define the iteration bounds (imin and imax), and provide the actual source code (sources, a C array terminated by NULL).
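The following self-contained C sketch shows how such inputs can be prepared and inspected outside of Cactus. The helper names (demo_interior_bounds, demo_count_entries, demo_num_points) are hypothetical, and the sketch assumes that imax is an exclusive upper bound, i.e. that the loop covers imin[d] <= i < imax[d]; the text above does not state this explicitly.

```c
#include <stddef.h>

/* Interior bounds for a local grid of extent lsh with nghost ghost zones
   on each side, mirroring the imin/imax computation shown above. */
void demo_interior_bounds(const int lsh[3], const int nghost[3],
                          int imin[3], int imax[3]) {
  for (int d = 0; d < 3; ++d) {
    imin[d] = nghost[d];
    imax[d] = lsh[d] - nghost[d];
  }
}

/* Number of entries in a NULL-terminated array of C strings, such as
   the groups or sources arrays. */
size_t demo_count_entries(const char *const *list) {
  size_t n = 0;
  while (list[n] != NULL) ++n;
  return n;
}

/* Number of grid points covered by the bounds, ASSUMING the half-open
   range imin[d] <= i < imax[d] in each direction. */
long demo_num_points(const int imin[3], const int imax[3]) {
  long n = 1;
  for (int d = 0; d < 3; ++d) {
    const int len = imax[d] - imin[d];
    n *= (len > 0) ? len : 0;
  }
  return n;
}
```

For a local grid of 10 x 12 x 14 points with 1, 2 and 3 ghost zones, this yields imin = {1, 2, 3} and imax = {9, 10, 11}, i.e. 8 points in each direction.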

Note that the first element of sources contains declarations (it is empty here), while the second element contains the actual kernel code (see thorn OpenCLRunTime). See thorn OpenCL for how the string OpenCL_source_WaveToyOpenCL_evol is generated from .cl files.

The actual kernel is contained in the file evol.cl, and should be readable with some C knowledge. LC_LOOP3 is a macro that parallelises a loop, similar to the macros provided by thorn LoopControl.
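To give an impression of what such a kernel computes, here is a plain C sketch of one time step of the scalar wave equation on three time levels. It is not the content of evol.cl: it assumes a standard second-order leapfrog scheme with a uniform grid spacing dx, it uses ordinary nested loops instead of LC_LOOP3, and the names demo_idx and demo_wave_step are hypothetical. The array names u, u_p and u_p_p follow the usual Cactus naming for the current, previous and second-previous time levels.

```c
#include <stddef.h>

/* Linear index of point (i,j,k) in an array of extent lsh, with the
   x direction varying fastest. */
static size_t demo_idx(const int lsh[3], int i, int j, int k) {
  return (size_t)i + (size_t)lsh[0] * ((size_t)j + (size_t)lsh[1] * (size_t)k);
}

/* One leapfrog step (an ASSUMED scheme, not taken from evol.cl):
     u = 2 u_p - u_p_p + (dt/dx)^2 * (sum of 6 neighbours - 6 u_p)
   The loop covers imin[d] <= index < imax[d]; the caller must ensure
   imin[d] >= 1 and imax[d] <= lsh[d] - 1 so that all neighbours exist. */
void demo_wave_step(const int lsh[3], const int imin[3], const int imax[3],
                    double dt, double dx,
                    const double *u_p_p, const double *u_p, double *u) {
  const double fac = (dt * dt) / (dx * dx);
  for (int k = imin[2]; k < imax[2]; ++k) {
    for (int j = imin[1]; j < imax[1]; ++j) {
      for (int i = imin[0]; i < imax[0]; ++i) {
        const size_t c = demo_idx(lsh, i, j, k);
        /* Undivided second-order Laplacian of the previous time level */
        const double lap =
            u_p[demo_idx(lsh, i - 1, j, k)] + u_p[demo_idx(lsh, i + 1, j, k)] +
            u_p[demo_idx(lsh, i, j - 1, k)] + u_p[demo_idx(lsh, i, j + 1, k)] +
            u_p[demo_idx(lsh, i, j, k - 1)] + u_p[demo_idx(lsh, i, j, k + 1)] -
            6.0 * u_p[c];
        /* Only the current time level is written; it is never read */
        u[c] = 2.0 * u_p[c] - u_p_p[c] + fac * lap;
      }
    }
  }
}
```

The sketch writes the current time level without reading it, which matches the WRITES-only declaration of the evolution routine in schedule.ccl. A spatially and temporally constant field is left unchanged by this update, which makes for a simple sanity check.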

3 Parameters




verbose
  Scope:       private
  Type:        BOOLEAN
  Description: Output progress information
  Default:     no

wavelength
  Scope:       private
  Type:        REAL
  Description: Wavelength of initial data
  Range:       *:*
  Default:     1.0



4 Interfaces

General

Implements:

wavetoyopencl

Grid Variables

4.0.1 PRIVATE GROUPS




  Group Names    Variable Names    Details

  scalar         u                 compact:       0
                                   description:   Scalar
                                   dimensions:    3
                                   distribution:  DEFAULT
                                   group type:    GF
                                   timelevels:    3
                                   variable type: REAL




Uses header:

OpenCLRunTime.h

5 Schedule

This section lists the variables for which thorn CactusExamples/WaveToyOpenCL allocates storage. Storage either lasts for the duration of the run or is turned on only for the duration of a scheduled function. For storage that lasts for the run, Always means that storage is assigned whenever this thorn is activated, and Conditional means that storage is assigned when this thorn is activated and some condition is met.

Storage

 

Always: Scalar[3]

Scheduled Functions

CCTK_INITIAL

  wavetoyopencl_init

  initialise scalar wave

  Language: c
  Tags:     device=1
  Type:     function
  Writes:   wavetoyopencl::scalar

CCTK_EVOL

  wavetoyopencl_evol

  evolve scalar wave

  Language: c
  Tags:     device=1
  Type:     function
  Writes:   wavetoyopencl::scalar

CCTK_EVOL

  wavetoyopencl_boundary

  boundary conditions for scalar wave

  After:    wavetoyopencl_evol
  Language: c
  Reads:    wavetoyopencl::scalar
  Sync:     scalar
  Tags:     device=1
  Type:     function
  Writes:   wavetoyopencl::scalar