WaveToyOpenCL

Erik Schnetter <eschnetter@perimeterinstitute.ca>

May 15, 2012

Abstract

This thorn implements WaveToy, solving the scalar wave equation (in a Euclidean, i.e. trivial, geometry). The thorn is implemented in OpenCL, with some wrapper code in C++.

1 Introduction

This thorn, WaveToyOpenCL, solves the scalar wave equation, the same equation solved by thorn WaveToy and its companions written in other languages. Its main purpose is to serve as a high-level example of using OpenCL in Cactus. It is purposefully written to be simple and easy to understand; for example, there are no parameters to choose different types of initial or boundary conditions.
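For reference, the scalar wave equation in flat space can be written as follows. This assumes units in which the wave speed is one, and uses u for the evolved scalar (the name of the grid function in this thorn):

```latex
\[
  \partial_t^2 u \;=\; \nabla^2 u
  \;=\; \left( \partial_x^2 + \partial_y^2 + \partial_z^2 \right) u
\]
```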

2 Thorn Structure

We assume the reader is familiar with the structure of a Cactus thorn written e.g. in C or C++. An OpenCL thorn is slightly more complex because (1) it has to describe when and what data are moved between host and device, and (2) Cactus does not (yet?) support calling OpenCL code directly, so some boilerplate code is necessary.

2.1 Schedule Declarations

Thorn WaveToyOpenCL relies on thorn Accelerator to handle data movement between host and device. This does not need to be managed explicitly; instead, the file schedule.ccl describes which routines are executed where (host or device), and which variables or groups are read or written.

The location where a scheduled routine is ultimately executed needs to be described in a Device= schedule tag. The set of variables that are read and/or written needs to be declared in READS and WRITES schedule statements. For example, this is the schedule item for the evolution routine of thorn WaveToyOpenCL:

SCHEDULE WaveToyOpenCL_Evol AT evol
{
  LANG:   C
  TAGS:   Device=1
  WRITES: WaveToyOpenCL::Scalar
} "Evolve scalar wave"

This indicates that this routine executes on the device, i.e. its kernel is implemented in OpenCL. Note that, in OpenCL, both CPU and GPU count as devices (thus every routine written in OpenCL counts as executing on a device, even if the device happens to be the CPU).

This also indicates that this routine writes (i.e. defines) the grid function group Scalar, without looking at (the current timelevel of) this group.

2.2 Schedule Routines

Executing OpenCL code requires some boilerplate: one needs to choose an OpenCL platform and device, compile the code (from a C string), pass in arguments, and finally execute the actual kernel code. Thorn OpenCLRunTime provides a simple helper routine for these tasks that can be used e.g. as follows:

  char const *const groups[] = {
    "WaveToyOpenCL::Scalar",
    NULL};

  int const imin[] = {cctk_nghostzones[0],
                      cctk_nghostzones[1],
                      cctk_nghostzones[2]};
  int const imax[] = {cctk_lsh[0] - cctk_nghostzones[0],
                      cctk_lsh[1] - cctk_nghostzones[1],
                      cctk_lsh[2] - cctk_nghostzones[2]};

  static struct OpenCLKernel *kernel = NULL;
  char const *const sources[] = {"", OpenCL_source_WaveToyOpenCL_evol, NULL};
  OpenCLRunTime_CallKernel(cctkGH, CCTK_THORNSTRING, "evol",
                           sources, groups, NULL, NULL, NULL, -1,
                           imin, imax, &kernel);

The function OpenCLRunTime_CallKernel performs the following steps:

Choose a platform and device, compile the kernel code, and memoise (remember) the kernel for the next call
Pass a set of grid functions to the kernel routine
Parallelise the kernel over a certain set of grid points (this is similar e.g. to an OpenMP parallelisation, except that OpenCL devices may offer much more parallelism)
Call the kernel
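The memoisation in the first step can be illustrated with a small stand-alone sketch in plain C. It does not use OpenCL or OpenCLRunTime; all names beginning with sketch_ are hypothetical, and "compiling" is replaced by incrementing a counter. It only shows why the caller keeps a static handle that starts out as NULL:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled kernel; not the OpenCLRunTime type. */
struct sketch_kernel {
  const char *name;
  int ncalls;
};

/* Counts how often the (pretend) compilation happens. */
int sketch_ncompiles = 0;

/* Pretend to compile a kernel. A real implementation would build OpenCL
   source here; this sketch only records that it was asked to. */
struct sketch_kernel *sketch_compile(const char *name) {
  static struct sketch_kernel storage; /* one kernel suffices for this sketch */
  storage.name = name;
  storage.ncalls = 0;
  ++sketch_ncompiles;
  return &storage;
}

/* Compile on the first call only, then reuse the handle kept by the caller. */
void sketch_call_kernel(const char *name, struct sketch_kernel **pkernel) {
  assert(pkernel != NULL);
  if (*pkernel == NULL)
    *pkernel = sketch_compile(name);
  ++(*pkernel)->ncalls; /* stands in for launching the kernel */
}

/* Mirrors the calling convention shown above: a static handle, initially
   NULL, that survives between calls of the scheduled routine. */
int sketch_evolve_step(void) {
  static struct sketch_kernel *kernel = NULL;
  sketch_call_kernel("evol", &kernel);
  return kernel->ncalls;
}
```

Calling sketch_evolve_step repeatedly launches the kernel every time, but the compilation counter increases only once.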

Consequently, one needs to define the set of grid functions to be passed to the OpenCL kernel (groups, a C array terminated by NULL), the iteration bounds (imin and imax), and the actual source code (sources, a C array terminated by NULL).
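To make the iteration bounds concrete, the following stand-alone sketch computes imin and imax in the same way as the excerpt above, from a local grid size and a ghost zone width, and counts the points that would be visited. The function names are hypothetical, and treating imax as an exclusive upper bound is an assumption based on the usual C loop convention:

```c
#include <assert.h>

/* Compute the interior bounds in each direction, skipping the ghost zones
   on both sides. imax is assumed to be exclusive (loop runs while i < imax). */
void sketch_interior_bounds(const int lsh[3], const int nghostzones[3],
                            int imin[3], int imax[3]) {
  for (int d = 0; d < 3; ++d) {
    assert(lsh[d] >= 2 * nghostzones[d]); /* grid must hold both ghost layers */
    imin[d] = nghostzones[d];
    imax[d] = lsh[d] - nghostzones[d];
  }
}

/* Number of grid points covered by the bounds. */
long sketch_count_points(const int imin[3], const int imax[3]) {
  long n = 1;
  for (int d = 0; d < 3; ++d)
    n *= imax[d] - imin[d];
  return n;
}

/* Convenience wrapper: interior size of an nx*ny*nz grid with ng ghost
   zones in every direction. */
long sketch_interior_size(int nx, int ny, int nz, int ng) {
  const int lsh[3] = {nx, ny, nz};
  const int nghostzones[3] = {ng, ng, ng};
  int imin[3], imax[3];
  sketch_interior_bounds(lsh, nghostzones, imin, imax);
  return sketch_count_points(imin, imax);
}
```

For example, a local grid of 10 x 12 x 14 points with one ghost zone per face leaves 8 x 10 x 12 interior points for the kernel.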

Note that the first element of sources contains declarations (it is empty here), while the second element contains the actual kernel code (see thorn OpenCLRunTime). See thorn OpenCL for how the string OpenCL_source_WaveToyOpenCL_evol is generated from .cl files.
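The NULL-terminated layout of sources can be handled with a few lines of plain C. The sketch below counts the entries and joins them into one string, declarations first and kernel body second. Both helper names are hypothetical, and whether OpenCLRunTime combines the pieces in this way is an assumption:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the entries of a NULL-terminated array of strings. */
size_t sketch_count_sources(const char *const sources[]) {
  size_t n = 0;
  while (sources[n] != NULL)
    ++n;
  return n;
}

/* Join all entries, in order, into one newly allocated string.
   The caller frees the result; returns NULL if allocation fails. */
char *sketch_join_sources(const char *const sources[]) {
  size_t len = 0;
  for (size_t n = 0; sources[n] != NULL; ++n)
    len += strlen(sources[n]);
  char *joined = malloc(len + 1);
  if (joined == NULL)
    return NULL;
  joined[0] = '\0';
  for (size_t n = 0; sources[n] != NULL; ++n)
    strcat(joined, sources[n]);
  return joined;
}
```

With an empty declaration string, as in this thorn, the joined text is simply the kernel body.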

The actual kernel is contained in the file evol.cl, and should be readable with some C knowledge. LC_LOOP3 is a macro that parallelises a loop, similar to the macros provided by thorn LoopControl.
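For readers who want to see the shape of such a kernel without opening evol.cl, here is a host-side sketch in plain C. It is not the contents of evol.cl: it assumes a standard second-order leapfrog discretisation with three time levels, uses ordinary nested loops where the kernel uses LC_LOOP3, and all sketch_ names are hypothetical. The names u, u_p and u_p_p follow the Cactus convention for the new, current and previous time levels:

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of grid point (i,j,k) for local extents lsh[0..2]. */
size_t sketch_index(const int lsh[3], int i, int j, int k) {
  return (size_t)i + (size_t)lsh[0] * ((size_t)j + (size_t)lsh[1] * (size_t)k);
}

/* One leapfrog step of the wave equation on the points imin <= i < imax:
   u = 2 u_p - u_p_p + dt^2 * Laplacian(u_p). */
void sketch_evol(const int lsh[3], const int imin[3], const int imax[3],
                 double dt, const double dx[3],
                 const double *u_p_p, const double *u_p, double *u) {
  for (int d = 0; d < 3; ++d)
    assert(imin[d] >= 1 && imax[d] <= lsh[d] - 1); /* stencil needs neighbours */
  for (int k = imin[2]; k < imax[2]; ++k)
    for (int j = imin[1]; j < imax[1]; ++j)
      for (int i = imin[0]; i < imax[0]; ++i) {
        const size_t c = sketch_index(lsh, i, j, k);
        /* Second-order centred differences in each direction */
        const double lap =
            (u_p[sketch_index(lsh, i - 1, j, k)] - 2.0 * u_p[c] +
             u_p[sketch_index(lsh, i + 1, j, k)]) / (dx[0] * dx[0]) +
            (u_p[sketch_index(lsh, i, j - 1, k)] - 2.0 * u_p[c] +
             u_p[sketch_index(lsh, i, j + 1, k)]) / (dx[1] * dx[1]) +
            (u_p[sketch_index(lsh, i, j, k - 1)] - 2.0 * u_p[c] +
             u_p[sketch_index(lsh, i, j, k + 1)]) / (dx[2] * dx[2]);
        u[c] = 2.0 * u_p[c] - u_p_p[c] + dt * dt * lap;
      }
}

/* Self-check helper: a field that is constant in space and time must stay
   constant. Returns the updated value at one interior point. */
double sketch_evol_constant(double value) {
  enum { N = 4 };
  const int lsh[3] = {N, N, N};
  const int imin[3] = {1, 1, 1};
  const int imax[3] = {N - 1, N - 1, N - 1};
  const double dx[3] = {0.1, 0.1, 0.1};
  double u_p_p[N * N * N], u_p[N * N * N], u[N * N * N];
  for (int n = 0; n < N * N * N; ++n) {
    u_p_p[n] = value;
    u_p[n] = value;
    u[n] = 0.0;
  }
  sketch_evol(lsh, imin, imax, 0.05, dx, u_p_p, u_p, u);
  return u[sketch_index(lsh, 1, 1, 1)];
}
```

The loop bounds play the role of imin and imax from the wrapper code, so ghost zones are left untouched and are filled by synchronisation and the boundary routine.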

3 Parameters


verbose	Scope: private	BOOLEAN

Description: Output progress information

		Default: no


wavelength	Scope: private	REAL

Description: Wavelength of initial data

Range: :		Default: 1.0

4 Interfaces

General

Implements:

wavetoyopencl

Grid Variables

4.0.1 PRIVATE GROUPS


Group Names	Variable Names	Details

scalar		compact	0
	u	description	Scalar
		dimensions	3
		distribution	DEFAULT
		group type	GF
		timelevels	3
		variable type	REAL

Uses header:

OpenCLRunTime.h

5 Schedule

This section lists all the variables which are assigned storage by thorn CactusExamples/WaveToyOpenCL. Storage can either last for the duration of the run (Always means that if this thorn is activated storage will be assigned, Conditional means that if this thorn is activated storage will be assigned for the duration of the run if some condition is met), or can be turned on for the duration of a schedule function.

Storage

Always:
Scalar[3]

Scheduled Functions

CCTK_INITIAL

wavetoyopencl_init

initialise scalar wave

	Language:	c
	Tags:	device=1
	Type:	function
	Writes:	wavetoyopencl::scalar

CCTK_EVOL

wavetoyopencl_evol

evolve scalar wave

	Language:	c
	Tags:	device=1
	Type:	function
	Writes:	wavetoyopencl::scalar

CCTK_EVOL

wavetoyopencl_boundary

boundary conditions for scalar wave

	After:	wavetoyopencl_evol
	Language:	c
	Reads:	wavetoyopencl::scalar
	Sync:	scalar
	Tags:	device=1
	Type:	function
	Writes:	wavetoyopencl::scalar