## PUGH

Abstract

The default unigrid driver for Cactus, handling grid variables and communications for both single-processor and multiprocessor runs.

### 1 Description

PUGH can create, handle, and communicate grid scalars, arrays, and grid functions in one, two, or three dimensions.

### 2 Compilation

PUGH can be compiled with or without MPI. Compiling without MPI results in an executable which can only be used on a single processor; compiling with MPI produces an executable which can be used on either a single processor or multiple processors. (Section 6 describes how to tell whether your executable has been compiled with or without MPI.)

For configuring with MPI, see the Cactus User's Guide.
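As a rough sketch only (the configuration name `mympiconfig` is a made-up placeholder, and the available `MPI=` settings depend on your Cactus version and machine, so treat the User's Guide as authoritative), enabling MPI at configuration time has the following general shape:

```shell
# Create (or reconfigure) a Cactus configuration with MPI support.
# "mympiconfig" is a placeholder name; MPI=NATIVE asks Cactus to use
# the machine's native MPI installation.
gmake mympiconfig-config MPI=NATIVE
```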

### 3 Grid Size

The number of grid points used for a simulation can be set in PUGH either globally (that is, the total number of points across all processors), or locally (that is, the number of points on each processor).

To set the global size of a N-D grid to be 40 grid points in each direction use

PUGH::global_nsize = 40

To set the global size of a 2D grid to be $40×20$ use

PUGH::global_nx = 40
PUGH::global_ny = 20

To set the local size of a 2D grid to be $40×20$ on each processor, use

PUGH::local_nx = 40
PUGH::local_ny = 20

### 4 Periodic Boundary Conditions

PUGH can implement periodic boundary conditions during the synchronization of grid functions. Although this may at first seem confusing, since boundary conditions are usually called directly from evolution routines, the synchronization step is in fact the most efficient and natural place to apply periodic boundary conditions.

PUGH applies periodic conditions by simply communicating the appropriate ghostzones between “end” processors. For example, for a 1D domain with two ghostzones, split across two processors, Figure 1 shows the implementation of periodic boundary conditions.
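The exchange in Figure 1 can be mimicked with a small serial sketch in Python (illustrative only — PUGH itself does this with MPI messages between processors): two 1D subdomains each carry two ghost zones on each side, interior points fill the neighbouring subdomain's ghost zones, and the two "end" subdomains exchange with each other to close the periodic domain.

```python
import numpy as np

NGHOST = 2  # ghost zones per side, as in the 1D example of Figure 1

def sync_periodic(left, right):
    """Exchange ghost zones between two 1D subdomains with periodic wrap.

    Each subdomain is laid out as [ghost | interior | ghost].
    Interior points of one subdomain fill the ghost zones of its
    neighbour; the outermost boundaries wrap around periodically.
    """
    g = NGHOST
    # interior face between the two subdomains
    left[-g:] = right[g:2*g]       # left's right ghosts <- right's first interior points
    right[:g] = left[-2*g:-g]      # right's left ghosts <- left's last interior points
    # periodic wrap between the "end" subdomains
    left[:g] = right[-2*g:-g]      # left's left ghosts  <- right's last interior points
    right[-g:] = left[g:2*g]       # right's right ghosts <- left's first interior points

# usage: a global domain of 8 interior points split across two subdomains
interior = np.arange(8.0)
left = np.empty(4 + 2*NGHOST)
right = np.empty(4 + 2*NGHOST)
left[NGHOST:-NGHOST] = interior[:4]
right[NGHOST:-NGHOST] = interior[4:]
sync_periodic(left, right)
print(left)   # -> [6. 7. 0. 1. 2. 3. 4. 5.]
print(right)  # -> [2. 3. 4. 5. 6. 7. 0. 1.]
```

After the exchange, the ghost zones of the leftmost subdomain hold the last interior points of the rightmost one, and vice versa, which is exactly the periodic identification of the two domain ends.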

Periodic boundary conditions are applied to all grid functions. When enabled, they are applied in all directions, although this behaviour can be customised to switch them off in given directions.

By default, no periodic boundary conditions are applied. To apply periodic boundary conditions in all directions, set

PUGH::periodic = "yes"

To apply periodic boundary conditions in just the x- and y-directions of a 3-dimensional domain, use

PUGH::periodic = "yes"
PUGH::periodic_z = "no"

### 5 Processor Decomposition

By default PUGH will distribute the computational grid evenly across all processors (as in Figure 2a). This may not be efficient if the computational load differs between processors, for example when a simulation is distributed across processors with different per-processor performance.

The computational grid can instead be manually partitioned in each direction in a regular way, as in Figure 2b.

The computational grid can be manually distributed using PUGH's string parameters partition_[1d_x|2d_x|2d_y|3d_x|3d_y|3d_z]. To manually specify the load distribution, set PUGH::partition = "manual" and then, depending on the grid dimension, set the remaining parameters to distribute the load in each direction. Note that for this you need to know the processor decomposition a priori.

The decomposition is easiest to explain with a simple example: to distribute a 30-cubed grid across 4 processors (decomposed as $2×1×2$, with processors 0 and 2 performing twice as fast as processors 1 and 3) as:

    proc 2: $20×30×15$    proc 3: $10×30×15$
    proc 0: $20×30×15$    proc 1: $10×30×15$

you would use the following topology and partition parameter settings:

# the overall grid size
PUGH::global_nsize = 30

# processor topology
PUGH::processor_topology      = "manual"
PUGH::processor_topology_3d_x = 2
PUGH::processor_topology_3d_y = 1
PUGH::processor_topology_3d_z = 2     # redundant

# grid partitioning
PUGH::partition      = "manual"
PUGH::partition_3d_x = "20 10"

Each partition parameter lists the number of grid points for every processor in that direction, with the numbers delimited by any non-digit characters. Note that an empty string for a direction (which is the default value for the partition parameters) will apply the automatic distribution. That's why it is not necessary to set PUGH::partition_3d_y = "30" or PUGH::partition_3d_z = "15 15" in the parameter file.
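The delimiter rule above can be illustrated with a short Python sketch (a hypothetical helper for exposition, not PUGH's actual C parser): it splits a partition string on runs of non-digit characters and checks that the per-processor counts sum to the global size in that direction.

```python
import re

def parse_partition(spec, global_n):
    """Split a PUGH-style partition string into per-processor point counts.

    Any run of non-digit characters acts as a delimiter, mirroring the
    rule described for the PUGH::partition_* parameters. Raises
    ValueError if the counts do not add up to the global grid size.
    """
    counts = [int(tok) for tok in re.split(r"\D+", spec) if tok]
    if sum(counts) != global_n:
        raise ValueError(
            f"partition {counts} sums to {sum(counts)}, expected {global_n}")
    return counts

# usage: the x-direction split from the example above
print(parse_partition("20 10", 30))   # -> [20, 10]
print(parse_partition("20,10", 30))   # commas work too -> [20, 10]
```

A sanity check like this is worth doing by hand when writing a manual partition: the counts in each direction must sum to the global number of points in that direction, or the decomposition is inconsistent.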

Because the previous automatic distribution gave problems in some cases (e.g. a box that is very long in one direction but short in the others), there is now an improved algorithm that tries to decompose the grid more evenly across the processors. However, it can fail in certain situations, in which case it gracefully falls back to the previous algorithm ("automatic_old") and issues a warning. Note that if one or more of the parameters PUGH::processor_topology_3d_* or PUGH::partition_3d_* are set, this mode automatically falls back to "automatic_old" without a warning.

### 6 Understanding PUGH Output

PUGH reports information about the processor decomposition to standard output at the start of a job. This section describes how to interpret that output.

**Single Processor (no MPI)**

• Type of evolution

If an executable has been compiled for only single processor use (without MPI), the first thing which PUGH reports is this fact:

INFO (PUGH): Single processor evolution

**Multiple Processor (with MPI)**

• Type of evolution

If an executable has been compiled using MPI, the first thing which PUGH reports is this fact, together with the number of processors being used:

INFO (PUGH): MPI Evolution on 3 processors