CarpetIOHDF5

Erik Schnetter <schnetter@uni-tuebingen.de>
Christian D. Ott <cott@aei.mpg.de>
Thomas Radke <tradke@aei.mpg.de>

1 December 2004

Abstract

Thorn CarpetIOHDF5 provides HDF5-based output to the Carpet mesh refinement driver in Cactus. This document explains CarpetIOHDF5’s usage and contains a specification of the HDF5 file format that was adapted from John Shalf’s FlexIO library.

Contents

1 Introduction
2 CarpetIOHDF5 Parameters
3 Serial versus Parallel Output
4 Using the flesh I/O API to produce HDF5 output
5 Checkpointing & Recovery and Importing Data
6 CarpetIOHDF5 Utility Programs
 6.1 hdf5toascii_slicer
 6.2 hdf5_extract
7 Example Parameter File Excerpts
 7.1 Serial (unchunked) Output of Grid Variables
 7.2 Parallel (chunked) Output of Grid Variables
 7.3 Checkpointing & Recovery
 7.4 Importing Grid Variables via Filereader

1 Introduction

Having encountered various problems with the Carpet I/O thorn CarpetIOFlexIO and the underlying FlexIO library, Erik Schnetter decided to write the thorn CarpetIOHDF5, which bypasses any intermediate binary I/O layer and writes in the HDF5 file format directly.

CarpetIOHDF5 provides output for the Carpet mesh refinement driver within the Cactus Code. Christian D. Ott added a file reader (analogous to Erik Schnetter's implementation in CarpetIOFlexIO) as well as checkpoint/recovery functionality to CarpetIOHDF5. Thomas Radke has taken over maintenance of this I/O thorn and continues to fix known bugs and to improve the code's functionality and efficiency.

The CarpetIOHDF5 I/O method can output any type of CCTK grid variable (grid scalars, grid functions, and grid arrays of arbitrary dimension); data is written into separate files named "<varname>.h5". It implements both serial and fully parallel I/O: data files can be written/read either by processor 0 only or by all processors. Such data files can be used for further postprocessing (e.g. visualisation with OpenDX or DataVault) or fed back into Cactus via the filereader capabilities of thorn IOUtil.

This document aims to give the user a first handle on how to use CarpetIOHDF5. It also documents the HDF5 file layout used.

2 CarpetIOHDF5 Parameters

Parameters to control the CarpetIOHDF5 I/O method are listed in this thorn's param.ccl file.

3 Serial versus Parallel Output

According to the output mode parameter settings (IO::out_mode, IO::out_unchunked, IO::out_proc_every) of thorn IOUtil, thorn CarpetIOHDF5 will output distributed grid variables either in serial mode (gathered into a single unchunked file written by one processor) or in parallel mode (as chunked files written by each processor individually).

Unchunked means that an entire Cactus grid array (gathered across all processors) is stored in a single HDF5 dataset whereas chunked means that all the processor-local patches of this array are stored as separate HDF5 datasets (called chunks). Consequently, for unchunked data all interprocessor ghostzones are excluded from the output. In contrast, for chunked data the interprocessor ghostzones are included in the output.
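
The bookkeeping behind this distinction can be sketched in a few lines of Python. This is a conceptual illustration only: the grid size, processor count, and ghost-zone width below are made-up values, not CarpetIOHDF5 defaults.

```python
# Conceptual sketch: split a 1D grid of 16 points across 4 processors,
# with 1 ghost point at each interior processor boundary.
npoints, nprocs, nghosts = 16, 4, 1
width = npoints // nprocs

global_array = list(range(npoints))      # the "unchunked" dataset

chunks = []                              # the "chunked" datasets
for p in range(nprocs):
    lo = max(p * width - nghosts, 0)              # extend into the
    hi = min((p + 1) * width + nghosts, npoints)  # neighbours: ghost zones
    chunks.append(global_array[lo:hi])

# Chunked output stores the ghost zones, so the chunks overlap ...
total_chunked = sum(len(c) for c in chunks)
print(total_chunked)                     # 16 grid points + 6 ghost copies = 22

# ... while recombining into a global view means dropping the ghosts again:
recombined = []
for p, chunk in enumerate(chunks):
    start = nghosts if p > 0 else 0
    stop = len(chunk) - (nghosts if p < nprocs - 1 else 0)
    recombined.extend(chunk[start:stop])
print(recombined == global_array)        # True
```

This is essentially what a recombiner inside a visualisation tool has to do for every chunked dataset.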

Chunked datasets usually need to be recombined into a global view before visualisation. This must be done within the visualisation tool (see also below); Cactus itself does not provide a recombiner utility program for CarpetIOHDF5's output files.

The default is to output distributed grid variables in parallel, with each processor writing a file <varname>.file_<processor ID>.h5. The chunked/unchunked mode can also be set individually in a key/value option string (with the key out_unchunked and possible string values "true|false|yes|no") appended to a group/variable name in the out_vars parameter, e.g.

  IOHDF5::out_vars = "wavetoy::phi{out_unchunked = 'true'}  grid::coordinates"

will cause the variable phi to be output into a single unchunked file, whereas other variables will still be output into separate chunked files (assuming the output mode is left at its default). Grid scalars and DISTRIB = CONST grid arrays are always output as unchunked data on processor 0 only.
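
The shape of such an option string can be illustrated with a small Python sketch. The parse_out_vars helper below is hypothetical and implements only a simplified form of the syntax; the real Cactus option-string parser is more general.

```python
import re

def parse_out_vars(spec):
    """Hypothetical illustration: split an out_vars string into variable
    names and per-variable key/value options (simplified grammar only)."""
    result = {}
    # match "group::name", optionally followed by "{ key = value ... }"
    for m in re.finditer(r"([\w:]+)(?:\{([^}]*)\})?", spec):
        name, opts = m.group(1), m.group(2) or ""
        options = dict(re.findall(r"(\w+)\s*=\s*'?([\w|]+)'?", opts))
        result[name] = options
    return result

spec = "wavetoy::phi{out_unchunked = 'true'}  grid::coordinates"
print(parse_out_vars(spec))
# {'wavetoy::phi': {'out_unchunked': 'true'}, 'grid::coordinates': {}}
```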

Parallel output in a parallel simulation will ensure maximum I/O performance. Note that changing the output mode to serial I/O might only be necessary if the data analysis and visualisation tools cannot deal with chunked output files. Cactus itself, as well as many of the tools to visualise Carpet HDF5 data (see http://www.cactuscode.org/Visualization), can process both chunked and unchunked data. For instance, to visualise parallel output datafiles with DataVault, you would just send all the individual files to the DV server: hdf5todv phi.file_*.h5. In OpenDX the ImportCarpetIOHDF5 module can be given any filename from the set of parallel chunked files; the module will determine the total number of files in the set automatically and read them all.

4 Using the flesh I/O API to produce HDF5 output

Periodic output of grid variables is usually specified via I/O parameters in the parameter file and then automatically triggered by the flesh scheduler at each iteration step after analysis. If output should also be triggered at a different time, one can do that from within an application thorn by invoking one of the CCTK_OutputVar*() I/O routines provided by the flesh I/O API (see chapter B8.2 “IO” in the Cactus Users Guide). In this case, the application thorn routine which calls CCTK_OutputVar*() must be scheduled in level mode.

It should be noted here that, due to a restriction in the naming scheme of objects in an HDF5 data file, CarpetIOHDF5 can output a given grid variable with a given refinement level only once per timestep. Attempts by application thorns to trigger output of the same variable multiple times during an iteration will result in a runtime warning and have no further effect. If output of a variable is also required at intermediate timesteps, this can be achieved by calling CCTK_OutputVarAs*() with a different alias name; output for the same variable is then written into different HDF5 files based on the alias argument.
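
The once-per-timestep restriction can be sketched as follows. This is a conceptual illustration with made-up names, not CarpetIOHDF5's actual implementation:

```python
# Conceptual sketch of the once-per-(alias, refinement level, iteration)
# restriction described above; names and structure are illustrative only.
import warnings

written = set()   # records (alias, refinement_level, iteration) triples

def request_output(alias, reflevel, iteration):
    """Return True if output is written, False if the request is skipped."""
    key = (alias, reflevel, iteration)
    if key in written:
        warnings.warn(f"already output {alias} (rl={reflevel}) "
                      f"at iteration {iteration}; skipping")
        return False
    written.add(key)
    return True

print(request_output("phi", 0, 10))               # True: first output
print(request_output("phi", 0, 10))               # False: duplicate, warns
print(request_output("phi_intermediate", 0, 10))  # True: a different alias
                                                  # goes into its own file
```

Passing a different alias to CCTK_OutputVarAs*() sidesteps the restriction precisely because the alias, not the variable name, determines the output file.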

5 Checkpointing & Recovery and Importing Data

Thorn CarpetIOHDF5 can also be used to create HDF5 checkpoint files and to recover from such files later on. In addition, it can read HDF5 data files back in using the generic filereader interface described in the thorn documentation of IOUtil.

Checkpoint routines are scheduled at several timebins so that you can save the current state of your simulation after the initial data phase, during evolution, or at termination. Checkpointing for thorn CarpetIOHDF5 is enabled by setting the parameter IOHDF5::checkpoint = "yes".

A recovery routine is registered with thorn IOUtil in order to restart a new simulation from a given HDF5 checkpoint. The very same recovery mechanism is used to implement a filereader functionality to feed back data into Cactus.

Checkpointing and recovery are controlled by the corresponding checkpoint/recovery parameters of thorn IOUtil (for a description of these parameters please refer to that thorn's documentation).
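
The interplay of the periodic-checkpoint parameters can be illustrated with a short Python sketch. The checkpoint file-name pattern below is assumed for illustration only; consult the IOUtil documentation for the actual naming scheme.

```python
# Conceptual sketch of IO::checkpoint_every / IO::checkpoint_keep semantics:
# write a checkpoint every N iterations and retain only the most recent ones.
checkpoint_every = 100
checkpoint_keep = 2

checkpoints = []   # retained checkpoint files, oldest first

for iteration in range(0, 501):
    if iteration > 0 and iteration % checkpoint_every == 0:
        checkpoints.append(f"wavetoy.chkpt.it_{iteration}.h5")
        # keep only the most recent `checkpoint_keep` checkpoints
        while len(checkpoints) > checkpoint_keep:
            old = checkpoints.pop(0)
            # a real implementation would delete `old` from disk here

print(checkpoints)   # only the two most recent checkpoints remain
# ['wavetoy.chkpt.it_400.h5', 'wavetoy.chkpt.it_500.h5']
```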

6 CarpetIOHDF5 Utility Programs

6.1 hdf5toascii_slicer

This utility program extracts 1D lines and 2D slices from 3D HDF5 datasets produced by CarpetIOHDF5 and outputs them in CarpetIOASCII format (suitable to be further processed by gnuplot).

The hdf5toascii_slicer program is contained in the src/utils/ subdirectory of thorn CarpetIOHDF5. It is built with

  make <configuration>-utils

where the executable ends up in the subdirectory exe/<configuration>/.

For details on how to use the hdf5toascii_slicer program, run it with no command-line options (or with the --help option).

hdf5toascii_slicer can be used on either chunked or unchunked data.

6.2 hdf5_extract

This utility program extracts selected datasets from any given HDF5 output file, which may be useful when only certain parts (e.g. a specific timestep) of large files are required (e.g. for copying to another location for further processing).

The hdf5_extract program is contained in the src/utils/ subdirectory of thorn CactusPUGHIO/IOHDF5. It is built with

  make <configuration>-utils

where the executable ends up in the subdirectory exe/<configuration>/.

7 Example Parameter File Excerpts

7.1 Serial (unchunked) Output of Grid Variables

  # how often to output and where output files should go  
  IO::out_every = 2  
  IO::out_dir   = "wavetoy-data"  
 
  # request output for wavetoy::psi at every other iteration,
  #                for wavetoy::phi every 4th iteration on refinement levels 1 and 2
  IOHDF5::out_vars = "wavetoy::phi{ out_every = 4 refinement_levels = { 1 2 } }  
                      wavetoy::psi"  
 
  # we want unchunked output  
  # (because the visualisation tool cannot deal with chunked data files)  
  IO::out_mode      = "onefile"  
  IO::out_unchunked = 1

7.2 Parallel (chunked) Output of Grid Variables

  # how often to output  
  IO::out_every = 2  
 
  # each processor writes to its own output directory  
  IOHDF5::out_dir = "wavetoy-data-proc%u"  
 
  # request output for wavetoy::psi at every other iteration,
  #                for wavetoy::phi every 4th iteration on refinement levels 1 and 2
  IOHDF5::out_vars = "wavetoy::phi{ out_every = 4 refinement_levels = { 1 2 } }  
                      wavetoy::psi"  
 
  # we want parallel chunked output (note that this already is the default)  
  IO::out_mode = "proc"

7.3 Checkpointing & Recovery

  # say how often we want to checkpoint, how many checkpoints should be kept,  
  # how the checkpoints should be named, and where they should be written to
  IO::checkpoint_every = 100  
  IO::checkpoint_keep  = 2  
  IO::checkpoint_file  = "wavetoy"  
  IO::checkpoint_dir   = "wavetoy-checkpoints"  
 
  # enable checkpointing for CarpetIOHDF5  
  IOHDF5::checkpoint = "yes"  
 
  #######################################################  
 
  # recover from the latest checkpoint found  
  IO::recover_file = "wavetoy"  
  IO::recover_dir  = "wavetoy-checkpoints"  
  IO::recover      = "auto"

7.4 Importing Grid Variables via Filereader