WatchDog is a thorn that terminates jobs that make no progress within a user-defined time frame.
Internally, WatchDog consists of two parts. The first updates a global timer on every iteration at CCTK_ANALYSIS. The second spawns a watcher thread that periodically checks whether the timer has been updated. If the timer has not been updated for longer than the user-defined time frame, the thread calls abort() to terminate the process (and with it the job).
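The two-part design described above can be sketched in plain C with POSIX threads. This is only an illustration under stated assumptions, not the thorn's actual source: the names watchdog_heartbeat, watchdog_is_stalled, watchdog_watch and watchdog_start are hypothetical, and the sketch assumes that the polling interval and the stall limit are the same number of seconds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* All names below are hypothetical; this is not the thorn's source code. */

/* Wall-clock time of the most recent heartbeat, shared by both parts. */
static _Atomic long last_heartbeat;

/* Part one: record progress. In the thorn this role is played by the
   function scheduled at CCTK_ANALYSIS, once per iteration. */
void watchdog_heartbeat(void)
{
  atomic_store(&last_heartbeat, (long)time(NULL));
}

/* Returns 1 if more than `limit` seconds lie between `last` and `now`. */
int watchdog_is_stalled(long last, long now, long limit)
{
  return (now - last) > limit;
}

/* Part two: watcher thread. `arg` points to the limit in seconds.
   Assumption: the thread polls at the same interval as the limit. */
static void *watchdog_watch(void *arg)
{
  const long check_every = *(const long *)arg;
  for (;;) {
    sleep((unsigned)check_every);
    long last = atomic_load(&last_heartbeat);
    if (watchdog_is_stalled(last, (long)time(NULL), check_every)) {
      fprintf(stderr, "watchdog: no progress for more than %ld s\n",
              check_every);
      abort(); /* terminates the whole process, not just this thread */
    }
  }
  return NULL;
}

/* Starts the watcher thread. `check_every` must stay valid for the
   lifetime of the process. Returns 0 on success. */
int watchdog_start(const long *check_every)
{
  pthread_t tid;
  assert(check_every != NULL && *check_every > 0);
  watchdog_heartbeat(); /* start the clock at launch */
  if (pthread_create(&tid, NULL, watchdog_watch, (void *)check_every) != 0)
    return -1;
  return pthread_detach(tid);
}
```

In such a sketch, a driver would call watchdog_start once at startup with a pointer to a long-lived limit (for example 3600 seconds) and then call watchdog_heartbeat at the end of every iteration. Keeping the stall test in a separate pure function makes it easy to check without waiting for a real timeout.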
check_every | Scope: private | INT
Description: Check that the run is progressing every so many seconds
Range | Default: 3600
1:* | Any positive integer
Implements:
watchdog
This section lists all the variables which are assigned storage by thorn CactusUtils/WatchDog. Storage can either last for the duration of the run (Always means that if this thorn is activated storage will be assigned, Conditional means that if this thorn is activated storage will be assigned for the duration of the run if some condition is met), or can be turned on for the duration of a schedule function.
NONE
CCTK_ANALYSIS
watchdog
make sure that the run is progressing
Language: | c
Type: | function