How to run ESTEL in parallel on a cluster

From SourceWiki
Revision as of 11:24, 11 September 2007 by Jprenaud (talk | contribs)


This article describes how to run parallel jobs in ESTEL on HPC clusters.

This article applies to Beowulf clusters, that is, real high performance facilities such as Blue Crystal. If you plan to run ESTEL on a network of workstations instead, see the article about networks of workstations.

Prerequisites

  • TELEMAC system installed and configured for MPI.
  • PBS queuing system
  • PATH set for the Fortran compiler...

Submitting a job

When the setup is done, it is quite easy to submit a job. A script in the /path/to/systel90/bin/ directory submits a TELEMAC job to the PBS queue.

$ qsub-telemac jobname nbnodes walltime code case

where:

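  • jobname is the name given to the job in the PBS queue
  • nbnodes is the number of processors to run on
  • walltime is the requested walltime, given as hh:mm:ss
  • code is the TELEMAC code to run, for instance estel3d
  • case is the name of the case to run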

For instance, for ESTEL-3D one could use:

$ qsub-telemac test 12 10:00:00 estel3d cas

This would submit a job on 12 processors with a walltime of 10 hours to run a case named "cas" with ESTEL-3D.
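
For readers unfamiliar with PBS, the five arguments map naturally onto standard PBS directives. The sketch below prints what an equivalent hand-written job script might look like. It is an illustration only: make_pbs_script is a hypothetical helper, not the script shipped in /path/to/systel90/bin/, and both the one-processor-per-node request and the launch command (code name followed by case name) are assumptions.

```shell
# Hypothetical sketch: print a PBS job script for one TELEMAC run.
# This is NOT the qsub-telemac script itself, whose content is not shown here.
make_pbs_script() {
    jobname=$1
    nbnodes=$2
    walltime=$3
    code=$4
    case_name=$5

    # Unquoted here-document: $jobname etc. are expanded now,
    # \$PBS_O_WORKDIR is left for PBS to expand when the job runs.
    cat <<EOF
#!/bin/sh
#PBS -N $jobname
#PBS -l nodes=$nbnodes
#PBS -l walltime=$walltime

# PBS starts jobs in the home directory; move to where qsub was called.
cd "\$PBS_O_WORKDIR"

# Assumed launch command: the code name followed by the case name.
$code $case_name
EOF
}

# Same arguments as the ESTEL-3D example above.
make_pbs_script test 12 10:00:00 estel3d cas
```

Printing the script rather than submitting it makes it possible to inspect the resource requests before anything reaches the queue.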

Note that the script automatically adjusts the number of parallel processors in the steering file to match the nbnodes argument. However, the keyword PARALLEL PROCESSORS must already be present in the steering file: only its value is adjusted.
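
The adjustment described above can be pictured with a small shell function. This is a minimal sketch and not the code used by qsub-telemac: set_parallel_processors is a hypothetical name, and it assumes the keyword sits on its own line in the form "PARALLEL PROCESSORS = 4" (with "=" or ":" as the separator).

```shell
# Hypothetical helper, not part of the TELEMAC distribution.
# Assumes a line of the form "PARALLEL PROCESSORS = <integer>" (or ":" instead of "=").
set_parallel_processors() {
    steering_file=$1
    nbnodes=$2

    # Stop if the keyword is missing: only an existing value can be adjusted.
    if ! grep -q '^[[:space:]]*PARALLEL PROCESSORS[[:space:]]*[=:]' "$steering_file"; then
        echo "PARALLEL PROCESSORS not found in $steering_file" >&2
        return 1
    fi

    # Replace only the integer value; keyword, separator and spacing are kept.
    sed 's/^\([[:space:]]*PARALLEL PROCESSORS[[:space:]]*[=:][[:space:]]*\)[0-9][0-9]*/\1'"$nbnodes"'/' \
        "$steering_file" > "$steering_file.tmp" &&
        mv "$steering_file.tmp" "$steering_file"
}
```

For example, `set_parallel_processors cas 12` would change the value in a steering file named cas to 12, and would return an error without touching the file if the keyword were absent.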

Limitations

All these limitations are being worked on:

  • The script cannot add the keyword PARALLEL PROCESSORS if it is not already present in the steering file
  • The script cannot deal with multiple processors per node
  • The script cannot generate a list of the processes to kill on each processor in case of a crash