GENIE experiments

From SourceWiki

This page introduces a new way to configure and run traceable (i.e. reproducible) GENIE experiments. We hope that this new way of organising GENIE files will also make more sense to new users.

ENOUGH WITH THE BLATHER! Just show me how to build an experiment

The old way

By convention, GENIE writes its output to a directory called genie_output, which lives outside the genie source tree, normally at the same level as the top-level genie directory. When an experiment with experiment id 'my_experiment' is compiled and run using the confusingly named genie_example.job script (which is traditionally used to compile and run all GENIE model runs), a directory genie_output/my_experiment is made, and directories are made within it for each GENIE component model (e.g. embm) to write data into.

So far, this is all very sensible, but then things get a bit confusing. The genie_example.job script generates namelist files of runtime parameters from the xml config file (or goin files in the soon-to-be-deprecated old way of specifying model configuration). These are used by the compiled executable so that it knows what configuration settings to run the model with (i.e. the details of the experiment you're doing, where to find input files, where to write output, whether or not to do x or y, etc.). These namelist files (e.g. data_EMBM) and the executable are copied into genie_output/my_experiment and then the model is run from within the my_experiment directory. So, the my_experiment directory is more than just genie_output: it is in fact most of what you need to run the experiment. The only exception is a bunch of input files (topography, forcings etc.), which are generally found within the source tree of the model and whose location and details are specified in the config file. Everything else you need to run the experiment without the genie source tree is contained within my_experiment, along with the output data produced by the model.

There's nothing at all insensible about most of this: having the compiled model executable and namelist parameters along with the model output is good for traceability. In theory, if nothing in the source tree has been changed since you ran the experiment, you should be able to navigate to genie_output/my_experiment, launch genie.exe, and exactly reproduce your previous results. In practice, it's rarely that simple and there are some things we can do to improve traceability; but the main problem with the above is the misleading name of the genie_output folder and the difficulty of having it outside the source tree. Therefore we have made a new directory in the genie tree, called genie-experiments, from which experiments can be configured, built and run with relative ease and better traceability.

GENIE experiments

The motivation behind the changes outlined here is i) to make GENIE easier to configure and run for new users by making it more obvious what's what and ii) to facilitate the creation of fully traceable and portable (i.e. source-tree-independent) compiled experiments, which can be run manually (i.e. by executing genie.exe) or through a front end (e.g. the ALADDIN launchpad). Thus we have come up with the following specification for the structure and content of a GENIE experiment directory, which is portable and traceable. It is as follows:

my_experiment/             // the experiment directory, by default written to genie/genie-experiments
 |
 |-aladdin/                // this directory is not strictly part of the specification, but essential to an 
 | |                       // experiment's use with the [[GENIE_ALADDIN2|ALADDIN launchpad]].
 | |-graphVarData.xml
 | |-mapVarData.xml        // see [[GENIE_ALADDIN2_varData_files]]
 |
 |-archive/                // directory containing information necessary to reproduce the experiment at a later date
 | |-definition.xml        // definition.xml containing the default values of all model parameters / settings
 | |-my_experiment.xml     // the user-defined experiment config used to generate the experiment
 | |-svn.info              // the result of doing svn info > svn.info in the top-level genie directory
 | |-svn.diff              // svn diff > svn.diff*
 | |-genie_example.job    
 | |-genie_gridsizes.txt
 | |-makefile.arc
 |
 |-<module_name>/          // e.g. 'embm' - directory for each component model containing output data (unchanged from  
 |                         // old genie_output directories). Can be empty if experiment compiled but not run
 |
 |-input/                  // input directory contains model inputs for each component and mirrors directory structure 
 | |                       // of genie source tree to location of input files. i.e.:
 | |-genie-<module_name>/
 |   |-data/
 |     |-input/
 |       |-<input files> 
 |
 |-data_<MODULE_NAME>     // e.g. data_EMBM, data_genie. Namelist files for experiment config 
 |                        // (unchanged from the old genie_output approach)
 |
 |-genie.exe              // compiled genie executable
 |
 |-genie_wrapper.sh       // genie wrapper which logs output to ./genie.out and contains controls for ALADDIN2  
                          // (not essential to the specification of a genie-experiment, but useful)


* the commands svn info and svn diff together in theory allow the recreation of any particular state of the genie source tree, whether a committed version or one containing user-changed code which wasn't in the repository at the time of doing the experiment. In practice, it would be silly to take GENIE into battle without using a fully-committed-to-svn version of the model, otherwise getting back to the same state in the future could be extremely difficult and messy even with the help of svn info and svn diff. If you're working with the GENIE trunk and not doing your own development work, it would be advisable to use a tuned tagged release of the model, which is set in stone and can always be reproduced.
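
The two archive files can be produced with a few lines of shell. The following is a minimal sketch, assuming the top-level genie directory is an svn working copy; archive_svn_state and the SVN override variable are hypothetical names introduced for illustration and are not part of GENIE.

```shell
#!/bin/sh
# Sketch: record the state of the source tree in an experiment's archive/ directory.
# archive_svn_state is a hypothetical helper, not part of GENIE.
# SVN may be set to an alternative client; it defaults to the svn found on PATH.
archive_svn_state() {
    genie_dir=$1      # top-level genie directory (an svn working copy)
    archive_dir=$2    # e.g. /path/to/my_experiment/archive
    svn_cmd=${SVN:-svn}

    mkdir -p "$archive_dir" || return 1
    # Make the path absolute so it survives the cd below.
    archive_dir=$(cd "$archive_dir" && pwd) || return 1

    # Run both commands from the top-level genie directory.
    ( cd "$genie_dir" && "$svn_cmd" info > "$archive_dir/svn.info" ) || return 1
    ( cd "$genie_dir" && "$svn_cmd" diff > "$archive_dir/svn.diff" ) || return 1

    # A non-empty svn.diff means versioned files had local modifications.
    if [ -s "$archive_dir/svn.diff" ]; then
        echo "warning: local modifications recorded in $archive_dir/svn.diff" >&2
    fi
}
```

Calling archive_svn_state ~/genie ~/genie/genie-experiments/my_experiment/archive would write both files; a warning on stderr is a hint that the experiment was not built from a fully committed version.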

Creating and configuring GENIE experiments

It is assumed that you're adopting our proposed new genie/genie-experiments directory as the location for creating your compiled experiments (if not, you should be able to work out what needs changing in the details below, but don't come running to us when it all goes wrong...).

Summary

in genie/genie-experiments directory:

./build_experiment -f files/<file list> -c configs/<config file> (-m <mapVarData file> -g <graphVarData file> -n)

The optional -n command-line argument will create an experiment without recompiling the model. The optional mapVarData and graphVarData files are specific to ALADDIN, where they are used to interpret ASCII output files and extract the parameters to be visualised in ALADDIN during the experiment. See GENIE_ALADDIN2_varData_files.

Howto

You will need:

  1. a working version of the genie source code and its prerequisites
  2. an experiment you want to build - i.e. an xml config file with your parameters of interest set to the values you want (note if you're doing this for ALADDIN you may wish to 'expose' them for user-configuration and set a range of available values - see the ALADDIN documentation).
  3. knowledge of which data input files the experiment you want to do requires (see below).
  4. a terminal to run commands on
  5. a text editor

You will also need to make sure the OUTROOT and RUNTIME_ROOT environment variables are set correctly in user.sh:

CODEDIR=/path/to/genie              # ~/genie by default
OUTROOT=${CODEDIR}/genie-experiments
RUNTIME_ROOT=${RUNTIME_ROOT:=../..}

and also OUT_DIR and RUNTIME_ROOT in user.mak:

GENIE_ROOT          = /path/to/genie
OUT_DIR             = $(GENIE_ROOT)/genie-experiments
RUNTIME_ROOT        = ../..
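
The ${RUNTIME_ROOT:=../..} form used in user.sh is standard shell parameter expansion: it assigns the default only when the variable is unset or empty, so a value set beforehand wins. The short sketch below illustrates that behaviour; /some/other/place is a made-up value used only for the demonstration.

```shell
#!/bin/sh
# ${VAR:=default} assigns the default only if VAR is unset or empty.
# From genie/genie-experiments/my_experiment, ../.. is the top-level genie directory.
unset RUNTIME_ROOT
RUNTIME_ROOT=${RUNTIME_ROOT:=../..}
echo "default:  $RUNTIME_ROOT"     # prints "default:  ../.."

RUNTIME_ROOT=/some/other/place     # hypothetical value set beforehand
RUNTIME_ROOT=${RUNTIME_ROOT:=../..}
echo "override: $RUNTIME_ROOT"     # prints "override: /some/other/place"
```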

File lists

Now, because we are creating a standalone experiment, we need to make sure that (i) all the necessary input files (topography, restart files, tracer initialisations, forcings etc) are in the correct places within the genie experiment structure (see specification) and (ii) that your xml config file is pointing GENIE to the correct place to look for them.

The first part of this is non-trivial: as well as knowing how many files of what type are required for a particular run/setup, you also need to know which of a selection of files are most appropriate. This isn't obvious, or documented anywhere at the moment, but I hope in the future GENIE_input_files might contain what you need to know. For now, a good starting point is to look around in the genie/genie-<module-name>/data/input directories and see what you can see. Trial and error also works OK: GENIE is pretty good at telling you when it's missing a file, and the correct files can be diagnosed from this.

To make life easier, there are a number of file lists provided for 'standard' model setups in genie-experiments/files. If you need to make a new one for a particular config / model setup, we'd be grateful if you'd add a filelist with a self-explanatory filename into the files directory and document it in GENIE_input_files if appropriate.

These filelists must contain solely a list of the files required, and all of the files must be situated within the genie source tree, preferably in their default locations, so that minimal changes are required to the experiment config file to point GENIE to the correct place in the experiment directory once these files have been copied across by build_experiment.sh.

e.g. a filelist for a standard 8 level eb_go_gs_el run might look like this:


genie-embm/data/input/worbe2.k1
genie-embm/data/input/taux_u.interp
genie-embm/data/input/taux_v.interp
genie-embm/data/input/tauy_u.interp
genie-embm/data/input/tauy_v.interp
genie-embm/data/input/uncep.silo
genie-embm/data/input/vncep.silo
genie-embm/data/input/ta_ncep.silo
genie-embm/data/input/qa_ncep.silo
genie-goldstein/data/input/worbe2.k1
genie-goldstein/data/input/worbe2.psiles
genie-goldstein/data/input/worbe2.paths
genie-goldstein/data/input/tempann.silo
genie-goldstein/data/input/saliann.silo
genie-goldsteinseaice/data/input/worbe2.k1
genie-ents/data/k_constants.dat
genie-ents/config/ents_config.par
genie-ents/data/atm_albedo_monthly.dat
genie-ents/data/uvic_windx.silo
genie-ents/data/uvic_windy.silo
genie-ents/data/monthly_windspd.silo
genie-ents/data/inv_linterp_matrix.dat
genie-ents/data/orography.dat
genie-ents/data/icemask.dat
genie-ents/config/sealevel_config.par
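
Before building, it can help to confirm that every path in a file list really exists in the source tree. The following is a minimal sketch of such a check; check_filelist is a hypothetical helper, not part of GENIE, and it assumes one path per line, relative to the top-level genie directory (blank lines are tolerated as a convenience).

```shell
#!/bin/sh
# check_filelist is a hypothetical helper, not part of GENIE.
# It reports every path in a file list that does not exist under the source tree.
check_filelist() {
    genie_dir=$1   # top-level genie directory
    filelist=$2    # e.g. genie-experiments/files/<file list>
    missing=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue          # tolerate blank lines
        if [ ! -f "$genie_dir/$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$filelist"
    # Succeed only if nothing was missing.
    [ "$missing" -eq 0 ]
}
```

The function returns a non-zero status if any file is missing, so it can guard a build, e.g. check_filelist ~/genie files/<file list> && ./build_experiment ...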

input (and output) directory information in my_experiment.xml

Default directory locations are specified in definition.xml, which holds the default settings for genie; these are overridden by any values specified in your experiment config file. For example, the defaults for the EMBM are:

   <param name="indir_name">
     <value datatype="string"><varref>RUNTIME_ROOT</varref><sep/>genie-embm<sep/>data<sep/>input</value>
   </param>

   <param name="rstdir_name">
     <value datatype="string"><varref>RUNTIME_ROOT</varref><sep/>genie-embm<sep/>data<sep/>input</value>
     <description>restart (input) directory</description>
   </param>

   <param name="outdir_name">
     <value datatype="string"><varref>RUNTIME_OUTDIR</varref><sep/>embm</value>
   </param>

Note that the output directory for EMBM is already set correctly to meet with the specification of a GENIE experiment. However, it could be simplified by replacing '<varref>RUNTIME_OUTDIR</varref>' with '.' as genie.exe is executed from within the experiment directory.

The input directories however have to be changed to point at the input directory of the experiment directory. Therefore, in the embm section of your config file, you will need to override the default values:

   <param name="indir_name">.<sep/>input<sep/>genie-embm<sep/>data<sep/>input</param>
   <param name="rstdir_name">.<sep/>input<sep/>genie-embm<sep/>data<sep/>input</param>
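
A quick way to spot overrides that were forgotten is to search the config file for input directory parameters that do not point into ./input. The sketch below is one possible check under stated assumptions: unconverted_input_dirs is a hypothetical helper, not part of GENIE, and it assumes each override sits on a single line in the form shown above.

```shell
#!/bin/sh
# unconverted_input_dirs is a hypothetical helper, not part of GENIE.
# It prints (with line numbers) the indir_name / rstdir_name params in a config
# file whose value does not start with .<sep/>input<sep/>.
# Exit status is 0 if any such line was found, 1 if there were none.
unconverted_input_dirs() {
    config=$1
    grep -nE 'name="(indir|rstdir)_name"' "$config" \
        | grep -vF '>.<sep/>input<sep/>'
}
```

Note that this only inspects parameters that are present in the config file; a parameter that is not overridden at all still takes its default from definition.xml and will not be reported.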

build_experiment.sh

The build_experiment.sh script does the following:

  1. creates an experiment directory within genie-experiments and input and aladdin subdirectories
  2. copies user-specified (by -m and -g) or default mapVarData and graphVarData files to the ALADDIN directory
  3. copies input files specified in the file list (as <path-to-file>) from genie/<path-to-file> to my_experiment/input/<path-to-file>
  4. copies genie_wrapper.sh to my_experiment
  5. copies config file to genie-main/configs
  6. changes directory to genie-main and runs make cleanall (unless -n option specified)
  7. executes genie_example.job -f <configfile> -x which does most of the work

Note that the -x command-line argument to genie_example.job suppresses the execution of the model, i.e. only compilation and file copying to the experiment directory are carried out.
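
To make step 3 concrete, the copy from genie/<path-to-file> to my_experiment/input/<path-to-file> could look roughly like this. It is a sketch only: copy_inputs is a hypothetical function written for illustration, and the real build_experiment.sh may do this differently.

```shell
#!/bin/sh
# copy_inputs is a hypothetical sketch of step 3; the real script may differ.
# Each path in the file list is copied to <experiment>/input/<same path>,
# so the input directory mirrors the layout of the genie source tree.
copy_inputs() {
    genie_dir=$1    # top-level genie directory
    filelist=$2     # one path per line, relative to genie_dir
    exp_dir=$3      # experiment directory, e.g. genie-experiments/my_experiment
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue                  # tolerate blank lines
        dest="$exp_dir/input/$path"
        mkdir -p "$(dirname "$dest")" || return 1   # recreate the directory structure
        cp "$genie_dir/$path" "$dest" || return 1   # stop at the first missing file
    done < "$filelist"
}
```

Mirroring the source-tree layout is what lets the config file overrides use the same genie-<module_name>/data/input path, just prefixed with ./input.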

Your GENIE experiment can now be moved to anywhere and run from there (by ./genie_wrapper.sh for logged output or ./genie.exe for command-line output) or used with ALADDIN, independent from the genie source tree. Yay!
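
Before moving an experiment elsewhere, it may be worth checking that the directory contains the main pieces named in the specification. The following is a minimal sketch of such a check; check_experiment is a hypothetical helper, not part of GENIE, and it tests only a subset of the specification (the executable, the input directory, at least one namelist file, and three of the archive files).

```shell
#!/bin/sh
# check_experiment is a hypothetical helper, not part of GENIE.
# It checks a subset of the experiment directory specification and
# returns non-zero if anything it looks for is missing.
check_experiment() {
    exp_dir=$1
    status=0
    for f in genie.exe archive/definition.xml archive/svn.info archive/svn.diff; do
        [ -f "$exp_dir/$f" ] || { echo "missing file: $f" >&2; status=1; }
    done
    [ -d "$exp_dir/input" ] || { echo "missing directory: input" >&2; status=1; }
    # At least one namelist file (data_<MODULE_NAME>) is expected.
    # If the glob matches nothing, $1 stays as the literal pattern and -e fails.
    set -- "$exp_dir"/data_*
    [ -e "$1" ] || { echo "missing namelist files: data_*" >&2; status=1; }
    return "$status"
}
```

For example, check_experiment my_experiment && mv my_experiment /somewhere/else would refuse to move an incomplete experiment.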