GENIE experiments

From SourceWiki
Revision as of 13:24, 9 June 2009 by Genie-user (talk | contribs)

N.B. the genie-experiments approach to configuring and running GENIE is currently only available on the ENGAGE branch of the genie repository, and is probably of most use at the moment to people building portable experiments for the ALADDIN Launchpad. We will be merging these new features into the trunk in the near future. If you're getting tired of waiting for this to happen, please email martin.johnson@uea.ac.uk and give me a hard time.

This page introduces a new way to configure and run traceable (i.e. reproducible) GENIE experiments. We hope that this new way of organising GENIE files will make more sense to new users too.

ENOUGH WITH THE BLATHER! Just show me how to build an experiment

The old way

By convention, the normal place for GENIE to output data is a directory outside of the genie source tree, normally at the same level as the top-level genie directory, called genie_output. When an experiment with experiment id 'my_experiment' is compiled and run using the confusingly named genie_example.job script (which is traditionally used to compile and run all GENIE model runs), a directory genie_output/my_experiment is made, and directories made within it for each GENIE component model (e.g. embm) to write data into.

So far, this is all very sensible, but then things get a bit confusing. The genie_example.job script generates namelist files of runtime parameters from the xml config file (or goin files in the soon-to-be-deprecated old way of specifying model configuration). These are used by the compiled executable so that it knows what configuration settings to run the model with (i.e. the details of the experiment you're doing, where to find input files, where to write output, whether or not to do x, or y etc). These namelist files (e.g. data_EMBM) and the executable are copied into genie_output/my_experiment and then the model is run from within the my_experiment directory. So, the my_experiment directory is more than just genie_output: it is in fact most of what you need to run the experiment. The exception is a bunch of input files (topography, forcings etc), which are generally found within the source tree of the model and whose location and details are specified in the config file. Everything else you need to run the experiment without the presence of the genie source tree is contained within my_experiment, along with the output data produced by the model.

There's nothing at all insensible about most of this: having the compiled model executable and namelist parameters alongside the model output is good for traceability. In theory, if nothing in the source tree has been changed since you ran the experiment, you should be able to navigate to genie_output/my_experiment, launch genie.exe, and exactly reproduce your previous results. In practice, it's rarely that simple and there are some things we can do to improve traceability; but the main problems with the above are the misleading name of the genie_output folder and the difficulty of having it outside the source tree. Therefore we have made a new directory in the genie tree, called genie-experiments, from which experiments can be configured, built and run with relative ease and better traceability.
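
As a rough illustration of re-running a compiled experiment from inside its own directory, here is a minimal sketch. The helper name rerun_experiment is hypothetical (it is not part of GENIE), and it assumes the experiment directory already holds genie.exe and its namelist files as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GENIE): re-run a compiled experiment
# from within its own directory, as described for the old genie_output layout.
# Assumes the directory already contains genie.exe and the data_* namelists.
rerun_experiment() {
    exp_dir=$1
    if [ ! -x "${exp_dir}/genie.exe" ]; then
        echo "no executable genie.exe in ${exp_dir}" >&2
        return 1
    fi
    # Use a subshell so the caller's working directory is left unchanged.
    ( cd "${exp_dir}" && ./genie.exe )
}
```

Something like `rerun_experiment ../genie_output/my_experiment` would then launch the model in place; whether the results match the original run still depends on the source tree and input files being unchanged.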

genie-experiments

The motivation behind the changes outlined here is to i) make GENIE easier to configure and run for new users by making it more obvious what's what and ii) facilitate the creation of fully traceable and portable (i.e. source-tree-independent) compiled experiments, which can be run manually (i.e. by executing genie.exe) or through a front end (e.g. the ALADDIN launchpad). Thus we have come up with the following specification for the structure and content of a GENIE experiment directory, which is portable and traceable. It is as follows:

my_experiment/             // the experiment directory, by default written to genie/genie-experiments
 |
 |-aladdin/                // this directory not strictly part of the specification, but essential to an 
 | |                       // experiment's use with the [[GENIE_ALADDIN2|ALADDIN launchpad]].
 | |-graphVarData.xml
 | |-mapVarData.xml        // see [[GENIE_ALADDIN2_varData_files]]
 |
 |-archive/                // directory containing information necessary to reproduce the experiment at a later date
 | |-definition.xml        // definition.xml containing the default values of all model parameters / settings
 | |-my_experiment.xml     // the user-defined experiment config used to generate the experiment
 | |-svn.info              // the result of doing svn info > svn.info in the top-level genie directory
 | |-svn.diff              // svn diff > svn.diff*
 | |-genie_example.job    
 | |-genie_gridsizes.txt
 | |-makefile.arc
 |
 |-<module_name>/          // e.g. 'embm' - directory for each component model containing output data (unchanged from  
 |                         // old genie_output directories). Can be empty if experiment compiled but not run
 |
 |-input/                  // input directory contains model inputs for each component and mirrors directory structure 
 | |                       // of genie source tree to location of input files. i.e.:
 | |-genie-<module_name>/
 |   |-data/
 |     |-input/
 |       |-<input files> 
 |
 |-data_<MODULE_NAME>     // e.g. data_EMBM, data_genie. Namelist files for experiment config 
 |                        // (unchanged from the old genie_output approach)
 |
 |-genie.exe              // compiled genie executable
 |
 |-genie_wrapper.sh       // genie wrapper which logs output to ./genie.out and contains controls for ALADDIN2  
                          // (not essential to the specification of a genie-experiment, but useful)
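
The layout above can be checked mechanically. The following is a minimal sketch, not a GENIE tool: check_experiment_dir is a hypothetical helper, and the set of entries it treats as required (the archive files, input/, at least one data_* namelist and genie.exe) is an assumption read off the tree above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GENIE): report which entries of the
# genie-experiments layout are missing from an experiment directory.
# The list of "required" entries is an assumption based on the tree above.
check_experiment_dir() {
    exp_dir=$1
    exp_name=$(basename "${exp_dir}")
    missing=0
    for entry in \
        "archive/definition.xml" \
        "archive/${exp_name}.xml" \
        "archive/svn.info" \
        "archive/svn.diff" \
        "input" \
        "genie.exe"
    do
        if [ ! -e "${exp_dir}/${entry}" ]; then
            echo "missing: ${entry}"
            missing=1
        fi
    done
    # At least one namelist file such as data_EMBM or data_genie is expected.
    set -- "${exp_dir}"/data_*
    if [ ! -e "$1" ]; then
        echo "missing: data_<MODULE_NAME> namelist files"
        missing=1
    fi
    return "${missing}"
}
```

Calling `check_experiment_dir genie-experiments/my_experiment` prints one line per missing entry and returns non-zero if anything is absent.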


* the commands svn info and svn diff together in theory allow the recreation of any particular state of the genie source tree, whether a committed version or one containing user-changed code which wasn't in the repository at the time of doing the experiment. In practice, it would be silly to take GENIE into battle without using a fully-committed-to-svn version of the model, otherwise getting back to the same state in the future could be extremely difficult and messy even with the help of svn info and svn diff. If you're working with the GENIE trunk and not doing your own development work, it would be advisable to use a tuned tagged release of the model, which is set in stone and can always be reproduced.
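
For an experiment directory assembled by hand, the two svn files could be captured along these lines. This is only a sketch: archive_svn_state is a hypothetical helper name, and it assumes the first argument is the top-level genie directory of an svn working copy.

```shell
#!/bin/sh
# Hypothetical helper (not part of GENIE): record the state of the genie
# working copy in an experiment's archive directory.
# $1 = top-level genie directory (assumed to be an svn working copy)
# $2 = the experiment's archive directory (created if absent)
archive_svn_state() {
    genie_dir=$1
    archive_dir=$2
    mkdir -p "${archive_dir}" || return 1
    # Resolve to an absolute path before changing directory.
    archive_abs=$(cd "${archive_dir}" && pwd) || return 1
    (
        cd "${genie_dir}" || exit 1
        svn info > "${archive_abs}/svn.info" || exit 1
        svn diff > "${archive_abs}/svn.diff"
    )
}
```

An empty svn.diff afterwards indicates that the experiment was built from a fully committed version of the source tree, which is the situation recommended above.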

creating and configuring GENIE experiments

Here it is assumed that you're adopting our proposed new genie/genie-experiments directory as the location for creating your compiled experiments. You will need to make sure the OUTROOT environment variable is set correctly in user.sh (note that the notes in user.sh tell us that RUNTIME_OUTDIR should always be "." - this will be useful later).

...
CODEDIR=/path/to/genie    # ~/genie by default
OUTROOT=${CODEDIR}/genie-experiments
...

and OUT_DIR in user.mak

...
GENIE_ROOT          = /path/to/genie
OUT_DIR             = $(GENIE_ROOT)/genie-experiments
...
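
A quick sanity check of the user.sh setting might look like the sketch below. The helper check_outroot is hypothetical, and it assumes that user.sh consists of plain variable assignments that are safe to source in a subshell; if your user.sh does more than that, inspect it by eye instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of GENIE): confirm that user.sh points
# OUTROOT at the genie-experiments directory inside CODEDIR.
# Assumes user.sh only contains variable assignments and is safe to source.
# Pass a path containing a slash (e.g. ./user.sh) so that "." does not
# search PATH for the file.
check_outroot() {
    user_sh=$1
    (
        . "${user_sh}" || exit 2
        if [ "${OUTROOT}" = "${CODEDIR}/genie-experiments" ]; then
            echo "OUTROOT ok: ${OUTROOT}"
        else
            echo "OUTROOT is '${OUTROOT}', expected '${CODEDIR}/genie-experiments'" >&2
            exit 1
        fi
    )
}
```

The subshell keeps the variables from user.sh out of the calling shell's environment.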

To create a compiled genie experiment, you will require a GENIE user-defined xml config file with the parameters set up as you want them. Note that module indir_name parameters must be set to look within the my_experiment/input folder instead of