Difference between revisions of "GENIE experiments"

From SourceWiki

Revision as of 20:55, 9 June 2009

N.B. the genie-experiments approach to configuring and running GENIE is currently only on the ENGAGE branch of the genie repository, and is probably of most use at the moment to people building portable experiments for the ALADDIN Launchpad. We will be merging these new features into the trunk in the near future. If you're getting tired of waiting for this to happen, please email martin.johnson@uea.ac.uk and give me a hard time.

This page introduces a new way to configure and run traceable (i.e. reproducible) GENIE experiments. We hope that this new method of approaching the organisation of GENIE files will make more sense to new users too.

ENOUGH WITH THE BLATHER! Just show me how to build an experiment

The old way

By convention, the normal place for GENIE to output data is a directory called genie_output, outside of the genie source tree and normally at the same level as the top-level genie directory. When an experiment with experiment id 'my_experiment' is compiled and run using the confusingly named genie_example.job script (which is traditionally used to compile and run all GENIE model runs), a directory genie_output/my_experiment is made, and directories are made within it for each GENIE component model (e.g. embm) to write data into.

So far, this is all very sensible, but then things get a bit confusing. The genie_example.job script generates namelist files of runtime parameters from the xml config file (or goin files in the soon-to-be-deprecated old way of specifying model configuration). These are used by the compiled executable so that it knows what configuration settings to run the model with (i.e. the details of the experiment you're doing, where to find input files, where to write output, whether or not to do x or y, etc.). These namelist files (e.g. data_EMBM) and the executable are copied into genie_output/my_experiment and then the model is run from within the my_experiment directory. So, the my_experiment directory is more than just genie_output - it's in fact most of what you need to run the experiment. With the exception of a bunch of input files (topography, forcings etc), which are generally found within the source tree of the model, the location and details of which are specified in the config file, everything you need to run the experiment without the presence of the genie source tree is contained within my_experiment, along with the output data produced by the model.

There's nothing at all insensible about most of this - having the compiled model executable and namelist parameters along with the model output is good for traceability (in theory, if nothing in the source tree has been changed since you ran the experiment, you should be able to navigate to genie_output/my_experiment, launch genie.exe, and exactly reproduce your previous results). In practice, it's rarely that simple and there are some things we can do to improve traceability; but the main problem with the above is the misleading name of the genie_output folder and the difficulty of having it outside the source tree. Therefore we have made a new directory in the genie tree, called genie-experiments, from which experiments can be configured, built and run with relative ease and better traceability.

genie-experiments

The motivation behind the changes outlined here is to i) make GENIE easier to configure and run for new users by making it more obvious what's what and ii) facilitate the creation of fully traceable and portable (i.e. source-tree-independent) compiled experiments, which can be run manually (i.e. by executing genie.exe) or through a front end (e.g. the ALADDIN launchpad). Thus we have come up with the following specification for the structure and content of a GENIE experiment directory, which is portable and traceable. It is as follows:

my_experiment/             // the experiment directory, by default written to genie/genie-experiments
 |
 |-aladdin/                // this directory not strictly part of the specification, but essential to an 
 | |                       // experiment's use with the [[GENIE_ALADDIN2|ALADDIN launchpad]].
 | |-graphVarData.xml
 | |-mapVarData.xml        // see [[GENIE_ALADDIN2_varData_files]]
 |
 |-archive/                // directory containing information necessary to reproduce the experiment at a later date
 | |-definition.xml        // definition.xml containing the default values of all model parameters / settings
 | |-my_experiment.xml     // the user-defined experiment config used to generate the experiment
 | |-svn.info              // the result of doing svn info > svn.info in the top-level genie directory
 | |-svn.diff              // svn diff > svn.diff*
 | |-genie_example.job    
 | |-genie_gridsizes.txt
 | |-makefile.arc
 |
 |-<module_name>/          // e.g. 'embm' - directory for each component model containing output data (unchanged from  
 |                         // old genie_output directories). Can be empty if experiment compiled but not run
 |
 |-input/                  // input directory contains model inputs for each component and mirrors directory structure 
 | |                       // of genie source tree to location of input files. i.e.:
 | |-genie-<module_name>/
 |   |-data/
 |     |-input/
 |       |-<input files> 
 |
 |-data_<MODULE_NAME>     // e.g. data_EMBM, data_genie. Namelist files for experiment config 
 |                        // (unchanged from the old genie_output approach)
 |
 |-genie.exe              // compiled genie executable
 |
 |-genie_wrapper.sh       // genie wrapper which logs output to ./genie.out and contains controls for ALADDIN2  
                          // (not essential to the specification of a genie-experiment, but useful)
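
As a rough illustration of the specification above, here is a small shell sketch that reports which of the listed items are missing from an experiment directory. check_experiment is our own hypothetical helper, not something shipped with GENIE. It deliberately skips the experiment-specific config file in archive/ and the optional aladdin/ directory, and it only checks that at least one data_<MODULE_NAME> namelist file exists.

```shell
# Sketch only: check_experiment is a hypothetical helper, not part of GENIE.
# Usage: check_experiment /path/to/genie/genie-experiments/my_experiment
check_experiment() {
    exp_dir=$1
    missing=0
    # Fixed-name files taken from the specification tree.
    for f in archive/definition.xml archive/svn.info archive/svn.diff \
             archive/genie_example.job archive/genie_gridsizes.txt \
             archive/makefile.arc genie.exe; do
        if [ ! -f "$exp_dir/$f" ]; then
            echo "missing file: $f"
            missing=$((missing + 1))
        fi
    done
    # input/ holds the copied model input files.
    if [ ! -d "$exp_dir/input" ]; then
        echo "missing directory: input/"
        missing=$((missing + 1))
    fi
    # At least one namelist file (data_EMBM, data_genie, ...) should exist.
    if ! ls "$exp_dir"/data_* >/dev/null 2>&1; then
        echo "missing namelist files: data_<MODULE_NAME>"
        missing=$((missing + 1))
    fi
    # Succeed only if nothing was reported missing.
    [ "$missing" -eq 0 ]
}
```

The function prints one line per missing item and returns a non-zero status if anything is absent, so it can be used in an if test before trying to run genie.exe.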


* the commands svn info and svn diff together in theory allow the recreation of any particular state of the genie source tree, whether a committed version or one containing user-changed code which wasn't in the repository at the time of doing the experiment. In practice, it would be silly to take GENIE into battle without using a fully-committed-to-svn version of the model, otherwise getting back to the same state in the future could be extremely difficult and messy even with the help of svn info and svn diff. If you're working with the GENIE trunk and not doing your own development work, it would be advisable to use a tuned tagged release of the model, which is set in stone and can always be reproduced.
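
To make the svn info / svn diff idea a bit more concrete, the sketch below reads the archived svn.info file and prints the commands that could, in theory, bring a source tree back to the recorded state. restore_commands is a hypothetical helper of ours and only prints the commands rather than running them. It assumes the standard "URL:" and "Revision:" lines of svn info output, and that svn.diff was made in the top-level genie directory so that patch -p0 applies.

```shell
# Sketch only: restore_commands is a hypothetical helper, not part of GENIE.
# Usage: restore_commands my_experiment/archive [target_dir]
restore_commands() {
    archive_dir=$1
    target=${2:-genie-restored}
    # svn info prints "URL: <repository url>" and "Revision: <number>" lines.
    url=$(sed -n 's/^URL: //p' "$archive_dir/svn.info")
    rev=$(sed -n 's/^Revision: //p' "$archive_dir/svn.info")
    # Check out exactly the revision the experiment was built from.
    echo "svn checkout -r $rev $url $target"
    # An empty svn.diff means the tree was fully committed: nothing to patch.
    if [ -s "$archive_dir/svn.diff" ]; then
        echo "patch -p0 -d $target < $archive_dir/svn.diff"
    fi
}
```

If svn.diff is empty only the checkout command is printed, which is the comfortable situation the footnote recommends.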

Creating and configuring GENIE experiments

It is assumed that you're adopting our proposed new genie/genie-experiments directory as the location for creating your compiled experiments (if not, you should be able to work out what needs changing in the details below, but don't come running to us when it all goes wrong...).

Summary

In the genie/genie-experiments directory:

./build_experiment -f files/<file list> -c configs/<config file> (-m <mapVarData file> -g <graphVarData file> -n)

The optional -n command-line argument will create an experiment without recompiling the model. The optional mapVarData and graphVarData files are specific to ALADDIN, where they are used to interpret ASCII output files to extract parameters to be visualised in ALADDIN during the experiment. See GENIE_ALADDIN2_varData_files.
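
For illustration only, the following sketch assembles a command line from the synopsis above without running anything. build_cmd is a hypothetical helper, and the file names in the usage comment are made-up placeholders rather than files that are guaranteed to exist in genie-experiments.

```shell
# Sketch only: build_cmd is a hypothetical helper that prints a command line.
# Usage: build_cmd my_setup.files my_experiment.xml      (recompile the model)
#        build_cmd my_setup.files my_experiment.xml no   (adds -n: no recompile)
build_cmd() {
    file_list=$1
    config=$2
    recompile=${3:-yes}
    # -f and -c are the two required arguments from the synopsis.
    cmd="./build_experiment -f files/$file_list -c configs/$config"
    # -n creates the experiment without recompiling the model.
    if [ "$recompile" = "no" ]; then
        cmd="$cmd -n"
    fi
    echo "$cmd"
}
```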

You will need

  1. a working version of the genie source code and its prerequisites
  2. an experiment you want to build - i.e. an xml config file with your parameters of interest set to the values you want (note if you're doing this for ALADDIN you may wish to 'expose' them for user-configuration and set a range of available values - see the ALADDIN documentation).
  3. knowledge of which data input files the experiment you want to do requires (see below).
  4. a terminal to run commands on
  5. a text editor

You will need to make sure the OUTROOT environment variable is set correctly in user.sh (the notes in user.sh also tell us that RUNTIME_OUTDIR should always be "." - this will be useful later):

CODEDIR=/path/to/genie    # ~/genie by default
OUTROOT=${CODEDIR}/genie-experiments

OUT_DIR in user.mak must point to the same directory:

GENIE_ROOT          = /path/to/genie
OUT_DIR             = $(GENIE_ROOT)/genie-experiments
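
A quick way to confirm both settings before building might look like the sketch below. check_outdirs is a hypothetical helper, and the paths to user.sh and user.mak are passed in as arguments because their location depends on your checkout. It only checks that each assignment mentions genie-experiments, not that the whole path is valid.

```shell
# Sketch only: check_outdirs is a hypothetical helper, not part of GENIE.
# Usage: check_outdirs /path/to/user.sh /path/to/user.mak
check_outdirs() {
    user_sh=$1
    user_mak=$2
    status=0
    # user.sh uses shell syntax, e.g. OUTROOT=${CODEDIR}/genie-experiments
    if ! grep -Eq '^[[:space:]]*OUTROOT=.*genie-experiments' "$user_sh"; then
        echo "OUTROOT in $user_sh does not point at genie-experiments"
        status=1
    fi
    # user.mak uses make syntax, e.g. OUT_DIR = $(GENIE_ROOT)/genie-experiments
    if ! grep -Eq '^[[:space:]]*OUT_DIR[[:space:]]*=.*genie-experiments' "$user_mak"; then
        echo "OUT_DIR in $user_mak does not point at genie-experiments"
        status=1
    fi
    return $status
}
```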

Now, because we are creating a standalone experiment, we need to make sure that (i) all the necessary input files (topography, restart files, tracer initialisations, forcings etc) are in the correct places within the genie experiment structure (see specification) and (ii) that your xml config file is pointing GENIE to the correct place to look for them.

The first part of this is non-trivial - as well as knowing how many files of what type are required for a particular run/setup, you also need to know which of a selection of files are most appropriate. This isn't obvious, or documented anywhere at the moment, but I hope in the future GENIE_input_files might contain what you need to know. For now, a good starting point is to look around in the genie/genie-<module-name>/data/input directories and see what you can see. To make life easier, there are a number of file lists provided for 'standard' model setups in genie-experiments/files. If you need to make a new one for a particular config / model setup, we'd be grateful if you'd add a filelist with a self-explanatory filename into the files directory and document it in GENIE_input_files if appropriate.

These filelists are key to constructing a standalone experiment and are passed to build_experiment.sh along with the experiment config file.
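
To show how a file list relates to the input/ directory in the specification, here is a sketch of one way such a list could be consumed. It is not the actual build_experiment.sh code. copy_inputs is a hypothetical helper, and it assumes the file list holds one path per line, relative to the top-level genie directory (e.g. genie-goldstein/data/input/<some file>).

```shell
# Sketch only: copy_inputs is a hypothetical helper, not the real build script.
# Usage: copy_inputs /path/to/genie files/<file list> /path/to/my_experiment
copy_inputs() {
    genie_root=$1
    file_list=$2
    exp_dir=$3
    # Read the list line by line; the || clause handles a final line
    # that has no trailing newline.
    while IFS= read -r rel_path || [ -n "$rel_path" ]; do
        # Skip blank lines.
        [ -n "$rel_path" ] || continue
        # Mirror the source-tree path underneath the experiment's input/ directory.
        dest="$exp_dir/input/$rel_path"
        mkdir -p "$(dirname "$dest")"
        cp "$genie_root/$rel_path" "$dest"
    done < "$file_list"
}
```

Mirroring the relative path means the xml config can point at input/genie-<module_name>/data/input/ inside the experiment directory instead of at the source tree.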

build_experiment.sh

The build_experiment.sh script takes files in the file list e.g. genie-goldstein/data/input/worap2.psiles. build_experiment.sh