Fortran2

From SourceWiki
Revision as of 16:16, 7 March 2011 by GethinWilliams (talk | contribs) (→‎To go further)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

'Fortran2: Getting the most from Fortran'

Following On

These notes follow on from the Fortran1 course. You may want to look back at those notes and examples, or if you're feeling confident, just dive in here!

To get the example programs login in to a Linux box and type:

svn co https://svn.ggy.bris.ac.uk/subversion-open/fortran2/trunk fortran2

Allocatable Arrays

We'll begin by looking at allocatable arrays.

Previously, we declared the size of our arrays when we wrote our code. This is all well and good, and at least we knew where we stood. However, this exact specification of the shape of all things can become a painful limitation. Suppose you were writing a program to process a list of University employees. The number of employees will almost certainly vary from one month to the next. So, we can only know the number of records to read in at runtime. This sort of thing starts to crop up a lot when we aim to write more general-purpose, flexible & robust programs.

Fear not! however, as trusty Fortran90 is here to help. In Fortran90 we can write array declarations of the form:

<type>,allocatable,dimension(:) :: my1dAllocatableArray  ! a 1D array of type <type>, 
                                                         ! which I will give a size to later

Let's go to an example:

cd fortran2/examples/example1

and take a look inside the file allocatable.f90.

In this example, we have a 2D grid for data on a sphere and corresponding arrays of longitude and latitude coordinates. The sizes of all these arrays are unknown at compile-time and will be read from a namelist file at runtime. We will then allocate the arrays using an appropriate amount of space, fill in the coordinate values according to the number of grid-cells specified and using the assumption that the the grid covers 360 degrees in longitude and 180 degrees in latitude. Our arrays are thus declared with the allocatable attribute:

  real,allocatable,dimension(:)   :: x_coords          ! x-coordinate values
  real,allocatable,dimension(:)   :: y_coords          ! y-coordinate values
  real,allocatable,dimension(:,:) :: data              ! 2D data on grid

We've covered namelists before, so reading the values of the variables nx and ny is straightforward. Once we know these values we can request memory space for our arrays. Note that it is an excellent idea to check whether this request was satisfied. If it were not for some reason (if, for example, you request a huge amount of memory and your computer is unable to oblige) we most likely want the program to stop, lest bad things happen!

  allocate(x_coords(nx),stat=errstat)
  if (errstat /= 0) then
     write (*,*) 'ERROR: could not allocate x_coords array'
     stop
  endif

Next is a spot of arithmetic to fill out the coordinate arrays. We want cell-centre coordinates, with longitude running from 0 to 360 degrees and latitude from -90 to +90 degrees.

So we're done? Almost, but not quite! It is vital that once you're finished with it, you clear up any memory that you allocated earlier. If you don't, bad things will happen sooner or later. So the last few lines look like:

  if(allocated(data))       deallocate(data,stat=errstat)
  if (errstat /= 0) then
     write (*,*) 'ERROR: could not dellocate data array'
     stop
  endif 
...

Here, we've been smarter than the average bear and we've checked to see if an array has been allocated before deallocating it. If we were to try to deallocate memory that had not in fact been allocated, our program would crash with the following runtime error:

Fortran runtime error: Internal: Attempt to DEALLOCATE unallocated memory.

So, it's a very good idea to check and so avoid this kind of bug. We've also included a check to make sure that the memory was actually deallocated, as requested.

Exercises

So there we are. Sorted. Allocatable arrays.

  • Try varying the values in input.nml first,
  • and then proceed to adding in a few allocatable arrays of your own.
  • If you have doxgen version 1.5.5 or higher installed (doxygen --version), try typing make doc and look in the html or latex directories.

Assumed Shape Arrays and Optional Arguments

Now, the eagle-eyed among you will have inferred that if we can decide the size of an array at runtime, then we may have difficulty specifying the dummy arguments for a subroutine which takes such an array. How big should we say the array is? And--should we want to iterate over all the elements of that array--how can we discover the size of an array at runtime? Fear not, take an look inside allocatable2.f90 and all will become clear.

Starting on line 85, we see the new subroutine, init_coords:

!< subroutine to initialise coordinate arrays
subroutine init_coords(coords, domain_size, coords_shift)

  implicit none

  ! dummy args
  real, dimension(:),     intent(out) :: coords
  integer,                intent(in)  :: domain_size 
  real, optional,         intent(in)  :: coords_shift
  ! locals
  integer :: extent
  integer :: ii
  real    :: delta
  real    :: offset = 0.0

  extent = size(coords)  ! determine the size of the array
  delta  = real(domain_size)/real(extent)

  ! note default value if arg is not passed
  if(present(coords_shift)) then
     offset = coords_shift
  end if

  ! fill out the values
  do ii=1,extent
     coords(ii) = (real(ii)*delta) - (delta/2.0) + offset
  end do

end subroutine init_coords

The line:

  real, dimension(:),     intent(out) :: coords

is our assumed shape array and it says that we're expecting an argument which is a 1D array of reals, but we don't know how big it will be. Fortunately, with the line:

  extent = size(coords)

we can discover how big the array is at runtime. The Fortran intrinsic function shape() is essential to determine the extent of the different dimensions of a multi-dimensional array.

All very nifty.

Another useful feature is the ability to specify that some arguments to a routine can be optional. This time, we'll take a look at the check_error subroutine:

!< subroutine to stop the program, with an optional message
subroutine check_error(err_code,message)
  
  implicit none

  ! dummy args
  integer,                    intent(in) :: err_code
  character(len=*), optional, intent(in) :: message

  if(err_code /= 0) then
     if(present(message)) then
        write (*,*) message
        stop 2
     else
        stop 1
     end if
  end if

end subroutine check_error

The first argument--err_code--is mandatory, whereas the second--message--is optional. In order to determine whether the optional argument has indeed been passed, we can use the:

     if(present(...)) then
        ...
     else
        ...
     end if

construct.

Notice that for both of these features, we need an explicit interface for the subroutines (given by the interface statements at the top of the file). This will require much less typing once we have encountered Fortran90 modules, as these will give us explicit interfaces for free.

User Derived Types

Next up, let's look at another very useful feature provided by good old Fortran90--user-derived types:

cd ../example2

Way back at the start of Fortran1 we looked at the half dozen or so intrinsic types offered by Fortran90. It's not so much that this set of types is becoming a limitation (as we found with static array sizes) but more that for a large program things get a bit, well, messy! "Not the end of the world", you may say. And you're right. However, mess does lead to bugs and bugs are a huge problem, so actually messy, spaghetti-like code, is a significant problem that we want to avoid at all costs.

Now, user-derived types offer us something simple, yet potentially extremely powerful--we can group things which are related into a single composite type. That is, we can group all the things that belong together into a single entity and then pass that entity around as work is done by our program. This is a huge bonus when designing and maintaining a reasonably sized program. As we'll find out with growing experience, user-derived types are, pretty much, the best thing since sliced bread!

User-derived types represent a first step down the road towards object-oriented programming. A step in the right direction in terms of designing programs.

OK, enough of the spiel. Let's take a look at user-types.f90.

I recently bought a DAB radio and I'm as pleased as punch with it. One of the great features is that you can select a radio station by name, push a button and bingo!, you're tuned in. No more scrolling around the dial trying to get clear reception. That makes it really easy to 'station hop' and find something good that's on. Now, imagine we were designing a program to operate a DAB radio. It's natural that we would want to keep the radio station name and tuning frequency (still needed, of course, but only by the radio itself as it now does the tuning for you) together in a single bundle. They belong together, after all, and are of different intrinsic types--a character string and a real number. In the program it looks like:

  type station
     real                  :: frequency                   ! MHz
     character(len=maxstr) :: name                        ! str
  end type station

Which we could represent schematically as:

a fortran type containing a real and a string.

It wouldn't be much of a radio unless it had a number of stations. We can use an (allocatable) array of entities of our new derived type to store all the required station information:

type(station),allocatable,dimension(:)   :: stations    ! array of stations

Again, schematically this would look like:

an array of instances of a fortran type.

I've made the array allocatable for flexibility, or perhaps out of habit. We could get more sophisticated and grow the array should more stations became available. Which does indeed happen.

After allocating the array (and importantly checking to see whether we hit a problem in doing so), all I need to do is fill out the array and print it to the screen, using the subroutine called render. Note the use of % to access elements of a derived-type.

You can see that we can pass derived-types around, into and out of subroutines for example, just like intrinsic types. This enables us to write very elegant, compartmentalised and hence maintainable programs. If we discover that we need some more information added to our radio station type, we just add it in one place and the rest of the program can remain unchanged--very handy!

Exercises

  • Have a go now at modifying the code: Add something new to the derived type, say a broadcaster string. Recompile using make and check that your program works as before without a problem. Now access the new type. For example, give it some values in the main part of the program and print them out from the subroutine.
  • Note the use of the size function inside the subroutine to find out the length of the array that it was passed. shape() is another useful function in this regard.
  • If you have doxgen version 1.5.5 or higher installed (doxygen --version), try typing make doc and look in the html or latex directories.

Modules

cd ../example3

In this example, we'll take a look at another very useful feature provided by Fortran90--modules. These provide further mileage down the road toward Object Oriented programming. They also do away with the need for common blocks, as used by older incarnations of the language.

Put simply, modules provide a way to collect together related variables and routines into a single entity. Other parts of your program may gain access to this entity through the use statement. In this way can tame the spaghetti tendency of large programs, and the slithery mess of pasta noodles starts to become a manageable collection of components that we plug together in a building block style more reminiscent of using lego or mechano.

Modules also have other useful side-effects. For example, the compiler will check the argument lists of subroutines and functions declared in a module against the argument lists used to call them. Otherwise, we would have to create an explicit interface declaration for each routine that we wrote--tedius! This feature is not to be sneezed at when an argument list mismatch will cause our program to crash.

Let's take a look at our example module in mymod.f90. In this example we see the accumulation of other features of Fortran90 shown in previous examples. For example, we see allocatable arrays, user-derived types, reading data from namelists, checks for memory allocation and deallocation, parameter attributes on 'global' constants etc. etc.

On the first line of the file we see:

module mymod

This is key to creating a module and differs from the program prog we see on the first line of prog.f90. There is, of course, a corresponding end statement on the last line of the file.

We have the ubiquitous implicit none and then the declaration of any variables we wish place in the module. We have a constant named pi, which can be made available to any other part of the program which connects to this module. We could also make available non-constant variables in this way too, if we really had to!. This is how modules replace the sharing of variables via common blocks. Sharing 'global variables' in this way is worth avoiding, however, if you possibly can, as it is another common source of bugs and encourages spaghetti coding.

Next up we have a type declaration for the rather simple 'model' that we'll be using. This user-derived type encapsulates the internal model state together with a record of how many time-steps have been asked of the model. We need an instance of this type to work with and this is declared by:

type(simple_model),private          :: modelA

Note the new attribute, private. This means that this instance will not be made available to other parts of the program which connect to this module. This is a good example of defensive programming through data hiding. It simplifies the interface that the rest of the program has to our model and also guards against bugs which could creep in through the inadvertent modification of our simple model's internal state. Trust me, privacy is good and well worth getting into the habit of specifying. It also prompts one to think about the design of our program a bit more, which is always a step in the right direction.

Following on from our variable declaration, we have:

contains

This signals the end of the variables and the start of the definitions of any routines (subroutines or functions) that we've elected to put into our module. In this case we have three subroutines for initialising, time-stepping and finalising our model, respectively. The routines contain code constructs that we have seen in previous examples. It's worth studying the error checks, use of external files, memory allocation and deallocation and the syntax for accessing elements of a derived type, just as a refresher.

a fortran module.

OK, so much for the module, now let's look at a program which uses it--prog.f90.

This is a rather simple program, now that much of the action is hived-off into a module. Note that the very first line of the program, before even our beloved implicit none is:

use mymod, only: pi, initialise_model, timestep_model, finalise_model

Here we state that we will use the named module in this program (we could use a number of modules). Some additional defensive programming is that we only include the things we need from that module. This simplifies matters and reduces the risk that we will fall foul of a bug arising from a clash of variable or subroutine names. We then procede to make use of any imported variables and routines as we see fit. Note, however, that we do not need direct access to the hidden internal state of our simple model.

Note also that the subroutine timestep_model_private is only called by routines inside the module. We do not need to call it from outside. We only need access to the routines (and perhaps variables) that comprise the interface to that module. All other details about how the module gets things done can be hidden from view. This greatly simplifies matters and helps us manage complexity in large programs. (So called Application Program Interfaces or APIs are used in conjunction with libraries of code. We can consider the library as a functional 'black-box', aside from the interface routines--they are all we need to know about.)

Exercise

  • OK, now that you are familiar with the vital anatomy of Fortran90 modules, have a play around. Start by changing the start-up state of the model by editing imput.nml. You could then proceed, for example, by adding new varaibles or subroutines to the module and making use of them in the program. When you're happy with that, you could experiment with the only and private keywords or even create a whole new module. Go on! Why not?:)
  • If you have doxgen version 1.5.5 or higher installed (doxygen --version), try typing make doc and look in the html or latex directories.

Using separate files

In the Fortran practicals so far, all compilations steps have been taken care of by the powerful build system program called make. However, we are reaching the time where you need to get your hands dirty and see how compilation can be done "by hand". Then we will see how the Fortran program can be split over several files.

$ cd ../example4

Invoking the compiler

The compiler that you have been using during the Fortran practicals is called gfortran. It is a free open source compiler. A range of compilers is available on the Linux platform. g95 is another free software alternative. ifort and pgf90 are commercial offerings from Intel and the Portland Group respectively. In this section, the gfortran compiler will be used but note that the practical would work just as well with any of these compilers.

To compile a Fortran file with gfortran, simply invoke the compiler with the name of the file. It will then create a file called a.out which is executable on the machine you compiled it on:

$ cd hello_world
$ gfortran hello_world.f90
$ ls -rtl
total 420
-rw-r--r-- 1 fred users     93 Mar  3 20:39 hello_world.f90
-rwxr-xr-x 1 fred users 411750 Mar  3 20:41 a.out

Try to execute the file a.out (remember to include the current directory in your call, i.e. ./a.out). Also note the size difference between the source file and the executable. Source files take up a lot less room.

The compiler could be invoked with many options to check aspects of the Fortran syntax; to control what types of warnings are issued; to perform various optimisations of the code or to create debugging information. We will not go into details in this practical so refer to the compiler documentation if you want more information (try man gfortran).

If you modify the source file, you need to recompile it before executing a.out for your changes to take effect.

The first compiler option we will be using is used to change the name of the executable file. This is done with the option -o followed by the name required:

$ gfortran hello_world.f90 -o myprog

Now that we can invoke the compiler manually, let's see how to split a Fortran program into multiple files.

Creating objects and linking them

First of all, let's see how to put a subroutine in a separate file. In the subroutine directory you will find a basic Fortran program in the file onefile.f90. It contains a main program unit and a subroutine.

$ cd ../subroutine

You can compile it using the method above.

$ gfortran onefile.f90 -o myprog
$ ./myprog

The content of this small Fortran program has also been split into two separate files:

  • one for the main program unit, main_program.f90
  • one the subroutine, sub1.f90.

If you invoke the compiler on either of these files, it will complain that something is missing. Try it to see the two types of error messages.

The reason for this is that; (i) the main program needs the definition of the subroutine; and (ii) a Fortran program needs a main program unit so neither file is enough on its own. Luckily, you can also invoke the compiler on the two files at the same time.

$ gfortran main_program.f90 sub1.f90 -o myprog

It works! (Note that the order of the files does not matter.) By giving two files as arguments to the compiler, it is clever enough to understand that there might be dependencies between the files and it takes care of them automatically.

The problem with this approach is that if the subroutine is modified, the main program will also have to be recompiled to generate the executable. This would be a issue for larger Fortran programs because it would be time consuming. We will also see later on that we might want to included external subroutines contained in a library and to which the source code is not available.

Instead, it is possible to ask the compiler to compile a file and "ignore" the dependencies. By doing this, the compiler creates an object file (usual extension .o) which contains all the information needed to create an executable minus the actual dependencies. This is done by using option -c for "compile only".

$ gfortran -c main_program.f90
$ gfortran -c sub1.f90

Check that two new files have been created. By default, they should be named like the source file but with a .o extension. Again, this default behaviour could be changed with the option -o.

Then the two objects have to be linked together to create an executable file. This is done by invoking a the compiler on the object files:

$ gfortran sub1.o main_program.o -o myprog

This might seem like an extra step compared to the previous method but the advantage of this method is that you can modify one file independently of the other, re-create the object for the modified file only and then link the objects together. If you have many files in your project, that would save a significant amount of time.

Try it.

Note 1: Although you can modify the files independently, they still need to make sense in terms of Fortran syntax. For instance, if you add an argument to the subroutine but don't modify the main program unit, the program might not work as expected.

Note 2: if you have many files in your project, understanding which ones need to be recompiled can be a complex task. This is why you are encouraged to use a "build system". The course contains an introduction to the make build system later on.

Dealing with modules

In this section we look at what happens when modules are used. The required files are in the directory called module.

$ cd ../module

This directory has two files; one containing a main program unit; and another containing a Fortran module. Have a look at the files, compile them and create an executable.

After the module is compiled, you might have noticed that a new file has been created in the directory with the extension .mod. This file contains information about the module and when the compiler notices that a module is required, it will use this file to get the information. Therefore, this file is necessary when you compile any source file using the module. You can check it by deleting it and trying to compile the other file again.

Note: If you change the module significantly but don't recompile the file that uses the module, you might get problems as the module file that was used to create the object is out of date. So the object should be recreated too.

By default, the compiler looks for the module files in the current directory. However, it is quite usual that some modules are stored elsewhere. For instance create a directory called include/, move the module file into it and then compile the program... you should get an error.

gfortran main_program.f90 -c
In file main_program.f90:3

  use mymodule
              1
Fatal Error: Can't open module file 'mymodule.mod' at (1) for reading: No such file or directory

Luckily, it is possible to tell the compiler to include another directory in its search. This is done with the option -I for include.

$ gfortran  main_program.f90 -I ./include -c
$ gfortran main_program.o mymodule.o -o myprog

All is well, the compiler can now find the relevant bits of information.

Libraries

The next step is to package groups of subroutines together in a library. The library can then be re-used by a different project.

$ cd ../example5

The directory contains a program (main_program.f90) and a directory called operations which we will use to build our first library.

$ cd operations
$ ls
addition.f90  multiplication.f90

The operations directory contains two Fortran files. They are actually modules and contains a set of functions that can be used to perform arithmetic operations. Have a look at them and then compile the two files to create the objects and module files.

The two object files can then be archived in a library with the command ar for archive:

$ ar rs liboperations.a *.o
ar: creating liboperations.a

Note the "r" argument which means replace. You could call the library whatever you want. However, using the lib prefix will make the library easier to link to later on.

At this stage, the object files are not required anymore and could be deleted. However, you need to keep the library and the module files.

Now we will go back to the main program and create an object for it.

$ cd ..

Have a look at the main program file. It uses the two modules defined in the operations folder. Therefore, the module files will be required at the compilation stage. The library will be required at the linking stage.

To compile the file, use the option -I to let the compiler search the operations/ directory for module files..

$ gfortran main_program.f90 -c -I./operations

Then we need to link the newly created object to the library that was created earlier on. The simplest method is to give the full path of the library to the compiler:

$ gfortran main_program.o ./operations/liboperations.a -o myprog

However, it is common practice to put many libraries in a given location and let the compiler know where to find the libraries. This is achieved by using the option -L to specify the path to the libraries and then -l to specify which library to use.

$ gfortran main_program.o -L./operations -loperations -o myprog

Note that in the example above, the compiler will look for the library "liboperations.a" in the folder "./operations". If more than one library was required, you would have to specify them one after the other: "-llib1 -llib2 ..."

That's all you need to know about creating libraries. One important thing to remember is that Fortran programs will require both library and module files. Sometimes they are not kept in the some location and often the module files are kept in a directory called include/ and the libraries in a directory called lib/. This is the case for the NetCDF example that follows.

Some useful third party libraries include:

  • The libraries described (with examples) on the linear algebra course (LAPACK,PETSc, PLASMA etc.) are very useful.
  • The NAG library of numerical operations (linear algebra, optimisation etc.)
  • NetLib is a searchable site of other libraries useful for scientific compuation.

A useful library: netCDF

netCDF is fast becoming the de facto file format for model data. It is a well-designed format for storing your own data and also a convenient way to share data with your collaborators. One of the it's many benefits is that the file format is endian-independent. That means that we don't need to worry, for example, about which architecture (Linux, Unix, MS Windows, Mac OS etc.) the data was created on.

$ cd ../example6

This section contains a number of data-structures (derived types) and routines for reading and writing NetCDF files. They are bound up into several examples. Hopefully these will be useful to adopt in your own programs, so feel fee to copy and use them.

To go further

That's it for the Fortran tuition. The Pragmatic Programming course continues with a practical about project management using "make".

A useful Fortran textbook is called Fortran 90 Programming by Ellis, Philips & Lahey. Take a look at the 'A Good Read?' page for more details.