Debugging you program: Various techniques

Introduction

Humans can be variously ingenious, inspired, careful, persistent and many things besides. All of these character traits can be called upon when writing software. There is one aspect of human nature we can be certain of, however: We err. We make mistakes; muck it up; break stuff. No amount of technology, know-how or experience will change this. From time-to-time, we all get it wrong.

This isn't all bad, however. It's a cliche, but if we never made a mistake, how would be learn? Making mistakes is essential for our progress. That said, we also need our programs to work correctly. We want our weather and climate models to accurately predict the future. We want our banking software not to 'lose' our money. We want stuff to work.

OK, given that we're going to get bugs and that we don't really want them, this workshop is focussed upon finding them and correcting them--the art of debugging. Approached rashly, debugging can a a torrid and despairing task. With some of the right tools and a systematic approach, however, debugging can be a rewarding task. As we alluded to earlier, debugging is a learning process and as you grapple with your own projects, you will have a great many of those, "aha!" and "oh, I see!" moments. Not quite a joy, perhaps, but definitely satisfying.

Getting the content for the practical

OK, let's make a start. Login to your favourite linux box and type:

svn export http://source.ggy.bris.ac.uk/subversion-open/debugging/trunk ./debugging

A Common Bug: going beyond the boundaries of an array

We will start with a pretty common coding problem: we have an array and a loop which access elements of that array in turn. The problem is that we've made a mistake with our loop and it tries to access elements beyond the boundaries of our array.

Let's visit our example:

cd debugging/examples/example1

Here's the saliant parts of the code, from array_bounds.f90:

  integer, parameter :: n = 10  ! array size
  integer            :: ii      ! counter
  real, dimension(n) :: x       ! array

  ! a loop accessing beyond the array bounds
  do ii = 1, 10000
    x(ii) = x(ii) + float(ii)
    write (*,*) "x(",ii,") is: ", x(ii)
  end do

Let's take a look and compile up the code using the open-source g95 compiler.

We get a segmentation fault as soon as we step outside of the array. "Fine, this is how it should be", you say. Well, somethimes were not so lucky. I tried compiling-up the same code using both the Intel and PGI Fortran compilers. We wern't so lucky. With Intel, the counter reached 44 before the program crashed. With PGI, we needed to step outside the array by thousands of elements before we triggered a segmentation fault.

Happily we can check for array bounds problems in a less ad hoc manner. Many compilers allow you to incorporate run-time array-bounds checks into your executable. Using g95, this is done by supplying the flag -fbounds-check (-CB for Intel, or -Mbounds for PGI). When we run the program now, we get a much more definitive statement from the compiler (and Intel and PGI don't wait until we're way passed the end of the array either):

Fortran runtime error: Array element out of bounds: 11 in (1:10), dim=1

So, by testing our code with the appropriate compiler flags, we can track down occurances of this common problem.

Argument Mismatch

Another common bug is a mismatch between the number (or type) of arguments passed to a subroutine when it is called and those defined in the definition of the subroutine itself. Let's take a look at an example:

cd ../example2

In the file subroutines.f90, we have three subroutines. The calls and definitions for the first two match. However, the third is called in the main program using:

  call sub3(numDim)

but defined as:

subroutine sub3(numDim,arg2)

  implicit none

  ! args
  integer, intent(in)  :: numDim
  integer, intent(out) :: arg2

  arg2 = numDim

end subroutine sub3

Now, you may think this is an obvious mistake, and it is for a small number of arguments. However, for large progams the argument lists for subroutines can get quite large. Perhaps 10, 20 even 30 arguments. When we get up to those numbers, it's very hard to spot a mismatch.

Sadly for us, compilers such as Intel and PGI don't check that the calls and the definitions match by default--it's not the Fortran way! They would compile up the program happily, only for it to seg' fault at runtime (with PGI):

 We live in             3  dimensions
 Up, down, side to side, yup             3  dimensions it is!
Segmentation fault

What a pain! Happily, there is a very simple fix to all this--we place our subroutines into a Fortran90 module. Let's take a look at what happens this time:

cd ../example3

We have an (almost) identical main program (the use statement is the only addition) and we have hived-off all our subroutines into mymod.f90. This time when we try to compile, with PGI, we get:

PGF90-S-0186-Argument missing for formal argument arg2 (subroutines.f90: 15)
  0 inform,   0 warnings,   1 severes, 0 fatal for argmismatch
make: *** [subroutines.o] Error 2

of with Intel:

fortcom: Error: subroutines.f90, line 15: A non-optional actual argument must be present when invoking a procedure with an explicit interface.   [ARG2]
  call sub3(numDim)
-------^
compilation aborted for subroutines.f90 (code 1)

We still have an error. This is true. But we are told exactly what and where it is and also before we've wasted a load of time trying to run the faulty program.

Note, with g95, we do get a warning when we compile example2:

In file subroutines.f90:39

subroutine sub3(numDim,arg2)
           1
In file subroutines.f90:13

  call sub3(numDim)
       2
Warning (154): Inconsistent number of arguments in reference to 'sub3' at (1) and (2)
g95  subroutines.o -o subroutines.exe

but a faulty executable is still created. Better to go with example3:

In file subroutines.f90:15

  call sub3(numDim)
       1
Error: Missing actual argument for argument 'arg2' at (1)

Looking Inside a Program

underflow
overflow
divide by zero
assign instead of compare
integer division

passing values into a subroutine and modifying them--use intent.

Testing

We need to test our code to see where the bugs are!

Debugging

Contents