Debugging
Debugging your program: Various techniques
Introduction
Humans can be variously ingenious, inspired, careful, persistent and many things besides. All of these character traits can be called upon when writing software. There is one aspect of human nature we can be certain of, however: We err. We make mistakes; muck it up; break stuff. No amount of technology, know-how or experience will change this. From time to time, we all get it wrong.
This isn't all bad, however. It's a cliche, but if we never made a mistake, how would we learn? Making mistakes is essential for our progress. That said, we also need our programs to work correctly. We want our weather and climate models to accurately predict the future. We want our banking software not to 'lose' our money. We want stuff to work.
OK, given that we're going to get bugs and that we don't really want them, this workshop is focussed upon finding them and correcting them--the art of debugging. Approached rashly, debugging can be a torrid and despairing task. With some of the right tools and a systematic approach, however, debugging can be a rewarding task. As we alluded to earlier, debugging is a learning process and as you grapple with your own projects, you will have a great many of those "aha!" and "oh, I see!" moments. Not quite a joy, perhaps, but definitely satisfying.
Getting the content for the practical
OK, let's make a start. Log in to your favourite Linux box and type:
svn export http://source.ggy.bris.ac.uk/subversion-open/debugging/trunk ./debugging
A Common Bug: going beyond the boundaries of an array
We will start with a pretty common coding problem: we have an array and a loop which accesses elements of that array in turn. The problem is that we've made a mistake with our loop and it tries to access elements beyond the boundaries of our array.
Let's visit our example:
cd debugging/examples/example1
Here are the salient parts of the code, from array_bounds.f90:
  integer, parameter :: n = 10   ! array size
  integer :: ii                  ! counter
  real, dimension(n) :: x        ! array
  ! a loop accessing beyond the array bounds
  do ii = 1, 10000
    x(ii) = x(ii) + float(ii)
    write (*,*) "x(",ii,") is: ", x(ii)
  end do
Let's take a look and compile up the code using the open-source g95 compiler.
We get a segmentation fault as soon as we step outside of the array. "Fine, this is how it should be", you say. Well, sometimes we're not so lucky. I tried compiling the same code using both the Intel and PGI Fortran compilers, and the crash came much later. With Intel, the counter reached 44 before the program crashed. With PGI, we needed to step outside the array by thousands of elements before we triggered a segmentation fault.
Happily, we can check for array bounds problems in a less ad hoc manner. Many compilers allow you to incorporate run-time array-bounds checks into your executable. Using g95, this is done by supplying the flag -fbounds-check (-CB for Intel, or -Mbounds for PGI). When we run the program now, we get a much more definitive statement from the compiler's run-time checks (and Intel and PGI don't wait until we're way past the end of the array either):
Fortran runtime error: Array element out of bounds: 11 in (1:10), dim=1
So, by testing our code with the appropriate compiler flags, we can track down occurrences of this common problem.
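Bounds checking catches the mistake at run time. A complementary habit is to take the loop limits from the array itself, so that the two cannot drift apart. Here is a minimal sketch, assuming the same n, ii and x as in array_bounds.f90. The program name and the initialisation of x are additions for illustration and are not part of the example code:

```fortran
program bounds_sketch
  implicit none
  integer, parameter :: n = 10   ! array size
  integer :: ii                  ! counter
  real, dimension(n) :: x        ! array

  x = 0.0   ! initialise before use (an addition for this sketch)

  ! take the loop limits from the array, not from a hard-coded number
  do ii = lbound(x, 1), ubound(x, 1)
    x(ii) = x(ii) + real(ii)
    write (*,*) "x(", ii, ") is: ", x(ii)
  end do
end program bounds_sketch
```

It is still worth compiling with the bounds-checking flag while testing, since not every index comes from a simple loop counter.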
Argument Mismatch
Another common bug is a mismatch between the number (or type) of arguments passed to a subroutine when it is called and those defined in the definition of the subroutine itself. Let's take a look at an example:
cd ../example2
In the file subroutines.f90, we have three subroutines. The calls and definitions for the first two match. However, the third is called in the main program using:
call sub3(numDim)
but defined as:
subroutine sub3(numDim,arg2)
  implicit none
  ! args
  integer, intent(in)  :: numDim
  integer, intent(out) :: arg2
  arg2 = numDim
end subroutine sub3
Now, you may think this is an obvious mistake, and it is for a small number of arguments. However, for large programs the argument lists for subroutines can get quite long. Perhaps 10, 20 or even 30 arguments. When we get up to those numbers, it's very hard to spot a mismatch.
Sadly for us, compilers such as Intel and PGI don't check that the calls and the definitions match by default--it's not the Fortran way! They would compile up the program happily, only for it to seg' fault at runtime (with PGI):
We live in 3 dimensions
Up, down, side to side, yup 3 dimensions it is!
Segmentation fault
What a pain! Happily, there is a very simple fix to all this--we place our subroutines into a Fortran90 module. Let's take a look at what happens this time:
cd ../example3
We have an (almost) identical main program (the use statement is the only addition) and we have hived-off all our subroutines into mymod.f90. This time when we try to compile, with PGI, we get:
PGF90-S-0186-Argument missing for formal argument arg2 (subroutines.f90: 15)
  0 inform,   0 warnings,   1 severes, 0 fatal for argmismatch
make: *** [subroutines.o] Error 2
or with Intel:
fortcom: Error: subroutines.f90, line 15: A non-optional actual argument must be present when invoking a procedure with an explicit interface.   [ARG2]
  call sub3(numDim)
-------^
compilation aborted for subroutines.f90 (code 1)
We still have an error. This is true. But we are told exactly what and where it is and also before we've wasted a load of time trying to run the faulty program.
Note, with g95, we do get a warning when we compile example2:
In file subroutines.f90:39
subroutine sub3(numDim,arg2)
           1
In file subroutines.f90:13
  call sub3(numDim)
       2
Warning (154): Inconsistent number of arguments in reference to 'sub3' at (1) and (2)
g95  subroutines.o -o subroutines.exe
but a faulty executable is still created. Better to go with example3:
In file subroutines.f90:15
  call sub3(numDim)
       1
Error: Missing actual argument for argument 'arg2' at (1)
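To make the module approach concrete, here is a minimal sketch of the pattern. It is not the actual contents of example3. The module name mymod is assumed from the file name mymod.f90, the program name and the value of numDim are illustrative, and the module and main program are shown in one file only to keep the sketch self-contained:

```fortran
module mymod
  implicit none
contains
  subroutine sub3(numDim, arg2)
    ! args
    integer, intent(in)  :: numDim
    integer, intent(out) :: arg2
    arg2 = numDim
  end subroutine sub3
end module mymod

program subroutines
  use mymod   ! gives the compiler an explicit interface for sub3
  implicit none
  integer :: numDim, arg2

  numDim = 3
  call sub3(numDim, arg2)   ! dropping arg2 here is now a compile-time error
  write (*,*) "arg2 is: ", arg2
end program subroutines
```

Because the subroutine lives in a module, the compiler knows its argument list wherever it is used, so every call is checked against the definition.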
Looking a bit Closer
So far, we've looked at some bugs with severe effects--they caused the program to crash. If we have a bug, in a way we hope it's one with severe effects. That way at least it will be easy to spot! So far these severe problems have been easy to track down too. Alas, bugs are often more subtle and are accordingly harder to find. Don't despair, however, as we have more tools and aids to help us find the pernicious little critters.
An oft-seen approach is to add print statements to the code, perhaps printing the value of a variable or merely proclaiming, "the program got as far as me!" Then we recompile and rerun the program. Perhaps we gain some insight this time around, perhaps not, so we add some more print statements, recompile, rerun and hopefully home in on the problem. This approach can certainly work, but is tedious and time consuming. Happily, there is a better way. We can run our code inside a tool specifically designed to help us find bugs--we can use a debugger.
A very serviceable open-source debugger available on most Linux systems is called ddd. We will use this one for this practical. There are a number of other good debugging tools, such as MS VisualStudio, the Portland Group debugger (available on quest) and many more besides. They all work in a similar manner, however, and so becoming familiar with ddd will stand you in good stead.
OK, let's move to a new example:
cd ../example4
We can compile our program using make. Note, however, the addition of the -g flag to the (g95) compiler, which includes the debugging information that the debugger needs:
[ggdagw@dylan example4]$ make
g95 -g -c gubbins.f90 -o gubbins.o
g95 -g gubbins.o -o gubbins.exe
Now, the sorrowful program in gubbins.f90 is full of programming problems:
- integer division
- overflow
- underflow
- divide by zero
The program will run silently to completion and we may be none the wiser about any of the mishaps along the way. However, if we run the program inside the debugger, we can examine the values of all the variables and control the flow of the program as we see fit, exposing all those little mistakes to the cold light of day:
ddd gubbins.exe
The first thing we do is to set a breakpoint. When we run the program, it will get as far as the line of code with the breakpoint attached, and will then sit and wait for our next command. We can step the program one line at a time, use next to enter a subroutine and then step through that. (I don't know why these two functions are the wrong way round for Fortran, they just are. Sigh.) We can continue to the next breakpoint (or the end of the program if we haven't set one) and also display the values of variables as we go along. You can also hover over variables to see their values. This is all rather neat, eh?!
Inspecting the flow of loops and conditionals and the values of variables inside a program like this is ideal for finding bugs, and it's a lot less laborious than a tedious cycle of add print statement, recompile, rerun...
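The kinds of silent problem listed for gubbins.f90 can be reproduced in a few lines. The following is not the code from gubbins.f90; it is a small sketch with made-up names that you could compile with -g and step through in the debugger. The results noted in the comments are what one would typically see on hardware using IEEE arithmetic with default compiler settings, and may differ on other systems:

```fortran
program silent_problems
  implicit none
  integer :: i, j
  real    :: ratio, big, small, zero

  ! integer division: the fractional part is discarded
  i = 1
  j = 2
  ratio = i / j                 ! 0.0, not 0.5
  write (*,*) "1/2 with integers: ", ratio
  write (*,*) "1/2 with reals:    ", real(i) / real(j)

  ! overflow: the result is too large for the type
  big = huge(big)
  big = big * 10.0              ! typically becomes Infinity
  write (*,*) "huge * 10:         ", big

  ! underflow: the result is too small for the type
  small = tiny(small)
  small = small * small         ! typically becomes 0.0
  write (*,*) "tiny * tiny:       ", small

  ! divide by zero
  zero = 0.0
  write (*,*) "1.0 / zero:        ", 1.0 / zero   ! typically Infinity
end program silent_problems
```

None of these stop the program by default, which is why watching the variables in a debugger is so useful.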
Note that accidentally modifying an argument passed to a subroutine is another common source of problems. The best way to address this one is an example of defensive programming, whereby we proactively avoid bugs through mindful programming practices. Fortran provides us with the intent attribute for dummy arguments to address this. Trying to modify a dummy argument with the intent(in) attribute will result in a compile-time error. Adding intent to your arguments also helps you think clearly about the design of your subroutine.
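As an illustration of intent, here is a minimal sketch. The module, subroutine and variable names are all hypothetical and do not come from the workshop examples:

```fortran
module intent_sketch
  implicit none
contains
  subroutine scale_value(factor, val)
    real, intent(in)    :: factor   ! read-only inside the subroutine
    real, intent(inout) :: val      ! read and then updated

    ! factor = 2.0 * factor         ! uncommenting this line is rejected at compile time
    val = factor * val
  end subroutine scale_value
end module intent_sketch

program intent_demo
  use intent_sketch
  implicit none
  real :: v

  v = 1.5
  call scale_value(2.0, v)
  write (*,*) "v is now: ", v
end program intent_demo
```

Declaring the intent of every argument costs a few keystrokes and documents which values a subroutine reads and which it changes.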
Testing
In the previous sections we've looked at a number of ways of finding a bug once we know we have a problem. The fix for a bug is usually self-evident and part of the "aha!" moment. However, in order to determine whether or not we have a bug, or more accurately, whether it is manifest under the range of conditions in which we run our program, we need to test it. This may seem blindingly obvious, but it is sobering to see the number of programs that are used without a second thought given to testing whether they actually do what they are intended to do!
There are few generalities that I can list with regard to testing--each research code is likely to be different, with different needs etc. However, I would stress that it is a good idea to make it as easy as possible to test your code. Frequent testing is the key to finding bugs quickly, and those that are found in a timely manner are far easier to fix. (ref)
It is possible to add a test rule to the makefile that you use to compile your code. Given such a rule and some appropriate scripts, it can be as simple as typing make test to test your code. Easy for you. Easy for your collaborators. To find out more about make and the addition of a test rule, you can look at our course on make.
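A test rule needs something to run. One option is a small self-checking driver program that calls a routine with known inputs and complains if the answer is wrong. The sketch below assumes a hypothetical mean_of function in a hypothetical stats module; neither is part of the workshop examples. It also relies on the compiler passing the stop code on as the exit status, which many compilers do but which the Fortran standard does not guarantee:

```fortran
module stats
  implicit none
contains
  ! hypothetical routine under test: arithmetic mean of an array
  function mean_of(x) result(m)
    real, dimension(:), intent(in) :: x
    real :: m
    m = sum(x) / real(size(x))
  end function mean_of
end module stats

program test_stats
  use stats
  implicit none
  real, parameter :: tol = 1.0e-6   ! tolerance for comparing reals
  real :: got

  got = mean_of((/ 1.0, 2.0, 3.0, 4.0 /))
  if (abs(got - 2.5) > tol) then
    write (*,*) "FAIL: mean_of gave ", got, " expected 2.5"
    stop 1   ! non-zero stop code, so a calling script can detect the failure
  end if
  write (*,*) "PASS: mean_of"
end program test_stats
```

Reals are compared against a tolerance rather than tested for exact equality, since rounding can differ between compilers and machines.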