StartingC

startingC: Learning the C Programming Language

=Introduction=

Welcome to 'StartingC', a tutorial aimed at getting you up and running with the C programming language. The tutorial is split into sections in each of which there will be some explanatory text (maybe even some diagrams!) but most importantly some working example programs that you can easily download and run. To get these, just cut and paste the text below onto your command line:

svn co https://svn.ggy.bris.ac.uk/subversion-open/startingC/trunk ./startingC

We will not be assuming any previous programming experience, just an enquiring mind and the rudiments of using the command line on a Linux-based computer. If you have any doubts about the latter, take a look at the Linux1 tutorial, which is also part of the 'pragmatic programming' set.

=A Quintessential First Program=

OK, now that we have the example code, let's get cracking and run our first C program. First of all, move into the example1 directory:

cd startingC/examples/example1

We'll use a Makefile for each example, so as to make the build process painless (hopefully!). All we need do is run make (see the make tutorial if you're interested in this further):

make

Now, we can run the classic program:

./hello.exe

and you should get the friendly response:

hello, world!

Bingo! We've just surmounted, in some ways, our biggest step--running our first C program. Programming is like playing with Meccano or Lego. Remember how much fun it was to assemble all those building blocks into something new and fascinating? We've just built our first model, and the rest of the toy box awaits, so let's get stuck in!

=Types & Operations=

Buoyed with confidence from our first example, let's march fearlessly onwards into the realm of variable types and basic operations. To do this, move up and over to the directory example2 and type make to build the example programs:

cd ../example2
make

Take a look inside types.c (it's best to run your text editor in the background, so that you can type make etc. when needed) and after the start of the main function, you'll see a block of variable declarations:

C, like many languages (e.g. Fortran), requires that variables be declared to be of a certain type before they can be used, and here we see examples of four intrinsic types provided by the language. It's a very good habit to comment all your variable declarations, and here the comments pretty much explain what the various types are. double is a double precision floating point number, which typically takes twice the storage space of a float. The extra space makes a double a good choice for an accumulator where you want to minimise rounding errors and avoid under- and overflow as best as possible. (The Fortran programmers amongst us will note, with a wince, the absence of an intrinsic type for complex numbers. Those reeling from this revelation will be comforted by the knowledge that C++ contains a complex class.)

Various types can be given further qualifiers, such as short, long, signed and unsigned:

The const keyword is also very useful for, well, declaring constants. An invaluable built-in operator when pondering the amount of memory assigned to a variable is sizeof, which reports a size in bytes.

In addition to single entities of various types, we can also declare arrays of the self-same intrinsics. The syntax for this is along the lines of:

You'll see a good deal more of accessing the various elements of an array in later examples, but for now be satisfied with the knowledge that array indices start at 0 in C (yes, that's right Fortraners, that's zero, not 1) and that the syntax for array access is, e.g.:

Enumerated types can be a useful way to map (a list of) symbolic names to integer values.

Now that you've read it through, run the program and satisfy yourself that it all works as you expect it to. To run the program, type:

./types.exe

Shifting our attention to operations.c, let's consider some basic operations that C supports. This is the start of the doing things part.

The first block of code here gives an arithmetic example--how to calculate the volume of an oblate spheroid that happens to be close to all our hearts, our shared home Earth:

I won't dwell on this as I'm confident that the syntax is self-explanatory, save to mention that the function pow comes from the built-in library of math functions.

Next up, you'll see the decrement and increment operators:

also self-explanatory.

C provides the logic operators, == (is equal), != (not equal), && (AND) and || (OR); as well as the relationals, > (greater than), < (less than), >= (greater than or equal) and <= (less than or equal).

An operation that you will become keenly aware of--especially working in scientific computing--is the ability to temporarily convert a variable from one type to another on-the-fly. This is known as casting. Two examples of this are:

where, in the first, we convert pi into a (short) integer and, in the second, convert 42 into a floating point number. Note that the cast does not affect the original variable in any way, i.e. the value given to the variable called pi is not changed through using the cast.

One last class of operations for now are the bitwise operators. These give you very low-level control over the individual bits of a variable, should you need that. For example, we can perform a bitwise AND on the two bytes 01001000 and 10111000, yielding 00001000 when all the bit pairs are considered in turn according to the criteria:

To run the second program, type:

./operations.exe

Now, it's very important that you muck around with these example programs as much as possible! Ideally, so much so that you break them! We never learn as much as when we make a mess of things, and since these are just toy programs, you may as well go for it! If you get in a pickle, you can get the original programs back with a quick waft of the Subversion wand:

svn revert *

Exercises

types.c
 * Declare a character array sufficient to record the state of a game of noughts and crosses, populate it and print it to the screen.
 * How many bytes are used to store a long double?
 * You can give an initial value to a character array when you declare it (e.g. char cStr[20] = "xxxxxxxxxxxxxxxxxxxx";). What happens if we leave '\0' out of character assignments in this case?

operations.c
 * 29.2% of the Earth's surface is land. How much is this in square kilometers?  C has an arccos function (acos) and the web has the [formula for the surface area of an oblate spheroid]
 * Logic is perilous. Can you think of a time when we say "or", but really mean logical AND?
 * What happens if you cast the character '9' to an int?

=Conditionals & Loops=

OK, we have types and operators under our belts. This C malarky isn't too bad, eh? Let's take a look at some stalwarts of the procedural family of languages--conditionals and loops. As we will start all our sections, move up and over to the example3 directory and build the program(s) therein:

cd ../example3
make

Looking inside flow.c, our first block shows how we can make multi-way decisions using if tests and the else catch-all:

This is all very nice and self-explanatory. Typically you would use the above for a decision point that could follow one of three or fewer branches. If you have more than three branches, the switch statement is likely to be more concise and easier for you and your fellow developers to read:

The default case is much like our else catch-all above and is important to include, as otherwise a value that we did not anticipate would match none of the cases and slip silently past the whole switch. You will also notice the break statements in all the cases. These matter a great deal: without a break, execution 'falls through' from the matching case into the code for the cases below it, so we could accidentally run two cases. Case 4 and the default, say. A caveat, however, is that the expression in the parentheses of switch(expr) must be integer valued.

Moving on. The for is an oft used tool on the work bench:

It's tidy, succinct and gets the job done. Note that we've nested an if statement inside our loop. The continue statement is a useful way to skip the rest of an iteration, if it's superfluous.

Sometimes, however, we don't know ahead of time how many iterations of a loop will be required. A for loop is a poor fit in this case and the while loop steps into the breach for us. For example:

In this case we keep testing to see if ii is greater than the threshold. If it is, then we go around the loop one more time, acquiring a new value for ii along the way. We loop back to the top, re-test against the threshold and so on. The loop will only terminate when ii is no longer greater than the threshold, i.e. when the while test fails, so watch out for those infinite loops!

To run the example program, type:

./flow.exe

Exercises


 * Can you nest an if within another if and what would be the point? Indeed can you have an if within a loop, within an if..?
 * What happens if you remove break statements from the switch construct?
 * Can you write a for loop that counts down rather than up? What about in steps of 2, or 3?
 * Can you increment more than one variable in a for loop?
 * Can you have multiple test conditions in a loop?
 * What's the simplest infinite loop you can write? Do you know how to abort a program?!
 * Can you sabotage the counting in a for loop? Is there a way to protect against such a bug?

=The C Preprocessor=

Up until now, we've been studiously ignoring the lines beginning with # at the start of our programs. The time has come, however, to look these statements square in the eyes!

cd ../example4
make

So far we've glanced upon constructs such as:

Lines starting with a # form instructions to the C preprocessor. We can think of the preprocessor as a form of cut & paste. In our example, the preprocessor will replace our #include line with the contents of the system header file, stdio.h. Why are we doing this? Well, we wish to use some of the standard input/output library functions, such as printf, in our program and the header file contains the function prototypes. The compiler needs these prototypes to make sure that we are calling the functions correctly and thus to produce a working executable or compile-time error--whichever is appropriate.

We'll look at header files in more detail when we come to write our own functions.

We can do a good deal more than just including header files, however. For one, we can use a #define statement to set global constants. Take a look inside macros.c and notice how we have specified the size of our character array, called cStr. Outside of the main program we have:

Inside the main program, we then make use of our new symbol in our variable declarations block:

We can arrange to loop over the contents of that array using:

To run the program, type:

./macros.exe

as per usual.

Arranging conditional compilation is perhaps the most useful aspect of the preprocessor. Further down in the main program we have the conditional code block:

which we can activate through the use of an appropriate compiler flag. In order to do this, uncomment the line:


#CFLAGS=-DDEBUG

in the Makefile, retype make, and re-run.

The preprocessor gives us yet more possibilities, with constructs such as:

However, a word of caution: it is wise not to overuse the preprocessor. For example:
 * It may be better to use a const variable declaration, rather than a global #define.
 * Conditional compilation can be useful, but if you can use run-time switches in your code instead, you will not have to keep re-compiling your programs when you want to vary a parameter, say.

If you're keen, you can see a good use of the preprocessor for setting function names in mixed Fortran-C programming.

Note that we now have 3 distinct stages en route to producing an executable program:
 * 1) The preprocessor step: cut & paste.
 * 2) Compilation: taking source code and creating object code.
 * 3) Linkage: Linking object files and possibly libraries together to give an executable.

Exercises


 * Vary the size of the character array. Note that you'll have to re-compile your program each time.
 * Invent a new block of conditionally-compiled code and make the appropriate changes to the Makefile to bring it into effect.
 * Experiment with the additional #ifdef and #ifndef preprocessor statements.

=Functions=

So, onto functions. What are these and why do we use them?

Well, we can think of a function, in some ways, as a black box--we feed in inputs and it returns outputs. An example would be a trigonometric function, such as the sine function. If we input $$\pi/2$$ radians, we'll get 1 as the output; input $$\pi$$ and we'll get 0 back. We're not limited to just mathematical functions in C, however. We can write pretty much any function we like! We'll see many examples cropping up from here onwards.

OK, so much for a function's general form. Motivation-wise, if you need to do something more than once in a program, you should write a function to do it. That way, you just call your function whenever you need to perform that task. That strategy will give us concise programs as we don't need to duplicate any lines of code. Another benefit is that duplicated lines of code are a bug waiting to happen! Why? Well, there is a good chance that if you modify one of those lines of code, you'll forget to change the other. We're humans, after all. We err. Now, those lines are no longer identical and so will no longer do the same thing--tada, your bug.

Now repeat after me, never duplicate any code--write a function.

Another reason for writing a function, even if you don't call it more than once, is that breaking down your program into functional units will make it much easier to read and understand. This should be your #1 design criterion for any piece of code that you write.

OK, with the preamble out the way, let's take a look at an example:

cd ../example5
make

Inside funcs1.c, you'll see that we compute the volumes for all the planets in the solar system, rather than just for Earth. Accordingly, we bundle the volume calculation into a function of its own:

and call it a number of times as we cycle through the planets:

Note also the presence of a function prototype near the top of the file:

We'll need one of those for each function that we write. To run the program, type:

./funcs1.exe

The eagle-eyed amongst you will have noticed that the const variable pi is no longer needed in the main program unit. Also, the variable val is declared inside the main program unit and the function. Both of these things allude to something called scope, and in particular that variables declared inside a function are only known to that function. This rule also applies to the main function. Thus, if we want to pass values between functions, we must use arguments and return values. (Deliberately ignoring global values, as they are typically considered to be bad news.)

While we're on the topic of C function arguments, they are what's known as passed-by-value. Beware, this contrasts with passed-by-reference, as used in Fortran, for example. What's the significance of this? Well, in C a copy of the value of the argument is passed into a function. That means that you can do anything you like to its value inside the function, but it will all be forgotten upon exiting. Pass-by-reference means that the actual memory address of the argument is passed into the routine and so any changes to its value will stick. We'll look at this topic more closely when we consider memory addresses and pointers later on.

Exercises


 * Modify funcs1.c so that the equatorial radius argument is zeroed inside the function. Write a second loop to investigate the consequences of that inside the main program.
 * Write an additional function to calculate the surface area of a planet and print the results of applying that function too. (See  for the formula.)

A light-hearted interlude. Functions can call themselves. Neat! It's called recursion and can be both elegant and powerful. Neater! The classic example is the Fibonacci series, and who am I to buck the trend? Well, it does lend itself to beautiful shapes:



Take a look in fibonacci.c and try running it (fibonacci.exe) using various function inputs.

In truth, more interesting examples of recursion crop up when we consider more advanced data structures, such as binary trees, so we'll save some of the good stuff until then.

=Pointers and Allocatable Memory=

OK, now we're talking. Now we're getting to the marrow of the language. Once you're comfortable with this material, the world will be your oyster! So, without further ado, let's wade in:

cd ../example6
make

(Don't worry about the compiler warnings--this program deliberately mis-assigns a pointer variable.)

Looking inside pointers.c, we don't have to wait long to see something new. In the variable declaration block we see:

iNum we're happy with, just a common-or-garden integer. iAddr is a new species, however, and is a pointer to an integer. Said another way, the value of iAddr is the memory address of iNum. We can draw an analogy between memory addresses and pigeon holes, where each pigeon hole is labelled with a unique (integer) number--its address. Diagrams can often be helpful. Here's one where b is our plain old integer, given a value of 17 and a is used to store the memory address of b, i.e. a is a pointer to b:



Now, we can explore the consequences of this relationship in our program. Let's give iNum a value, and set iAddr to point to iNum. We use the & symbol to get the address of a variable:

If we print iNum, we'll obviously see that it has a value of 3. We can also follow the pointer iAddr, and see what value is stored in the memory address that it's pointing to. This is known as dereferencing and uses the symbol *. Since this is the value of iNum, it will, of course, also yield 3:

If we change the value of iNum, we'll see a corresponding change in *iAddr. Also, if we assign *iAddr to a new value, we'll see the value of iNum follow suit, since they are one and the same:

We've seen arrays before, but until now, we haven't witnessed their special relationship with pointers: an array is a contiguous chunk of memory which we can reference through the address of its first element. Further, we can access subsequent elements of an array through the use of pointer arithmetic. What does this all mean? Well, it's perhaps best explained with an example:

Here, we set iAddr to point to the first element of the array called data. We then loop through all the elements of the array printing values; first through the use of the familiar square bracket syntax ([]); and secondly via a pointer and increments upon the base address.

The last chunk of code in the program illustrates the use of dynamic memory allocation. Up until now, we've specified the size of all our variables at compile-time. Sometimes, however, we don't know how much space we'll need to store something ahead of time (perhaps we want to read a file which changes in length from one day to another). The ability to allocate memory on the fly will help us tremendously in this situation. Another situation where dynamic memory allocation will help is when we have too much data to store it all in main memory at the same time. In this situation, we can allocate some space, fill it with some of our data, work on it, free the space and then move on to work on another chunk of data.

You'll recall that arrays are chunks of contiguous memory referenced through the address of the first element. If we declare a pointer to the variable type we're interested in and then invoke a command to pair it with a chunk of memory, we'll be in business, right? Enter the malloc function:

On one line, we've rather neatly requested a chunk of contiguous memory with malloc. We've requested the number of bytes required to store an integer, multiplied by the number of elements we desire in our new array. The malloc function returns a general purpose pointer to void (void *), which we've quickly cast to be a pointer to an integer so that the RHS matches the LHS in our assignment to trusty old iAddr. Bingo, we have our new array; arranged on-the-fly, primed and ready to go!

What's even better is that we can access it like any other array:

Here's a diagram illustrating the situation:



We should always clean up after ourselves and free any memory that we've allocated:

making sure that we don't try to free any memory that we have not allocated (or free the same block twice), as the behaviour is then undefined and will typically cause our program to crash.

Pointers are very versatile and can be used to construct more advanced data structures such as binary trees and linked lists.

Exercises


 * Allocate an array of doubles. Set all the elements to 0.0 and then set every third element to 1.0 in a loop.
 * Allocate a 2-dimensional array of integers. To do this, you'll need a pointer-to-a-pointer, e.g. int **my2dArray.  You'll have to allocate an array of integer pointers (sizeof(int *) will be handy) and set my2dArray to point to the first element of that.  Finally you should loop over all the elements of your array of int pointers and set them to point to freshly allocated blocks of memory to hold the ints themselves.  Once you've done that, you deserve a chocolate bar!  Be careful about how you free up all that memory, or you'll leave some chunks stranded (and create what's called a memory leak in the process).
 * What happens when you mix up your types when setting pointers to other variables?



=Header Files=

Header files crop up when programs get larger and we want to split our code over multiple files. They provide a way for the compiler to check that all the pieces are consistent before ploughing on and creating an executable.

In this example, we've re-worked our program to calculate the volume of the planets in the solar system, splitting the various bits of functionality over several files:

cd ../example7
make

Inside the directory you'll find the files main.c, calcs.c and io.c, together with the header files calcs.h and io.h. Take a few minutes to peruse each in turn. You'll see that we moved the function prototypes out to the header files and used the #include directive to include those prototypes wherever they are needed. For example, the prototype for the arithmetic function volume is needed to compile both the function itself, written in calcs.c, and also the main function, where volume is 'called'. (Note, to 'call' a function is to request that it is executed with a specific list of arguments.)

A quick diagram will again be useful:



In addition to introducing header files, we've begun to use file i/o as well. The program is now reading the equatorial and polar radii from a text file called, appropriately enough, radii.dat. I'll move quickly over the gory details (a little cumbersome using only standard library functions) save to say that we grab each line of the file in turn:

then 'tokenise' the line, by splitting it around the tab character it contains, e.g.:

and lastly, for the number columns, we must convert the ASCII character sequences into actual numerical values which we can use in our calculations, e.g.:

We also introduce a new function to handle the printing to screen:

All of these changes result in a far more concise main program, which essentially only contains:

This program is small, so the increases in readability we've gained may not make a huge impact. However, this strategy applied to larger programs will pay real dividends!

Exercises
 * Add other functions, such as the calculation of surface area, to this program.
 * Add the names of the planets to the data file and read and store those too.

=Structures=

cd ../example8
make

Example8 is another re-working of our trusty planets program. This time we've availed ourselves of the ability to collect together related variables into a single package, known as a structure. This seemingly small manoeuvre has very far reaching benefits. Before we get into those, let's peruse the structure itself. We've declared it inside a new header file, main.h:

As you can see, we've grouped together all the things we'd like to know about the planet--its name, various radii, volume etc.--and created a new datatype called planet. Neat eh? Not only neat, but we've also taken a subtle yet hugely important step in the way we think about our programs. We've moved from thinking about the functions we need to perform to the way in which our variables relate to each other. This is a cornerstone of object oriented programming, something which we'll hear a good deal about when looking at C++.

Now that we have our new datatype in place, we can declare an array of such things, just as easily as if we were declaring an array of intrinsic datatypes (such as integers and what not):

The big payoff arrives, however, when we no longer have to painstakingly pass in all the myriad variables required by some function, but instead we can just pass in an instance of our new structure, or even a whole array of structs in one go:

Once inside the function, we can access whichever parts of the structure we happen to be interested in, for example:

where the dot, ., syntax indicates that we're accessing a member of an instance of a struct.

In our re-worked volume calculating function, we see the arrow syntax, ->. This is a structure access including a dereference, since the function was passed not an instance of the struct, but a pointer to one:

Why did we pass the pointer? Well, we wanted to store the result of the calculation in the 'volume' element and, in order to make that assignment stick, we needed to emulate pass-by-reference by passing the address of the struct, rather than passing a copy of the struct by value.

Exercises
 * Add more functionality to the program--surface areas, orbits, journey times in a rocket etc. You'll need to add more 'fields' to the structure to accommodate the extra inputs and outputs for any new calculations.
 * Think up some new structures and add to your program. Stars & galaxies perhaps?  Too abstract?  How about plants and animals?  Mountains, lakes or rivers?

=Command Line Parsing=

The icing-on-the-cake for our planets program will be to add in command line parsing. With this feature the program can read a number of arguments passed to it and act accordingly. This can be a really useful feature if we want to maximise our runtime flexibility and hence minimise the number of times we need to recompile our program. To take a peek, let's

cd ../example9
make

Looking inside main.c, the first difference we note is:

where previously the arguments to the main function were listed as 'void'. argc is a count of the number of command line arguments passed to the program (including the program name itself), which we can use as a check upon proper usage of the program:

argv is a vector of strings (i.e. an array of pointers, each pointing to a character array) which comprise the arguments themselves. The first argument we desire is a filename, and so that is easy enough to process. The second argument that we would like is the number of planets to process and so we must convert the string to a numerical value (integer in this case) as we have done previously when reading the values in the data file. Once we have that number, we can allocate an array of our planet structures accordingly:

This all makes our program more robust and flexible.

Exercises
 * Run the program with data for another planetary system, such as Gliese 581.
 * Change the number of command line arguments, perhaps adding an argument to trigger output in a concise or verbose mode?

=Going Further=

Well, if you have read through the tutorial and made a stab at the exercises, you should have pretty much all that you need to go forth and conquer in the land of C! The books listed below will help plug any gaps in your knowledge that you find. Also, you may like to move on to C++, the enhanced, object oriented, all-singing, all-dancing big brother to C. If so, take a look at the CtoC++ tutorial in the pragmatic programming family.

Further Reading


 * The bible is The C Programming Language by Kernighan & Ritchie--often shortened to just K&R. Take a look at the 'A Good Read?' page for more details.