Revision as of 10:27, 21 July 2010
MPI: Message passing for distributed memory computing
Introduction
MPI-1[ref]--the first incarnation of the standard--arrived in 1994 in response to the need for a portable means to program the growing number of distributed memory computers appearing in the marketplace. MPI stands for Message Passing Interface, and as its name suggests, it is an API, rather than a new programming language. At the time of writing, MPI can be used in C, C++, Fortran-77 and Fortran-90/95 programs. We will see that MPI-1 contained little on the topic of I/O. This was rectified in 1997 with the arrival of MPI-2[ref], which contained the MPI-IO standard (supporting parallel I/O) along with additional functionality to support the dynamic creation of processes and also one-sided communication models.
We can extend Flynn's original Taxonomy[ref] with the acronym SPMD--Single Program Multiple Data. This emphasises the fact that using MPI, for example, we can write a single program that will execute on computers composed of multiple compute elements, each with its own--not shared--memory space.
Hello World
The quintessential start. Programs in C, Fortran-77 and Fortran-90.
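MPI_Init() and MPI_Finalize() start and end the MPI part of a program. MPI_Comm_size() and MPI_Comm_rank(), which use the communicator MPI_COMM_WORLD, report the number of processes and the rank of the calling process.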
These programs assume that all processes can write to the screen. This is not a safe assumption.
Send and Receive
Processes send their messages back to the master process, and it then prints to screen. A much safer program.
MPI_Send(), MPI_Recv().
The triple (address, count, datatype) provides a pattern.
Tags. Can use these as a form of filtering: junk mail, bin; handwritten & perfumed, open now!; bill, open later!
Communicators. Messages do not pass between different communicators. We can create custom communicators.
A Common Bug
If all processes are waiting to receive prior to sending, then we will have deadlock. See the example of a pairwise exchange.
Some Parallelisation Examples
First, numerical integration using the trapezoidal rule. Experiment with the number of strips. Notice that the accuracy does not increase monotonically.
Monte Carlo techniques are suited to parallelisation. In particular, they are robust to the loss of a compute element.
Experiment with the number of throws at the dartboard. What happens to the accuracy of the estimate on average? Ask yourself, how accurate an estimate do I need in order to solve my problem?
Synchronisation, Blocking and the role of Buffers
The 'compute elements' run independently of one another. Synchronised communication requires that both sender and receiver are ready. Through the introduction of a buffer, a sender can deposit a message before the receiver is ready. MPI_Recv() only returns when the message has been received, however. Hence the term blocking.
Exercises
- What happens if the tags don't match? (ans. deadlock: the blocking receive never finds a matching message, so it never returns)
- Create two custom communicators: master & evens, and master & odds. Write a 'Chinese whispers' program that cycles messages around the two communicators in a round-robin fashion, randomly morphing a character.
Non-Blocking Communication
Having processors often idle, waiting to receive messages when they could be getting on with something useful, will degrade performance. One option is to use non-blocking sends and receives.
MPI_Isend(), MPI_Irecv().
Latency Hiding: first class letters, coal, canals & power stations
With increasingly parallel architectures, latency will only get worse.
There will be an inevitable latency between the time a message is sent and when it is received. Say ~24hrs for a first class letter. If we sat twiddling our thumbs while we waited for the letter, we wouldn't get much done. If on the other hand, we can profitably spend our time working on something until the letter arrives, then we have effectively hidden the latency time. Think of a coal-fired power station that receives its fuel by canal barge. The barge may take a long time to travel between the pit and the boilers. If, however, the power station has a sufficient buffer, then the time spent on the canal doesn't matter.
Asynchronous
We want our codes to be as asynchronous (and latency tolerant) as possible.