Attempting to debug such programs can be frustrating, because introducing debugging statements into the code can change the behavior of the program enough to prevent the bugs from appearing. Debugging statements executed in parallel tasks slow them down, each to a different degree, which in turn makes the tasks interact differently over time. Such a disruption is sometimes sufficient to keep an intermittent bug from showing up while the program is being debugged.
It is thus paramount to write parallel programs with debugging in mind, and to incorporate debugging mechanisms right from the start of the writing process. Although creating a program with many parallel tasks exchanging numerous messages may hold tremendous appeal for the programmer, development will go faster if a disciplined and methodical approach is followed, and if the level of parallelism and the amount of communication are kept under tight control. Developing communication modules is an important step toward writing modular applications in a methodical way. But the discipline has to come from the programmer.
We will concentrate on preventive, or defensive, debugging techniques, where the goal of the programming effort is to embed debugging constructs in the code, and to fashion the code in such a way that running the program in debugging mode can be accomplished with little effort.
But first, we look at debugging techniques, which include the use of commercial debuggers, and tracing techniques for monitoring the use of stack and of dynamic memory.
We shall nevertheless attempt to cover it briefly here, both for completeness and to introduce a tool which may prove invaluable in designing parallel applications.
We can categorize debuggers for parallel programs in two broad classes: the "high-tech" and the "low-tech" tools, to use Carriero and Gelernter's terms [CARRI90]. The low-tech approaches typically offer the user several windows, in each of which a sequential debugger follows the execution of a given task or process. The setting of breakpoints, step-by-step and trace runs, as well as the examination of local variables are facilities usually offered by such debuggers. High-tech debuggers add a higher level of sophistication by providing some representation of the global state of the program. This can be a graphical representation of the network running the application, with information about the nodes, the state of the channels, and the location of specific messages.
We will focus our description on the Parascope debugger, designed for the transputer environment, which is both a post-mortem and an interactive debugger. Parascope has elements of both high- and low-tech debuggers, but because it can display only two tasks at a given time, it remains in the realm of low-tech debuggers. Be careful not to associate low-tech with low-quality: Parascope is a powerful tool whose use should be learned carefully.
Stopping the cio server, however, does not alter the state of the transputers, so the debugger, once started on the host, can interrupt the transputers and create a file containing an exact image of the memory of each transputer. With this image, the C source files, and a symbol table file, the debugger then recreates a user-readable representation of the state of the transputers.
This is why this type of debugging is called post-mortem. The information that is displayed by the debugger is really not live, but a reconstruction that is stored in the memory of the host. This information is just a picture of what was in the transputers' memory at the time the user loaded the debugger. Even if the debugger were to allow a variable to be modified, for example, the modification would be done only in the host memory, and not in the transputer memories.
A quick example will illustrate how complex some of the issues can be. While debugging a serial program, you often put breakpoints at key statements, say after a loop, or just on the return of a subroutine, so that the status of key parameters can be checked. The breakpoint stops the program completely, and puts it in a suspended state. With a C program running on two transputers, for example, how should breakpoints work? Assume that the two transputers are running the same program. If we put a break after Statement Si, for example, should it be effective for both transputers? If so, then both will stop on the same statement, but probably not at the same time. So our debugger cannot but change the dynamics of the two programs. If, however, the two transputers run different programs or different parts of the same program, so that Si appears only in the code run by the Root transputer, what should the second transputer do while the first one is suspended? Should it continue? Should it stop as well?
If your choice is that it, too, should stop, then how should it find out that it has to stop? The only possible ways are via the transmission of a message on a hard channel, or through the activation of one of the extra signals exchanged by transputers along with data links: analyze and reset. But the analyze and reset signals have special functions which may be required for other purposes by the debugger. And sending a message works only if the second transputer is running code specific to the debugging process. This becomes quite complex. It is probably easier, then, to assume that the breakpoint will take effect on only one transputer. This implies that the second transputer will continue its computation while the first one is suspended. If the second transputer runs code that includes any fail-safe channel communication functions, such as ChanOutTimeFail, then it will very likely stop or abort of its own accord, rendering any attempt to restart or continue the program execution futile.
Thus interactive debuggers have more complex issues to deal with.
To start with, let's consider our prime-finding program, written for one transputer.
/* =======================================================================
   prime2.c

   DESCRIPTION:
       prime finding program running two concurrent tasks on one
       transputer. Used for debugging example.

   TO COMPILE AND RUN:
       make -f prime2
       chainnif -# 1 -1 prime2 -nif prime2.nif
       ld-net prime2

   ASSOCIATED NIF-FILE
       buffer_size 200;
       host_server CIO.EXE;
       level_timeout 400;
       decode_timeout 2000;
       1, prime2, R0, 0, , ,;
   ====================================================================== */
#include <stdio.h>
#include "conc.h"
/* =========================== DEFINITIONS ============================ */
#define SENTINEL -1
/* =========================== PROTOTYPES ============================= */
void Print(Process *P, Channel *softc);
void Compute(Process *P, Channel *softc);
int IsPrime(int x);
/* -------------------------------------------------------------------- */
/* MAIN */
/* -------------------------------------------------------------------- */
main(int argc, char *argv[])
{
    Process *ComputeP, *PrintP;
    Channel *softc;

    debug("prime2");                    /* lets Parascope find source and symbol files */
    softc = ChanAlloc1("SoftChan");     /* named soft channel, visible to Parascope */
    ComputeP = ProcAlloc(Compute, 4096, 1, softc);
    PrintP = ProcAlloc(Print, 4096, 1, softc);
    if ((softc==NULL) || (ComputeP==NULL) || (PrintP==NULL))
    {
        printf("Out of dyn memory\n");
        exit(1);
    }
    ProcRun(ComputeP);
    ProcRun(PrintP);
    while (1);                          /* keep main alive while the tasks run */
}
/* -------------------------------------------------------------------- */
/* COMPUTE (Concurrent Task) */
/* Compute primes in the interval [1..999]. */
/* -------------------------------------------------------------------- */
void Compute(Process *P, Channel *softc)
{
    int x;

    /*--- for each integer in the range ---*/
    for (x = 1; x<1000; x++)
    {
        if (IsPrime(x))
            ChanOut(softc, &x, (int) sizeof(int));
    }
    /*--- tell Print that there are no more primes ---*/
    x = SENTINEL;
    ChanOut(softc, &x, (int) sizeof(int));
}
/* -------------------------------------------------------------------- */
/* PRINT (Concurrent Task) */
/* Gets primes from Compute tasks and displays them on the screen. */
/* -------------------------------------------------------------------- */
void Print(Process *P, Channel *softc)
{
    int x, Done = 0;

    while (!Done)
    {
        ChanIn(softc, &x, (int) sizeof(int));
        if (x==SENTINEL)
            Done = 1;
        else
            printf("%d ", x);
    }
}
/* -------------------------------------------------------------------- */
/* ISPRIME */
/* Semi-efficient prime finding function which, given x, returns 1 if */
/* it is prime, and 0 otherwise */
/* -------------------------------------------------------------------- */
int IsPrime(int x)
{
    int i;                     /* 0 1 2 3 4 5 6 7 8 9 */
    static int SmallPrimes[10] = {0,0,1,1,0,1,0,1,0,0};

    if (x<10) return SmallPrimes[x];
    if (x%2==0) return 0;
    if (x%3==0) return 0;
    if (x%5==0) return 0;
    for (i = 2; i*i<=x; i++)
        if (x%i==0) return 0;
    return 1;
}
Note the calls to debug and ChanAlloc1, which are the highlighted statements in the listing. They are required by the Parascope debugger if we want to have some level of automation in the debugging process. The call to the debug function (linked to our program externally) allows the debugger to automatically find the source and symbol table files when it first analyzes the Root transputer. The ChanAlloc1 function (also linked externally) forces the program to keep each soft channel in a linked list, with identifiable names given by the programmer, which allows the debugger to access them and find their status.
Let's start the program, making sure that we abort the cio server before the program finishes printing all the primes.
cio ('C' I/O Driver), Version 89.1 [Link I/O Driver: 'lspcdma']
Copyright (c) 1986-1989 by Logical Systems
1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 ^C
C:\cdbug /s=80180000 /e=80200000 /n
Transputer Server for Modula-2, Version 1.0
Copyright (C) 1989 Computer System Architects
booting debugger Resident Library Kernel, V1.1, May 30, 1989
The debugger, instructed as to where it should load itself in the transputer memory (starting and ending addresses), displays a graphics screen with several windows, as shown in Figure 6-1.

Figure 6-1: Parascope window system.
The debugger is mouse driven, and clicking the left, right, or both buttons at the same time triggers different actions.
Do not worry if your debugger does not show the same screen organization. We are only interested in observing how this debugger works, and in getting a feel for the kind of information we can expect to get from such a tool in a parallel environment. Let's take a look at some of the windows and see what they allow us to peek into.
The top-left window of Figure 6-1 is the Source window. It displays the source code containing the last C statement to have been executed in the module selected:
ProcPar(PrintP,ComputeP,0);
The Source window displays the source code of the part of the program that is being analyzed.
The window to the right of the Source window is the Process window. It lists all the processes that were running on the transputer currently being debugged (the Root in our case). We see main, Print, and Compute, with additional information about what they were doing when the computation was aborted. Stopped by reset means that main was actually stopped by the debugger and not by our action... Is this possible? We did press Ctrl-C to abort the program. The DOS session shown above clearly shows that we returned to DOS before the transputer program had a chance to output all of its primes... So why is main stopped by the debugger, which we started only later?
You may have guessed it. The reason is that we never actually aborted the program running on the transputer; we only stopped the server which was consuming the primes by printing them. The transputer program was the producer, and we simply blocked it by stopping its consumer.
Stopping cio blocked the Print task, which outputs primes to the Host via printf statements. This, in turn, blocked the Compute task, which must pass each prime it finds to Print over a soft channel.
Now the information in the window makes sense. Print is blocked outputting to link0, and Compute is blocked waiting to transfer an integer over the soft channel SoftChan. (Remember that we used the ChanAlloc1 debugging function provided with Parascope to better monitor soft channels; here Parascope is showing us the name we gave the soft channel when we called ChanAlloc1.)
Pointing the mouse cursor to either one of the processes in this window and clicking the left button will make Parascope update the whole display to show the status of the process just selected. This includes showing the code and local variables associated with that process.
Pressing both mouse buttons in this window instructs Parascope to display a list of the four hard channels Link0, Link1, Link2, and Link3. Selecting one of them "transports" Parascope to the transputer connected to the present one via that link. Debugging a multi-transputer program is thus possible by moving from transputer to transputer via selected links.
The Process window shows the tasks that were either running or blocked on the transputer when Parascope started its analysis. This window can be used to move from transputer to transputer via hard links and debug a multi-transputer program.
The Data 1 window displays the local variables associated with the process or task currently selected (probably by clicking it in the Process window). Here Parascope is displaying the variables associated with main. We see the standard arguments argc and argv, and the process pointers PrintP and ComputeP used to define our two tasks Print and Compute, respectively, along with the soft channel softc shared by both tasks to exchange the primes.
Parascope displays the value associated with each variable. We are reminded here that Inmos treats memory addresses as signed numbers, so the addresses shown are negative: the most significant bit of each pointer is set to 1, resulting in a first hex digit of 8 instead of the more common 0.
But there is more information that can be gathered from this window. By pressing the left mouse button on "ComputeP" and then "Points to," we get to take a look at the program descriptor, that is, the structure that holds the information relative to the parallel task Compute. We see the function pointer, indicating the address of the function containing the code of the task; the workspace pointer, showing the address of the area of memory used as a stack; its size, defined as wssize; and the number of parameters passed to the task when it starts, nparam. This is a powerful feature: each local variable is shown along with its type and can be examined, and pointers can be followed.
The Data 2 window is similar to the Data 1 window and can be selected instead of Data 1 for displaying the local variables of a task. This feature can be useful if we want to see the locals associated with Print and Compute in two separate windows, so that a comparison can be made. Let's do just that!
We now see the state of the computation for both tasks. Print shows a value of x equal to 541. This is the next prime to be printed after 523, which is the last prime output by the program before we stopped it.
In summary:
The Data 1 and Data 2 windows display the local variables associated with the currently selected process. The locals are shown as symbols along with their types and values. Pointers to structures can be followed to expose the contents of the structures. Similarly, individual array cells can be analyzed by pointing to an array and by providing an index.
The Memory window is the last window we will explore here. The example on the left shows it expanded (an option available by pressing the two mouse buttons simultaneously). It accepts an address provided by the user, and displays a raw dump of the memory words located at that address as characters (as here), bytes, or words. Here we used the Data 1 window to find out where the argv pointer was pointing, and used the Memory window to display the command line arguments.
The Memory window offers the simplest, though crudest, form of debugging possible. We select an address and it displays the contents of the memory starting at that address. Isolating and recognizing the information may require some practice.
The Memory window provides the user with a dump of a memory area. The 32-bit words can be shown as words, bytes, or ASCII characters.
You now have an idea of what a parallel debugger can do. It is in many ways similar to a debugger for serial programs. The added twist is that the parallelism of the computation adds extra dimensions to the process of debugging an application. This is where practice, along with several cups of coffee, will turn an awkward tool into an invaluable partner. Do not miss this opportunity! The time spent learning how to master a debugger will be repaid many times over in the long run.