Parallel Programming in C for the Transputer
© D. Thiébaut, 1995




EXERCISES


 
6-3

In Figure 6-3, the links on the very left, as well as the links on the very right are mapped to hard channels, while the inside links are soft channels. Why is it necessary to let the channels on the right be hard channels?

.


 

 
6-4
Design a three-transputer application for computing primes using the defensive debugging technique. The network should be a linear chain, and all three transputers will be computing primes in a given interval whose bounds are a function of the node number (be careful here when the application will run on the root!) The Root and Node 2 will be relaying primes as well as computing them. Make sure your program works on the Root before trying it on three transputers.


 

 
6-5

What modifications do you need to bring to map.h to make it compatible with 4-link transputers? This way rectangular meshes can be debugged with the technique presented in this chapter. .


 

 
6-6
Root debugging has another interesting use. It allows one to test complex networks that may not be feasible, for some reason. For example, assume that you have an Educational Kit™ from CSA in your PC at home, with a T4XX family transputer boasting 2 active links. On the other hand, you have in your lab at work a sophisticated system with 16 T8XX family transputers that you can connect as a 4 by 4 mesh. By using the Root debugging technique, you can start developing and testing your application on your home PC, by creating 16 logical nodes on the root, connected as a mesh.

To investigate this in more details, you will code a simple experiment were each node of the 4 by 4 mesh will represent a cell in Conway's game of life. You will only need to implement one step of the simulation, where each node decides on the fate of its cell by finding how many neighbors it has. If that number is less than 2 or greater than 3, then the cell will die. Otherwise the cell will continue to live, or will be born, depending on its previous status.

You will need to pay special attention to the way you define the map.h file, since you have only one physical transputer, which we will assume to have only two hard links. Your map.h file should define 16 logical nodes, connected by soft channels to form a mesh.


 




6-4 Managing the stack and heap

Running out of stack creates unpredictable program behavior that is sometimes hard to diagnose. In a parallel environment, stack overflow is only one of many possible reasons a program may hang, and a careful management of the stack is of prime importance. Unfortunately, the LS C compiler does not provide an option for checking stack overflow, and the transputer does not have any hardware support for checking the stack either. Some processors do this by generating an interrupt, for example when the stack pointer passes some bound register. Unfortunately, the transputer maintains only one register, the workspace register, to define the stack area. The responsibility of allocating enough stack is thus left to the programmer.

The stack area for a function can normally be allocated in one of two areas, depending on whether the function is the result of a nested call from the main function, or if it is the result of a nested call from a parallel task. We will refer to the first case as the default stack area, and to the second as an allocated stack area.

The Default Stack

The default stack area is defined at link time. The normal way to provide this information to the linker is via a STACK entry in the link file, as shown below.

FLAG
LIST     tracer.map
INPUT    tracer
ENTRY    _vcmain
LIB      t8lib
LOAD     0x80002000
STACK    0x80001FFC

The STACK entry specifies that our default stack will occupy the first 16380 bytes of memory. The LOAD entry, on the other hands, controls where the program code is loaded. In the example above, the stack will occupy approximately 16 Kbytes, from address 0x80000000 to 0x80003FFC, the lower two or four Kbytes of this range mapping to the internal RAM of the transputer.

The default stack is used by main and all the functions resulting from nested calls issued from main, and only by them. So the first rule we should adopt is to always make sure that the amount of default stack area is large enough to hold the local variables declared in the functions called by main, and also large enough to run library functions. This is an important point: Some of the library functions, such as printf for example, may require close to 1 Kbytes of stack! It is thus recommended to allocate memory very liberally:

Stack Rule
Be conservative

Always allocate at least 2 KBytes more of stack space than the total area required by all the local variables of main and its functions.

The Allocated Stack

The allocated stack, as its name indicates, is allocated from heap storage. It is created by a call to ProcAlloc. Creating a new task requires three things: A pointer to the name of the function embodying the code to be executed, a set of parameters passed to the code when the task starts, and a parameter specifying the size of the stack area that will be used by the task. This pointer often appears as the macro WS_SIZE in the Logical Systems literature, standing for work-space size.

The allocated stack works exactly the same way the default stack does, except that the allocated stack is private to each parallel task. Two parallel tasks running on the same transputer will require two separate stacks. The size of each one must be large enough to contain the frames of all functions called directly or indirectly by the task. We see here a clear disadvantage of having many parallel tasks: they all require stack space that cannot be shared, and even though several of these tasks may call identical functions, each task must be able to provide a sufficient amount of frame space for these functions.

Making Room for the Stacks

As the number of parallel tasks running on a transputer increases, so does the memory allocated to their workspaces. We must therefore insure that the amount of dynamic memory available to the program is large enough to hold them. By default this amount is set to 128 Kbytes. To increase it beyond this value, the external variable _heapend defined in the standard library must be modified.

The function below shows a simple method for resetting _heapend and for measuring the amount of heap available.

void DisplayHeapSize(void)
{
     int  base = 1, Offset=0;
     char *ptr;

     if (_node_number==1)
         printf("heapend = %0X",_heapend);
     _heapend = (void *)0x800FFFFC;
     if (_node_number==1)
         printf(", reset to %0X\n",_heapend);

     while (1)
     {
         base *= 2;
         ptr = (char *) malloc(Offset+base*(int) sizeof(char));
         if (ptr==NULL)
         {
                 base /= 2;
                 if (base <= 1) break;
                 Offset += base;
                 base = 1;
         }
         free(ptr);
     }
     if (_node_number==1)
         printf("%d bytes can be allocated\n",Offset+base);
}
The function first prints the value of _heapend before resetting it to the highest RAM address inside the transputer (1 MByte in our case).

A simple example will show how this works. Assume that 500 KBytes of heap space are available. Before the while-loop starts, offset is set to 0. The while loop then increases base until it is too large to be allocable: 512 K. The if test then stores 256 K in offset, resets base to 1, and the while loop continues. It now tries to find the largest block which, once added to 256K can be allocated. It will find that 128 K is the limit, and offset will become 256 K + 128 K = 384 K. And so on.

This function is very quick and reports an accurate size for the available heap.

6-5 Using High and Low Priority Tasks

High or low priorities?

The transputer supports tasks with two levels of priorities: high priority tasks, which cannot be preempted and which monopolize the processor until they block on a channel or deschedule themselves, and low priority tasks that follow a round robin scheduling when they are active. What priority should we use when we implement tasks? Are there rules to follow for the assignment of priorities? Are there times when the priority should dynamically change during the computation?

To answer some of these questions, let's revisit our familiar prime-finding problem, and assign it to two transputers. We will use a simple and standard configuration where the application relies on two tasks to carry out the workload of a transputer. The first task, the Compute Task, is given one number at a time, tests it, and returns a boolean indicating whether that number is prime or not. The second task, Dispatch, is in charge of distributing the numbers and receiving or relaying their associated booleans back to the root. No buffering is implemented at any level. The Dispatch Task on the Root has the added function of giving out the primes and of saving the booleans in an array. A diagram of the task decomposition of this application is shown in Figure 6-4.


Figure 6-4: Task decomposition of prime finding application.

Efficiency versus flexibility tradeoffs

Before we move on to look at the code in details, let's discuss some of the choices made when creating the task decomposition. In particular, note that in Figure 6-4 Node 2 supports two tasks. Is it necessary? After all, we could have used only the Compute task and interfaced it directly with the Dispatch task of Node 1. The answer depends on the level of efficiency we want to achieve, and on the amount of flexibility we want to create for this system. Dispatch on Node 2 does nothing more than relaying information back and forth between Node 1 and the Compute Task. Since no buffering is implemented, we could have done away with this task on Node 2 and would have saved the switching time incurred by the multitasking. This would have made the application slightly more efficient, both in reducing the code required by Node 2, and in removing unnecessary delays. We chose to keep a relay task in the second node, however, to keep the symmetry of the system, and to allow easy scaling to higher number of nodes. This makes the application more flexible and removes the special case of having the last node in a chain contain only one task rather than two.

The code

Let's take a look a the code now.

/* =======================================================================
   hiprio.c

   DESCRIPTION:
   Example program used to compare efficiency of mixing high and low
   priority tasks.

   TO COMPILE AND RUN:
   make -f hiprio
   chainnif -# 2 -1 hiprio  -nif hiprio.nif
   ld-net hiprio

 ====================================================================== */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdarg.h>
#include "conc.h"

/* =========================== DEFINITIONS ============================ */
#define DONE    -1
#define MAXDIM  600000

/* =========================== PROTOTYPES ============================= */
void AbortIf(int, char *);
void Dispatch(Process *P, Channel *, Channel *);
void Compute(Process *P, Channel *, Channel *);
int  IsPrime(int x);

/* ============================= GLOBALS ============================== */
unsigned char Table[MAXDIM];
int LocalCount=0, RemoteCount=0;

/* -------------------------------------------------------------------- */
/*                                 MAIN                                 */
/* -------------------------------------------------------------------- */
main()
{
     Process *DispatchP, *ComputeP;
     Channel *C2R, *R2C;
     int TimeStart, TimeStop;

     /*--- Initialize ---*/
     AbortIf(!(C2R = ChanAlloc()), "Allocation Error");
     AbortIf(!(R2C = ChanAlloc()), "Allocation Error");
     AbortIf(!(DispatchP = ProcAlloc(Dispatch, 4096, 2, C2R, R2C)),
              "Alloc Error");
     AbortIf(!(ComputeP = ProcAlloc(Compute, 4096, 2, C2R, R2C)),
              "Alloc Error");

     /*--- Start timer, and launch tasks ---*/
     TimeStart = Time();
     ProcPar(DispatchP, ComputeP, 0);

     /*--- Summary ---*/
     TimeStop = Time();
     if (_node_number==1)
     {
          printf("Computed Primes: %d total. Remotely: %d,  locally: %d\n",
               RemoteCount+LocalCount, RemoteCount, LocalCount);
          printf("Elapsed time = %10.6f seconds\n",
               (TimeStop-TimeStart)*0.000064); exit(1);
     }
}

/* -------------------------------------------------------------------- */
/* COMPUTE                                                              */
/* Receives integers from the Dispatch task and determine if they are   */
/* primes or not.  Returns 1 or 0 to Dispatch depending if number was   */
/* found to be prime or not.                                            */
/* -------------------------------------------------------------------- */
void Compute(Process *P, Channel *C2R, Channel *R2C)
{
     int x = 0;

     /*--- start the parallel program by sending 0 to Dispatch ---*/
     ChanOutInt(C2R, x);

     /*--- main loop: get a number, test it, and return result ---*/
     while (1)
     {
          x = ChanInInt(R2C);
          if (x==DONE) break;
          x = IsPrime(x);
          ChanOutInt(C2R, x);
     }
}

/* -------------------------------------------------------------------- */
/* DISPATCH                                                             */
/* Passes numbers to Compute tasks (remote or local) and gets boolean   */
/* back.  If Dispatch is on Node 1, then the boolean is stored in an    */
/* array, otherwise it is passed to the other Dispatch task.            */
/* -------------------------------------------------------------------- */
void Dispatch(Process *P, Channel *C2R, Channel *R2C)
{
     int x, count, index;
     int LastLocal = 0, LastRemote = 0;

     /* ================================================================ */
     /*                             NODE # 1                             */
     /* ================================================================ */
     if (_node_number==1)
     {
          for (count = 1; count<MAXDIM; count++)
          {
               /*--- alternate checking the two channels ---*/
               if (count%2)
                    index = ProcAlt(LINK1IN, C2R, 0);
               else
                    index = 1-ProcAlt(C2R, LINK1IN, 0);
               if (index==0)
               {
                    /*--- Get boolean for last remote integer sent,
                    and store it ---*/
                    x = ChanInInt(LINK1IN);
                    Table[LastRemote] = x;
                    /*--- send a new number to test ---*/
                    ChanOutInt(LINK1OUT, count);
                    LastRemote = count;
                    RemoteCount++;
               }
               else
               {
                    /*--- Get boolean for last local integer sent,
                    and store it ---*/
                    x = ChanInInt(C2R);
                    Table[LastLocal] = x;
                    /*--- send a new number to test ---*/
                    ChanOutInt(R2C, count);
                    LastLocal = count;
                    LocalCount++;
               }
          }

          /*--- We've sent all the numbers to be tested.  Wait until ---*/
          /*--- they all come back                                   ---*/
          while (RemoteCount+LocalCount<MAXDIM+1)
          {
               index = ProcAlt(LINK1IN, C2R, 0);
               if (index==0)
               {
                    x = ChanInInt(LINK1IN);
                    Table[LastRemote] = x;
                    x = DONE;
                    ChanOutInt(LINK1OUT, x);
                    RemoteCount++;
               }
               else
               {
                    x = ChanInInt(C2R);
                    Table[LastLocal] = x;
                    x = DONE;
                    ChanOutInt(R2C, x);
                    LocalCount++;
               }
          }
     }

     else
     /* ============================================================== */
     /*                              NODE # 2                          */
     /* ============================================================== */
     {
          while (1)
          {
               /*--- Get integer, pass it up to Compute... ---*/
               x = ChanInInt(C2R);
               ChanOutInt(LINK0OUT, x);
               /*--- Get result back from Compute and pass it back
               to Dispatch on Node 1 ---*/
               x = ChanInInt(LINK0IN);
               ChanOutInt(R2C, x);
               if (x==DONE) break;
          }
     }
}

/* ------------------------------------------------------------------ */
/* ISPRIME: Semi efficient prime finding function which, given x,     */
/*          returns 1 if it is prime, and 0 otherwise                 */
/* ------------------------------------------------------------------ */
int IsPrime(int x)
{
     int i;                    /*  0 1 2 3 4 5 6 7 8 9 */
     static int SmallPrimes[10] = {0,0,1,1,0,1,0,1,0,0};

     if (x<10) return SmallPrimes[x];
     if (x%2==0) return 0;
     if (x%3==0) return 0;
     if (x%5==0) return 0;
     for (i = 7; i*i<=x; i++)
          if (x%i==0) return 0;
     return 1;
}

/* ------------------------------------------------------------------ */
/* ABORTIF: Stops the computation is the condition is true.           */
/* ------------------------------------------------------------------ */
void AbortIf(int condition, char *mesg)
{
     if (condition)
     {
          if (_node_number==1) printf("***%s***\n", mesg);
          exit(1);
     }
}

Listing 6-4: program hiprio.c running on two nodes, each supporting two concurrent tasks.




[Previous] [HOME] [NEXT]