Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



5-4 Communication between multitasks

When we dealt with communication between tasks in Chapter 4, the tasks were on different transputers and communicated through hard channels. Hard channels are accessed through the use of the ChanIn and ChanOut functions, with a channel pointer identifying the direction of the communication (LINK0IN, LINK0OUT, LINK1IN, LINK1OUT, etc.)

The situation now is slightly different. The tasks that must communicate with each other reside on the same transputer. While message passing was the only solution for inter-node communication, several options are now available to us, depending on how the tasks share the information. We will investigate three methods:

Shared variables

Let's consider the following problem. We want to display the Mandelbrot set on the video screen of our host system [MAND83]. Displaying the Mandelbrot set has become a very popular way to test the performance of computer systems while generating an attractive and colorful display on video screens to attract potential buyers. Both the floating point support and the graphics system are tested by the program. Each pixel on the screen represents a point z in the complex plane. The Mandelbrot set is created by taking each complex point C in a rectangular region (mapped to the video screen) and computing the degree with which the series {zn} diverges, when it is defined as follows:

z0 = 0
zn+1 = zn2+C                                        
(5-1)

  Where zn and zn+1 represent the value of z at iteration n and n+1, respectively. The point C, corresponding to a single pixel on the screen, is assigned a color that translates the speed with which zn diverges away from z0. The following section from a Turbo C program shows how colors can be assigned (see the book by Pfeitgen [PFEI89] for more details):

#define THRESHOLD 10
int i, distance;

...
/* assume a point z0 has been selected */
for (i=0; i < getmaxcolor(); i++)
{
/* iterate zn = zn-1*zn-1 + C */
...
/* compute distance between zn and z0 */
...
if (distance > THRESHOLD)
break;
}
if ( i>=getmaxcolor())
putpixel(1,1,BLACK);
else
putpixel(1,1,i);


Figure 5- 4: Geographic distribution of the tasks for the Mandelbrot problem.

Now, assume that we program the root transputer to compute the color for each pixel and that we delegate the display of pixels of different colors to our PC host. The root runs two concurrent tasks, Task1 and Task2, the first is responsible for computing the color for each point, while the other is in charge of sending the coordinates of the point and its color to the PC host. The geographic distribution of the tasks is shown in Figure 5-4.

Task1 generates the (x,y,color) triplets and passes them on to Task2 which transfers them over Link0[1] to the host. Task1 is the producer, and Task2 the consumer. The data shared by the two is a queue holding (x,y,color) triplets. The queue is maintained through two indexes, Front and Back, that are shared by both tasks. Because Task1 always enqueues data, and Task2 always dequeues data, the Front and Back indexes are modified by only one task, and can be safely implemented as shared variables. The protocol between the two tasks will be to exchange data as follows:

/* Task1 */
while (!Done)
{
     /*--- compute color of new point ---*/
     ...

     /*--- enqueue newly found point ---*/
     while (Full(&Queue) == TRUE) 
          /* wait */;
     Enqueue(&Queue,x,y,color);
}


/* Task2 */
while (!Done)
{
     /*--- check for available points ---*/
     while (Empty(&Queue) == FALSE)
          /* wait */;

     /*--- dequeue point ---*/
     Dequeue(&Queue,&x,&y,&color);
     
     /*--- send triplet to host ---*/
     ...
}

We assume that Enqueue appends a triplet at the back of the queue, updating the Back pointer along with the Full condition in the process. Similarly, Dequeue removes the triplet from the front of the queue, and updates the Front pointer along with the Empty condition.

The shared-variable scheme will work because Task1 is the only one authorized to modify Front, and Task2 the only one authorized to set modify Back. Hence there cannot be any conflict between the two tasks.

Before we move on to the solution involving semaphores, let's take a second look at the code shown above. The astute reader will certainly notice that a lot of processing time will be wasted in the wait loops. Since both tasks must busy-wait on each other, they may spend a great deal of their quantum in the loops. A more efficient solution in this case is for the tasks to deschedule themselves when they encounter a wait condition, rather than busy-wait until the end of their allotted quantum. The ProcReschedule function is a natural choice in that case.

void ProcReschedule(void)

Whenever a multitasked process executes a ProcReschedule, it is automatically descheduled and put at the back of the list of active tasks with identical priority levels.

/* Task1 */
while (!Done)
{
     /*--- compute color of new point ---*/
     ...

     /*--- enqueue newly found point ---*/
     while (Full(Queue) == TRUE) 
          ProcReschedule();
     Enqueue(Queue,x,y,color);
}


/* Task2 */
while (!Done)
{
     /*--- check for available points ---*/
     while (Empty(Queue) == FALSE)
          ProcReschedule();

     /*--- dequeue point ---*/
     Dequeue(Queue,&x,&y,&color);
     
     /*--- send triplet to host ---*/
     ...
}

Task1 keeps on creating triplets and euqueueing them. In case the queue reaches full capacity, Task1 deschedules itself to give as much processing time to Task2 to start emptying the queue. On the other hand, Task2 keeps on descheduling itself when it sees the queue empty. When the queue contains one or more triplets, Task2 dequeues a triplet and sends it to the host. Remember that this last operation will involve ChanOut which is blocking, and which will automatically deschedule Task2 until the transfer is over.

Instead of looking at a full implementation of the program running on the root transputer now, let's postpone generating the whole code until we have covered all three paradigms. We will next look at a semaphore implementation of this same application.

Semaphores

Semaphores appeared first in the context of operating systems design. Operating systems provide a rich area for parallelism, especially in maintaining resources shared by several processes or programs.

We will cover the use of semaphores in detail in the next chapter, and we will concentrate here mostly on their declaration and the general locking mechanism they implement. The main task of a semaphore is to provide mutual exclusion in a protected region of memory available to several processes. This region can contain code or data, with the latter being the one most frequently associated with semaphores.

The mutual exclusion may work for individual processes, or for groups of processes. When the exclusion allows only one process at a time in the protected region, the semaphore is called a binary semaphore, since two states are sufficient to indicate the status of the region: 0, the region is accessible, 1, it is in used and cannot be accessed. When groups of processes are allowed to access the region at the same time, the semaphore is usually implemented with an integer, such as with the Unix System V operating system. The access of variables protected by semaphores is controlled by three main operations [BRAW89]:

The conc.h library supports binary semaphores with four functions: SemAlloc, SemFree, SemP, and SemV. Let's put them to work directly in our previous example.

/*=== GLOBAL AREA ===*/
Semaphore *QueueSem;

...
/* Initialize Semaphore */
QeueSem = SemAlloc();
if (QueueSem==NULL)
	exit(1);
...

/* Task1 */
while (!Done)
{
     /*--- compute color of new point ---*/
     ...

     /*--- enqueue newly found point ---*/
     while (Full(&Queue) == TRUE) 
          ProcReschedule();
     SemP(QueueSem);
     Enqueue(&Queue,x,y,color);
     SemV(QueueSem);
}


/* Task2 */
while (!Done)
{
     /*--- check for available points ---*/
     while (Empty(&Queue) == FALSE)
          ProcReschedule();

     /*--- dequeue point ---*/
     SemP(QueueSem);
     Dequeue(&Queue,&x,&y,&color);
     SemV(QueueSem);

     /*--- send triplet to host ---*/
     ...
}
...
/*--- All the points have been generated and plotted: ---*/
SemFree(QueueSem);

Although we have already stated that the queue can be managed by both tasks without any possible conflicts (if the queue is implemented in a circular buffer where Front and Back "roll-off" from the last cell to the first one, for example) we can imagine a different implementation of the queue that resets both Front and Back pointers to the first cell when the buffer is empty. In this case, both tasks must have modify-privileges on Front and Back, and the semaphore implementation is justified.

QueueSem is declared as a semaphore, and initialized by SemAlloc[2]. The first task to call SemP will find QueueSem as free, and will be granted access to the queue. If the other task attempts a call to SemP before the first one calls SemV, then that task will block, and will automatically be descheduled. When the task accessing the queue finishes its update, it calls SemV to free up access to the queue.

Semaphore *SemAlloc(void)
void SemFree(Semaphore *s)

Syntax

The syntax of SemAlloc and SemFree is simple. For the P and V operations, however, several functions are available. Those whose name starts with an underscore are implemented as functions (and must be passed a pointer to the semaphore in order to modify it) while the others are macros (which do not require a pointer). When the semaphore is shared by tasks of different priorities, then HSemP and HSemV must be used.

int SemP(Semaphore s)
int SemV(Semaphore s)
int _SemP(Semaphore *s)
int _SemV(Semaphore *s)
int HSemP(Semaphore s)
int _HSemP(Semaphore *s)
int HSemV(Semaphore s)
int _HSemV(Semaphore *s)


EXERCISES


 
5-4

A simple pipeline. Write a program running on one transputer and containing two concurrent tasks, Task1, and Task2. both running at low-priority. They will both share access to a queue that is protected by a semaphore. Task1 will be given the name of a text file and will read it, a character at a time, and enqueue them in the queue as it reads them. Task2 will read characters from the queue and will look for a particular pattern, say "gr." Characters that do not match the pattern will be printed on the screen. Those matching the pattern, however, will be replaced in the output to the screen with a new pattern, say "gremlin."

Check that your program implements correct enqueue and dequeue functions by running it with different queue sizes.

.


 


Message Passing

We now come to the mode of communication which is already a natural for the transputer, that of exchanging information over a channel.

We have seen in Chapter 4 that ChanIn and ChanOut (and their derived functions) offer an easy and efficient way for tasks residing on neighboring transputers to communicate with each other. One of the key properties of the links is that they are memory mapped, and are controlled directly in microcode. Because the control of the links is stored deep in the microcode, it is possible for the transputer to provide the programmer with same message-passing facility for tasks residing on the same transputer as for task residing on separate transputers. Since a link operation is essentially seen by the processor as a memory operation, the microcode can fool our programs by making them think information is transferred from the memory of one transputer to the memory of another transputer, when the transfer actually takes place within the same transputer. In one case the message will travel over hard channels, and will actually pass the boundaries of the transputers, while in other cases the messages will travel over soft channels within the same memory space.

The hard channels refer to the physical links connecting transputers to each other. The soft channels refer to channels emulated in memory, allowing two tasks residing on the same processor to exchange information between their own address spaces. Since their address spaces share the same physical memory space of the processor, the communication over a soft channel is nothing more than a byte-move operation, from one region of the memory to another.

Hence we can use ChanIn and ChanOut to pass messages on a soft channel, just as we did with hard channels. The only additional operation we need to perform is to declare a soft channel.

Let's revise our Mandelbrot set program, and create a soft channel between the two tasks.

/*=== GLOBAL AREA ===*/
Channel *softc;

/*--- initialize channel ---*/
softc = ChanAlloc();
if (softc==NULL)
     exit(1);

/* Task1 */
/*=== LOCAL AREA TO TASK1 ===*/
int Buffer1[3]; /* used to store x, y, color */

while (!Done)
{
     /*--- compute color of new point ---*/
     ...

     /*--- send point to Task2 ---*/
     Buffer[0] = x;
     Buffer[1] = y;
     Buffer[2] = color;
     ChanOut(softc, (char *) Buffer, 3*sizeof(int));
}


/* Task2 */
/*=== LOCAL AREA TO TASK2 ===*/
int Buffer2[3];

while (!Done)
{
     /*--- Wait for next point ---*/
     ChanIn(softc, (char *) Buffer2, 3*sizeof(int));
          
     /*--- send triplet to host ---*/
     ...
}

/*--- All the points have been generated and plotted: ---*/
ChanFree(softc);

We see here that we have lost the flexibility inherent to the queue implementation, but this is only an artifact of the example chosen.

The main advantage of soft channels is that they look and feel just like hard channels do. Any function that accepts a hard channel as a parameter will accept a soft channel as well. A ProcSkipAlt function can thus be given a list of channels including soft and hard channels to monitor, for example.

scalability

This is tremendously appealing when one writes programs with scalability in mind. Assume that a parallel program is to be written for a square mesh of 64 transputers, but that during program development, only a 16-transputer mesh is available. By programming the inter-task communication over channels and by forming logical clusters of four tasks, the programmer can assign these clusters to each of the 16 transputers and still design and test her program on the network of 16. When the 64-node network becomes available, the program can be made to run on the 64-node machine with very few changes by judiciously changing some of the soft channels to hard channels, and by redefining the assignment of tasks to transputers. Figure 5-5 illustrates this concept.



Figure 5-5: Scaling a program from 16 nodes to 64. (a) 16-transputer mesh. Each node supports four tasks. (b) 64-transputer mesh. Each group of four tasks is exploded to cover four neighboring transputers.


EXERCISES


 
5-5

Comparing hard to soft channels. This exercise compares the performance of hard and soft channels in terms of data transfer speed. Organize your network of transputers as a ring, with as many transputers in the ring as possible. Each node in the ring will act as a simple repeater, receiving messages on one link and sending them on the opposite link. As soon as a message is inserted on the ring, it will keep on flowing through the ring. Make the root insert the packet and record the average time it takes the message to go full circle on the ring.

Once you have measured the cycle time of the message, implement a ring of concurrent tasks on the root alone. Assuming that you had N nodes on the original ring, create N concurrent tasks running on the root, and linked by soft channels forming a ring. The task will contain the same code you used for the repeaters, except one that will be in charge of inserting the message and timing its travel speed.

Compare the execution times you obtained for the physical ring, and for the "soft" ring. If you kept the code almost identical in the two experiments, you should be able to compare the data transfer rate of the hard channels versus that of the soft channels.

Hints: Although the code in the second case runs on only one transputer, it might be tempting to assume that it will be N times slower than for the experiment implemented on the hardware ring. This is not true, since only one repeater is active at a time!




[Previous] [HOME] [NEXT]