Parallel Programming in C for the Transputer
© D. Thiébaut, 1995




EXERCISES


 
4-1

Write a program running on two transputers that prints out all the prime numbers in an interval defined by the user. The decomposition of the parallel tasks is as follows:

  • Node 1 prompts the user for two integers, Xlow and Xhigh. As soon as it gets them, it sends them to Node 2, via two ChanOut statements. It then awaits prime numbers sent by Node 2, and prints each number received. When all the numbers have been received, it prints a count of how many were printed and stops.

  • Node 2 waits for Node 1 to send the bounds of the search interval. Once it has received them, it computes all the primes in that range and sends each number back to Node 1.

Store your program in a file called prime_v2.c. Create a make file using the skeleton make file supplied for prime_v1

C:\ make -f prime_v2

Then create a network information file with the chainnif.exe utility, also provided on the distribution disk:

C:\ chainnif -# 2 -nif prime_v2.nif -1 prime_v2

Then load and run your program with the ld-net loader:

C:\ ld-net prime_v2

 

 
4-2
Write a program running on two transputers that prompts the user for a string of characters. The string must be less than 80 characters long, and is input by Node 1. Once Node 1 has read the user string, it sends it to Node 2 which converts all lowercase letters to uppercase, and conversely. Once the transformation is done, Node 2 returns the string to Node 1 which prints it back on the video display.

Since Node 1 and 2 will not know how many characters will be entered by the user, you have to be careful how you pass the variable-length string from one node to the other.

.


 

 
4-3
Transform prime_v1.c so that Node 2 sends the primes that it has found back to Node 1 as an array of integers, rather than one prime at a time. Assuming that you call the array Prime_array, you can simply pass Prime_array to ChanOut to designate the address of the data to be transferred[1.]

.


 


Other forms of channel transfer

ChanIn and ChanOut are the workhorses of channel communication. The conc.h library contains other versions as well, specialized in integer and character transfers.
void ChanOutInt(Channel *, int)
int ChanInInt(Channel *)

void ChanOutChar(Channel *, char)
char ChanInChar(Channel *)

ChanInInt and ChanOutInt are dedicated to integer exchanges, while ChanInChar and ChanOutChar support character (byte) exchanges. Although ChanInInt and ChanOutInt would have done just as well in our prime-finding program, we have learnt the most versatile functions, which, in practice, are used more often than the specialized ones. With them and a little coding, we can very easily generate macros targeted for the transfer of particular data structures.

Reliable communication

Finally, we will mention two more variations of the ChanIn/ChanOut functions that are used for the implementation of reliable communication:
int ChanInChanFail(Channel *, void *, int, Channel *)
int ChanInTimeFail(Channel *, void *, int, int)

int ChanOutChanFail(Channel *, void *, int, Channel *)
int ChanOutTimeFail(Channel *, void *, int, int)

The ChanInChanFail and ChanOutChanFail functions perform just as the ChanIn and ChanOut functions, except that they specify an additional channel as their fourth parameter. Under normal operations, this second channel is never activated, and the ChanInChanFail and ChanOutChanFail perform as regular ChanIn and ChanOut functions. However, in some cases, a failure mode may occur, making it necessary to unblock processes engaged in communications known to never have a chance to complete. The second channel passed to ChanInChanFail and ChanOutChanFail are the mechanism for aborting the communication. A process in charge of maintaining the integrity of the whole computation will detect the failure mode and will unblock tasks involved in "ChanFail" communication by sending an integer via the second channel (fourth parameter of ChanInChanFail and ChanOutChanFail).

ChanInTimeFail and ChanOutTimeFail are used in similar situation, except that a time-out period is associated with the transfer. If the communication is not done by the specified time (passed as the fourth parameters to these function) the communication is aborted, the communicating tasks are unblocked, and the channel is reset.

A note of caution

Implementing reliable communication is not an easy task, and should be undertaken with care. An important point to remember is that the first byte of a hard channel message is sent as soon as the output side is ready, regardless of the state of the input side!

4-3 Alternation with ProcSkipAlt

Our first program, prime_v1.c, works fine, but it really does not fit the description of a parallel program. Node 1 could have just as well computed the primes and printed them. Node 2 was not necessary to do that. However, it still provides us with a simple example to illustrate channel communication between nodes of a transputer network. Let's improve our program (and its performance) by having Node 1 carry out some of the computation. Our new problem is described by the following requirements:

Improving prime_v1.c

Let's think about the first requirement. Node 1 now has two jobs: i) it must compute and print primes, and ii) it must receive primes from Node 2 and print them as well. It would be unwise to code Node 1's task as two consecutive loops for the two jobs. Can you see why? If we do this, then we loose any performance benefit from the parallelization of our initial problem, since the two loops would serialize the computation. Node 1 would first compute and print all the primes between 1 and 99, and then it would start accepting the primes from Node 2 and print them. But because the ChanIn and ChanOut functions are unbuffered and blocking, Node 2 would get blocked the first time it tries to "chanout" its first prime, and wouldn't get released from this state until Node 1 would start receiving the primes. At this point Node 2 would have computed only one prime, and would still have to go through the rest of its numbers to find the remaining primes. This is the main restriction associated with channel communication on the transputer. We must remember that communication is blocking. If we are not careful we can miss on harvesting all the power provided by the multiprocessor.

What we need to do instead, is to mix the two loops that Node 1 must run, the one computing the primes, and the one relaying the primes from Node 2. What happens when we run two different loops simultaneously? A similar case occurs in a serial algorithm you might be familiar with: the merge-sort algorithm [SEDG83]. In it two loops, one reading items from one file, and the other reading from another file are executed together. The loop in which the item read is smaller is iterated while the other loop waits. Eventually one loop finishes before the other, and the remaining loop is run by itself.

Using this example, we can code the task running on Node 1 using the following model:


/*--- Pseudo code running on Node 1 ---*/
for (x= 1; x< 100; x++)
{
if (x is prime)
{
print it;
increment counter;
}
if (Node 2 has a prime to send)
{
Get the prime from Node 2;
if not the sentinel print it;
increment counter;
}
}
while (we haven't received the sentinel from Node 2)
{
wait for a prime from Node 2;
if not the sentinel print it;
increment counter;
}

Testing when Node 2 has primes to send

The first for loop embeds two inner blocks, one computing the primes between 1 and 99, the other getting primes from Node 2. How should we implement the highlighted statement?
     if (Node 2 has a prime to send)
     {
          Get the prime from Node 2;
          if not the sentinel print it;
increment counter;
}
Can we use ChanIn(LINK1IN,...) to test if Node 2 has a prime to send? The answer is yes. Should we use ChanIn? The answer is no!. The reason is that ChanIn() is blocking, and if we use it in the test, the task running on Node 1 will be blocked until Node 2 finds a prime to send. This means that the for (x=1; x< 100; x++) loop would alternate between printing a prime found by Node 1 (less than 100) and a prime found by Node 2 (larger than 100). This is still too restrictive, if Node 2 takes a long time finding primes, then Node 1 is slowed down unnecessarily. We need a non-blocking mechanism to check if there is some activity on the channel connecting Node 2 to Node 1. ProcSkipAlt is the solution.

int ProcSkipAlt(Channel *, ...,Channel *, 0)

ProcSkipAlt is an alternation function that receives a list of channel pointers terminated by a 0 (NULL), and that checks whether these channels are ready for input. Alternation refers to the fact that the action taken by the function, or its result is not deterministic, but rather depends on the list of elements past to the function. Alternatively, the result may depend on the first parameter, or the second, or the fourth, or none of them, and this cannot be predicted in general. With ProcSkipAlt, if a channel in the list is ready for input, an index corresponding to the position of the channel in that list is returned. If none of the channels are ready, then -1 is returned.

Examples

Assume that we have a transputer with four active links, and assume that the neighbor it is attached to via LINK1 has initiated a transfer to this transputer. The following calls show the value returned by ProcSkipAlt in that case.
ProcSkipAlt(LINK0IN, LINK1IN, LINK2IN, 0);	/* returns 1  */
ProcSkipAlt(LINK0IN, 0); /* returns -1 */
ProcSkipAlt(LINK1IN, LINK2IN, LINK0IN, 0); /* returns 0 */
We can now translate the pseudo code for Node 2 as follows:

if (ProcSkipAlt(LINK1IN,0) != -1)
{
ChanIn(LINK1IN, &x, INTSIZE);
printf("%8d",x);
NoPrimes++;
}
If Node 2 hasn't initiated a transfer yet, then ProcSkipAlt returns -1 and the loop is skipped. Node 1 can then test another integer for its "prime" characteristic. The code for the whole program is now shown in its entirety.

/* =======================================================================
   prime_v2.c

   DESCRIPTION:
   Compute primes on 2 nodes.
   Node 2 computes the primes and sends them to Node 1.
   Node 1 (Root) collects the numbers and prints them

   TO COMPILE AND RUN:

   make -f prime_v2
   chainnif -# 2 -1 prime_v2 -nif prime_v2
   ld-net prime_v2

 ====================================================================== */
#include <stdio.h>
#include <stdlib.h>
#include <conc.h>                                 /* transputer library */

/* ============================ DEFINITIONS =========================== */
#define INTSIZE   ((int) sizeof(int))
#define INTERVAL1 100
#define INTERVAL2 200
#define SENTINEL  -1

/* -------------------------------------------------------------------- */
/*                                 MAIN                                 */
/* -------------------------------------------------------------------- */
main()
{
     int NoPrimes = 0, x, y = 0, j;

     /* =============================================================== */
     /* NODE #1                                                         */
     /* =============================================================== */
     if (_node_number==1)
     {
          /*--- Find primes in first interval and relay primes from  ---*/
          /*--- Node 2                                               ---*/
          for (x = 1; x<INTERVAL1; x++)
          {
               if (IsPrime(x))
               {
                    printf("%8d", x);
                    NoPrimes++;
               }

               /*--- check if Node 2 has a prime ready ---*/
               if (ProcSkipAlt(LINK1IN, 0)!=-1)
               {
                    ChanIn(LINK1IN, &y, INTSIZE);
                    if (y!=SENTINEL)
                    {
                         printf("%8d", y);
                         NoPrimes++;
                    }
               }

          }

          /*--- Keep relaying primes if Node 2 isn't done yet ---*/
          while (y!=SENTINEL)
          {
               ChanIn(LINK1IN,  &y, INTSIZE);
               if (y!=SENTINEL)
               {
                    printf("%8d", y);
                    NoPrimes++;
               }
          }
          printf("\nReceived %d primes \n", NoPrimes);
          exit(0);
     }
     else
     /* =============================================================== */
     /* NODE #2                                                         */
     /* =============================================================== */
     {
          /*--- scan interval ---*/
          for (x = INTERVAL1; x<INTERVAL2; x++)
          {
               if (IsPrime(x))
                    ChanOut(LINK0OUT,  &x, INTSIZE);
          }

          /*--- signal Node 1 that we are done ---*/
          x = SENTINEL;
          ChanOut(LINK0OUT,  &x, INTSIZE);
     }
}

/* -------------------------------------------------------------------- */
/* ISPRIME                                                              */
/* Semi-efficient prime finding function which, given x, returns 1 if   */
/* it is prime, and 0 otherwise                                         */
/* -------------------------------------------------------------------- */
int IsPrime(int x)
{
     int i;                    /*  0 1 2 3 4 5 6 7 8 9 */
     static int SmallPrimes[10] = {0,0,1,1,0,1,0,1,0,0};
     if (x<10) return SmallPrimes[x];
     if (x%2==0) return 0;
     if (x%3==0) return 0;
     if (x%5==0) return 0;
     for (i = 2; i*i<=x; i++)
          if (x%i==0) return 0;
     return 1;
}

Listing 4-2: prime_v2.c

The code executed by Node 2 is the same as in our first program, except that Node 2 computes primes between 100 and 200, coded as the macros INTERVAL1 and INTERVAL2, respectively.

Let's look at the task executed by Node 1. The main for-loop computes the primes and relays those sent by Node 2. Note that we are using the temporary integer variable x in the prime-generating loop and y in the relay loop. By having two variables instead of one, we can easily "remember" whether we received the sentinel from Node 2. Note that this is not the most efficient way of writing the code for Node 1. Because Node 1 does not check if Node 2 has a prime ready for output inside its j-loop, it is slightly slowing Node 2 down. In Section 4-4 we will look at an example where buffering is used to improve such a situation.

Program output

Let's run the program and see if we indeed have introduced some parallelism. If we see all the primes printed as two intertwined series of increasing numbers, one series with numbers less than 100, and the other series with numbers greater than 100, then we have created some amount of parallelism in our code. This is indeed what the output shows:
     101       2     103       3     107     109       5     113     127
       7     131     137     139     149      11     151     157      13
     163     167     173     179      17     181     191      19     193
     197     199      23      29      31      37      41      43      47
      53      59      61      67      71      73      79      83      89
      97
Received 46 primes 

Observe that Node 2 is turning out primes very fast, compared to Node 1. Remember, Node 1 is the one outputting the primes less than 100, while Node 2 computes the primes greater than 100. In fact, when Node 2 has finished testing all its 100 numbers (when 199 is output), Node 1 still hasn't advanced past a third of its 100 numbers (it hasn't outputted 23 yet).

Take a minute to look at the code and try to figure out why this is the case.

The reason is that Node 2 executes a highly specialized task and spends all its time testing numbers and channeling them out, while Node 1 is involved with testing numbers, channeling numbers in from Node 2, and channeling numbers out to the host through printf statements, not mentioning updating a counter! The net result is that Node 1 is about three times slower than Node 2.

Other alternation functions

The conc.h library contains two other important alternation functions: ProcAlt, and ProcTimerAlt.
int ProcAlt(Channel *, ..., Channel *, 0)
int ProcTimerAlt(int, Channel *, ..., Channel *, 0)

ProcAlt is similar to ProcSkipAlt, except that it is blocking, i.e. it waits until one of the channels specified in the list becomes active, and then returns to its caller. We will look at an example using ProcAlt next. ProcTimerAlt, like ProcAlt, is blocking. However, it has a time-out feature which enables it to unblock when a specified time is reached. If you remember our discussion in Chapter 2, each transputer maintains two timers, and ProcTimerAlt allows us to enter a blocking ProcAlt call, with the option to return if too much time (defined by the user) has elapsed. This is very useful for debugging purposes, and in environments requiring high-reliability.


EXERCISES


 
4-4

Edit prime_v2.c and replace the ProcSkipAlt statement with ProcAlt. The syntax for both is the same, so you simply need to remove the "skip" part. Before you run the program, can you predict what will happen?

Run the program, and explain its output. .


 

 
4-5
Write a three-transputer program that computes primes in the way described below.

  • Node 1 will compute all the primes between 1 and 100 (not included), and will print them as they are found. It will also print any prime numbers it receives from Node 2.

  • Node 2 will compute all the primes between 100 and 200 (not included), and will send them to Node 1 as they are found. It will also receive primes from Node 3 and will relay them to Node 1 as well.

  • Node 3, you will have guessed, will compute all the primes between 200 and 300, and will pass them to Node 2 as it finds them.

Pay special attention to stopping conditions for the tasks. In particular, try to define as exactly as possible the following three conditions: When Node 1 is done, when Node 2 is done, and when Node 3 is done.

Code the tasks so that they do not stop until these conditions are met exactly.

To run your program you will have to create a network information file for three transputers. Assuming that you have called your program prime_v3.c, you can use the chainnif utility as follows:

chainnif -# 3 -nif prime_v3.nif -1 prime_v3
ld-net prime_v3
.

 




[Previous] [HOME] [NEXT]