Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



7-9 Deadlock Recovery

Finally, we should point to another possible choice for dealing with deadlocks: that of assuming that they will occur, and that when they do so, the parallel program should be clever enough to try to unblock the tasks involved and resume their execution. In large applications, this may be a more viable solution than that of implementing a deadlock-free routing scheme, although prevention should be preferred over recovery. The ultimate trade-off is between the overhead introduced by, say a deadlock-free routing scheme based on virtual channels, and the overhead of a detection/recovery scheme. The virtual channel router may require more buffer space for the individual queues associated with each virtual channels, resulting in a loss of channel bandwidth and performance because of the multiplexing of virtual paths over hard links. The detection/recovery scheme will require extra code, and extra processor utilization spent in the detection phase that is often associated with time-out periods.

Nonetheless, let's take a look at a simple example and see how some Logical Systems C functions can be used to recover from (very frequent) deadlocks.

Game of life on four transputers

Assume that we have four transputers connected as a ring (or a mesh), each one in charge of maintaining a quarter of the full population of cells in Conway's game of life. The cells reproduce on a square grid, and a quarter of the grid is assigned to each transputer. When computing the next generation of a cell, a transputer must compute the sum of all neighboring cells. Depending on the number found and the rules controlling the life and death of the cells, a cell may remain alive, be born, or die.

However, when a transputer counts the number of neighbors for the cells that lie on a border common with another transputer, some exchange must be made between the two transputers, so that an accurate count of neighbors can be made. Typically, each transputer will send to the other a vector corresponding to the current population of cells on the border. Figure 7-8 illustrates the exchange that must take place between each transputer, every generation.

We will make two simplifications here. The first one is that we assume that the transputers have special rules for dealing with the cells located on the boundaries of the full grid. Our second assumption is a simplifying one: we will assume that the exchange of information is done in such a way that transputers diagonally apart will get the status of each other's inside cell. The four cells at the intersection of the inside row and inside column of each transputer depend on the status of the other three, and hence there should also be "diagonal" information transfer, which we do not consider here. Instead, we are interested in exchange of data between the transputers, which we sketch as follows.

S1:          GenerationCount <- 0
S2:          Get column and Row from neighboring transputers
S3:          Send row and column to neighboring transputers
S4:          Compute new generation
S5:          GenerationCount <- GenerationCount + 1
S6:          if (GenerationCount < MAXGENERATION) goto Step S2

This six-step algorithm is thus run by each transputer.


Figure 7-8: For the computation of each new generation, transputers must exchange their inner row and column of cells with their neighbors. For example, Transputer 1 must send its rightmost column to Transputer 2, and its bottom row to Transputer 3. In exchange, it will get the leftmost column of Transputer 2 and the top row of Transputer 3.

It implements what we want, but will it work?

From what we have seen in this chapter, you may already see that we blatantly forgot to implement any synchronization between the exchange of data. Clearly, if all four start by waiting for a column or a row from their neighbor, then the program will go nowhere... Except if we write the code keeping in mind that deadlocks will occur. The idea is to use communication functions that can time-out. LS C supports several of them, among which we have ChanInTimeFail, ChanOutTimeFail, and ProcTimerAlt. These functions work similarly to their counterpart ChanIn, ChanOut, and ProcAlt, but take an extra argument that represents a time in the future. When the timer associated with the current priority reaches the value stored in that argument, the function is aborted. If the task was inactive, blocked on a deadlocked channel, then the timer will awaken the task and bring it back into the queue of active tasks.

A quick example will illustrate how ChanInTimeFail can be used this way.

#define TIMEOUT 1000
int SafeChanInInt(Channel *chan, int *data)
{
     int i;

     i = ChanInTimeFail(chan, data, sizeof(int), Time()+TIMEOUT);
     return i;
}

The ChanInTimeFail function, whose syntax chart is given below, is set up to receive an integer from the channel chan, and to store it in (*data). However, if the ChanInTimeFail function fails to return before a thousand timer ticks have elapsed (64 ms), then the channel transfer will be aborted, and the value 1 will be stored in i. If the transfer completes within 1000 ticks, then the function is successful, and 0 is stored in i.

int ChanInTimeFail(Channel *, void *, int, int)

Let's take a look at the code for the four transputers now. It contains only the code relative to the data exchange, as we left out the maintenance of the grid of cells.

/* =======================================================================
   timeout.c

   DESCRIPTION:
   Simulates some of the data exchanges required in Conway's game of
   life.  the program simulates the exchange of a full vector (column
   or row) of the grid of cells maintained by each transputer by
   sending a single integer.  This could be a bit-map representing 32
   cells for example).

   The program will deadlock as no effort has been made to synchronize
   the exchange of information in a dead-lock free fashion.  the
   program will recover from the deadlocks, however, by using the
   timeout feature of the ProcAltTimerList function.

   TO COMPILE AND RUN:

   make -f timeout
   ld-net timeout

   ASSOCIATED NIF-FILE

   buffer_size     200;
   host_server     CIO.EXE;
   level_timeout   400;
   decode_timeout  2000;

   1, timeout, R0,  0     ,   2[0] ,     3[1] ,         ;
   2, timeout, R1,  1[1]  ,   4[0] ,          ,         ;
   3, timeout, R4,  4[1]  ,   1[2] ,          ,         ;
   4, timeout, R2,  2[1]  ,   3[0] ,          ,         ;

 ====================================================================== */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "conc.h"

/* =========================== DEFINITIONS ============================ */
#define NO_NODES      4                       /* number of transputers  */
#define MINTIMEOUT    400                     /* minimum timeout period */
#define TIMEOUT       (Time()+_node_number*MINTIMEOUT)
#define NO_EXCHANGES  100               /* Total # of data-exchanges or */
                                        /* generations                  */
#define LeftChanIn    LINK1IN
#define RightChanIn   ((_node_number==1)? LINK2IN: LINK0IN)
#define LeftChanOut   LINK1OUT
#define RightChanOut  ((_node_number==1)? LINK2OUT: LINK0OUT)

/* =========================== PROTOTYPES ============================= */
void ExchangeData(void);

/* ============================= GLOBALS ============================== */
int     NoTimeOuts = 0;

/* -------------------------------------------------------------------- */
/*                                 MAIN                                 */
/* -------------------------------------------------------------------- */
main()
{
     int i, StartTime, EndTime;

     srand(Time());                    /* initialize random # generator */
     StartTime = Time();               /* keep track of processor time  */

     /*--- simulates computation of several generations ---*/
     for (i = 0; i<NO_EXCHANGES; i++)
          ExchangeData();

     EndTime = Time();

     /*--- display elapsed time ---*/
     if (_node_number==1)
          printf("Elapsed time: %6.3f seconds\n%d timeout(s)\n\n",
               1.0*(EndTime-StartTime)*64/1000000, NoTimeOuts);
}

/* -------------------------------------------------------------------- */
/* EXCHANGEDATA                                                         */
/* function used to exchange inside columns and inside rows with        */
/* neighboring transputers                                              */
/* -------------------------------------------------------------------- */
void ExchangeData(void)
{
     int GotLeft = 0,                /* keep track of exchanges already */
         GotRight = 0,               /* performed.                      */
         SentLeft = 0,
         SentRight = 0;
     int index, dummy, vector;
     Channel *ChanList[3];           /* use list in ProcAlt function    */

     ChanList[0] = LeftChanIn;
     ChanList[1] = RightChanIn;
     ChanList[2] = NULL;

     /*--- wait until 4 data exchanges are performed ---*/
     while (GotLeft+GotRight+SentLeft+SentRight<4)
     {
          /*--- if there's still something to receive wait a while and
          listen if its coming on a link ---*/

Step S2

          if (GotLeft+GotRight<2)
          {
               index = ProcTimerAltList(TIMEOUT, ChanList);
               switch (index)
               {
                    case 0: /*--- something on first channel in list ---*/
                            vector = ChanInInt(ChanList[0]);
                            if (!GotLeft)
                                 GotLeft++;
                            else
                                 GotRight++;
                            /*--- remove 1st channel from list ---*/
                            ChanList[0] = ChanList[1];
                            ChanList[1] = NULL;
                            break;
                    case 1: /*--- something on second channel in list ---*/
                            vector = ChanInInt(ChanList[1]);
                            GotRight++;
                            /*--- remove 2nd channel from list ---*/
                            ChanList[1] = NULL;
                            break;
                    case -1: /*--- time-out occurred ---*/
                            NoTimeOuts++;
                            break;
               }
          }

          /*--- simulate sending a column or row ---*/
          vector = _node_number;

          /*--- if we haven't sent anything yet... ---*/

Step S3

          if (SentRight+SentLeft==0)
          {
               /*--- choose one neighbor at random and send it vector ---*/
               if (rand()%2)
               {
                    ChanOutInt(RightChanOut, vector);
                    SentRight++;
               }
               else
               {
                    ChanOutInt(LeftChanOut, vector);
                    SentLeft++;
               }
               continue;
               /*--- then go back to receive mode ---*/
          }

          /*--- we already have sent to the "left" neighbor... ---*/
          if (!SentRight)
          {
               /*--- then send to the right neighbor ---*/
               ChanOutInt(RightChanOut, vector);
               SentRight++;
          }

          /*--- we already have sent to the "right" neighbor... ---*/
          if (!SentLeft)
          {
               /*--- then send to the left neighbor ---*/
               ChanOutInt(LeftChanOut, vector);
               SentLeft++;
          }
     }
}

Listing 7-1: Listing of timeout.c program. Partial implementation of game of life.

ProcTimerAltList

The first highlighted section of the program contains the deadlock recovery mechanism: the call to the ProcTimerAltList function. This function is related to ProcAlt, which is a blocking function. Hence, as soon as the call is made, the program is blocked and in an inactive state. However, if no data transfer has taken place in the next _node_number*MINTIMEOUT timer ticks, the task is unblocked and returns -1, indicating that the function's normal operation has been aborted. The case statement recognizes this condition and increments a counter.

Why utilize a time-out period that is a function of the node number and not a constant used uniformly by the four transputers? If we had selected the same constant for all three transputers, then we would have created a scenario where the transputers would keep "missing" each other. Because the transputers are loaded at different time, they will actually start the ProcTimerAlt function at different times, but because they always exchange four pieces of information between each other, they would quickly synchronize themselves, and start a pattern where they would timeout for the ProcTimerAlt function at the same time, then move on to the next section where they would enter the ChanOut statements which are not deadlock-resistant. This would create a deadlock. But trying to fix this problem by using ChanOutTimeFail functions with identical time-out intervals would not solve the problem. The reason is that all four transputers would switch to input mode at the same time, time-out of this mode at the same time, go into output mode (ChanOutTimeFail) at the same time, and then time-out, still at the same time, ad infinitum. They would never reach a point where one would be in input mode while its neighbor would be in output mode. For this reason the transputers must use different timeout periods.

Another option would be to adopt the same system used in computer packet-switched networks where several sources colliding while trying to establish a communication will wait a random period of time and try again. The randomness practically eliminates the chances that they will collide again. We could have used this method as well and made each transputer wait for random periods of time before timing out (see Exercise 7-9).

Running on four 20 MHz T805, the above code times at 2.560 seconds. For 100 simulated generations, this is awfully slow. A synchronized version of the program, written so that it is deadlock-free boasts an execution time approximately 50 times faster on the same four transputers (see Exercise 7-10). The number of time-outs reported by the program is 99. Since the time-outs occur only in the ProcAlt call, and since two calls to ProcAlt must be made before receiving the two vectors needed for each generation, this means that one data exchange (four vectors) experiences an average of one time-out, and that this time-out lasts (2.560-0.051)/99 = 0.025 seconds, or 0.025/0.000064 = 390.625 ticks. About the size of MINTIMEOUT. This makes sense. Node 1 has the shortest time-out period, and will always be the first one to "unblock" a lock, hence we can expect an average delay per unblocked deadlock of 400 ticks.

Reducing the MINTIMEOUT constant could reduce the total execution time, but may also result in too short a period of time to insure that the 4 transputers remain synchronized once unblocked.


EXERCISES


 
7-7

Modify the code of the timeout.c program listed previously in this chapter, and change the TIMEOUT macro to

#define TIMEOUT (Time()+MINTIMEOUT)

Run the program on four transputers connected in a square mesh. Why isn't the program running? .


 

 
7-8
Change the ChanOut calls to ChanOutTimeFail, and modify the macro TIMEOUT as shown in the previous exercise. Run the program on a network of four transputers. Why isn't the program still running?

 

 
7-9
Modify the TIMEOUT macro so that it yields out a random time in the future, between Time()+MINTIMEOUT and Time()+MAXTIMEOUT. What is the smallest MINTIMEOUT value you can use that allows the program to always finish its 100 generations?

 

 
7-10
Modify the timeout.c program and rearrange the exchange of information between the four transputers, so that deadlock-free communication can be established without introducing any time-out functions. Time the execution of your program and compare it to that of the original program.

 


Other Reliable LS C Communication Functions

Logical Systems C provides other functions, beside ChanInTimeFail, ChanOutTimeFail and ProcTimerAlt, that can be used in the context of deadlock and recovery: ChanInChanFail and ChanOutChanFail. Both operations are similar to ChanIn and ChanOut, except that their argument list contains four arguments. The first three are the same used by ChanIn and ChanOut, while the fourth one is channel pointer.

ChanInChanFail(channel *c, void *buffer, int size, channel *timeoutchan)

ChanOutChanFail(channel *c, void *buffer, int size, channel *timeoutchan)

The timeoutchan channel is directed at the task that calls these ChanFail functions, and is used to unblock the process that called ChanInChanFail, or ChanOutChanFail.

Assume, for example, that some task T1 regularly sends out information over some channel c to another task T2, and that it is feared that T2 may deadlock, bringing T1 in the lock by not responding to T1's ChanIn operations. One way to detect the deadlock and to recover from it is to create a new task T3, running in parallel to T1 (very likely but not necessarily on the same transputer), to "watch over" T1.

watch dog

We'll call T3 a watch dog. One would expect T3 to monitor the task T1 as to its normal functioning. This could be done directly, via T1 sending short messages to T3, or indirectly, by having T3 check on the general status of the overall computation. T1 and T3 would share a channel, timeoutchan, used by T1 as the fourth argument of its ChanInChanFail directed to T2. Were T3 to detect that T1 had not been performing correctly for a short while, it can very quickly bring T1 back to the list of active tasks by sending an integer on the channel timeoutchan. The result is that T1 aborts its input and the channel c is reset. T1 can then receive the integer on timeoutchan (clearing that channel) and start a deadlock recovery operation.

Time-Out Rule
Data loss

Note that when a communication function is aborted, either by a time-out condition, or by the activation of a watch-dog channel, data may get lost. The reason is that if the transfer of information had already started, then it is stopped and no attempt is made by the timing-out agents to let it continue until the end. When dealing with hard channels, it will be impossible to find out if any bytes of information had been transferred before the transmission was aborted.
Therefore, time-out delays should be selected long-enough that the probability of a time-out to occur during a transmission be as small as possible.

The LS C functions using the watch-dog channel are powerful, and could be used to implement sophisticated deadlock detection and recovery schemes such as the one presented by Feitelson [FEIT91]. Feitelson's scheme applies to shared-memory systems, and can be used to monitor deadlock situation arising within a transputer as well, or within a general networks through a careful implementation.

In Feitelson's scheme, one parallel task on the processor would be dedicated as a watch-dog task monitoring three counters. The first one, Total, represents the number of tasks running on that processor. This counter does not include the Watch-dog task, and is incremented every time a new task is created, while decremented every time a task finishes. The second counter, Stuck, represents the number of tasks that are blocked, or active in a state where they cannot contribute to the computation. A task stuck in a while loop waiting for a queue to become non empty would be a good example of such a case. Typically a task would increment the stuck counter before performing a blocking operation:

/*--- update counter in critical section ---*/
ProcToHigh();
Stuck++;
ProcToLow();

/*--- perform blocking operation ---*/
ChanOut(...);

/*--- wake up, and update counter again ---*/
ProcToHigh();
Stuck--;
ProcToLow();

or before a busy-wait loop:

/*--- update counter in critical section ---*/
ProcToHigh();
Stuck++;
ProcToLow();

/*--- busy wait ---*/
while (Queue.Empty)
        ProcReschedule();

/*--- update counter again ---*/
ProcToHigh();
Stuck--;
ProcToLow();

Note that the update of the counters must be done in a critical section, since Stuck is shared and updated by many tasks. The watch-dog task keeps comparing the Stuck counter to the Total counter. When the two are equal, one processor has nothing to do. When all the processors in the system reach this state, the last processor to actually reach the state should initiate a deadlock recovery operation. Feitelson assumes a shared memory system, so that individual processors can set shared bits to report their condition. This way, the last processor in the pool to reach the deadlock condition will detect this situation by setting the last bit in the pool, and will therefore be entitled to start the recovery. Using virtual channels and a protocol where watch-dog tasks keep on passing a token message representing the status of the system, a similar algorithm is manageable.

7-10 Concluding Remarks

We have looked at deadlocks and categorized operating conditions when they are likely to occur: data sharing and communication. When accessing shared data, we have to be careful about locking access to the resources, and the simplest solution is to use one lock only per data structure. When multiple locks are required, a safe approach is to lock all or lock none.

In the domain of data communication, deadlock arise from the creation of cycles in the resource or wait-for graphs. Preventing cycles from occurring is by far the most efficient way to deal with deadlocks. Several methods are available, and can be implemented in LS C.

In situations where deadlocks are allowed to occur, fail-safe communication functions can be use to recover from deadlock conditions. These communication functions (ChanInTimeFail, ChanInChanFail, ChanOutTimeFail, ChanOutChanFail) allow communicating tasks to exit from their blocked state on a time-out condition, or when a specific channel (watch-dog channel) receives information.




[Previous] [HOME] [NEXT]