Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



7-8 Deadlock prevention

The first order of business when designing an application where several tasks are interconnected by a sophisticated routing system of channels, is to design the routing system in such a way that deadlocks cannot occur. This is an ongoing area of research, and we will look at several solutions.

Prevention in deterministic routing

When the routing is deterministic, that is the path of a packet through the network is deterministic and depends only on the source and destination of that packet, then static analysis of the application can yield accurate deadlock-free code.

The wait-for graph is essentially a directed graph where each task of the parallel application is represented as a vertex, and where each link or channel connecting two tasks is a directed edge, whose direction is the direction of the data flow. To maximize performance, buffers are often implemented at both ends of a channel. This leads to a possible refinement of the graph where buffers can be made vertices within a task, and edges can be added between them representing packet-paths internal to a task.

One general solution to prevent cycles in these types of graphs is to impose a partial ordering on the resources making up the graph (buffers or links), and to enforce a policy where messages can be sent only following a path going through ordered resources.

Gunther [GUNT81], and Merlin and Sweitzer [MERL80] have proposed deadlock avoidance techniques where the buffers are the selected resources. Their solutions create pools of buffers that are structured in classes with a partial order. Dally and Seitz's approach [DALL87], on the other hand, imposes a partial ordering on the channels. Their scheme is more general and applies not only to network with store-and-forward routing, such as transputer networks, but also to network supporting other routings such as the wormhole[1] routing. We will explore this last scheme in more details now.

An example

We will use a simple example taken from Dally and Seitz [DALL87]. Let's assume that we have four parallel tasks (running on one or several transputers) connected by four channels, as shown in Figure 7-1a. The first step is to create the channel dependency graph, as shown in Figure 7-1b. This graph is a directed graph where the vertices are the channels and the edges represent the routing function. If a routing function allows packets arriving at Task Ti over Channel Cn, to leave Ti over Channel Cm, then the channel dependency graph will have an edge between Cn and Cm, directed toward Cm. Dally and Seitz give a theorem proving that the routing of messages in the network is deadlock free if and only if the channel dependency graph does not contain cycles. This is a powerful theorem which is the basis for their deadlock avoidance scheme.


Figure 7-1: Example of deadlock condition in a simple network, and the Dally-Seitz solution.

Break channels into virtual channels

Clearly Figure 7-1b exhibits a cycle. The application is thus deadlock-prone. The first step to make the routing deadlock free is to break each channel on the cycle into groups of virtual channels, each with its own associated queue. If the channels are soft channels within the computation structure of a transputer, then we simply need to double them with additional soft channels. If the original channels are hard channels linking transputers, then we must use the virtual channel facility of LS C. Virtual channels share existing hard links by time-multiplexing the sending and receiving of packets. In Figure 7-1c the resulting interconnection network is shown. The directional channels linking tasks have been replaced by two virtual channels, going in the same direction. For example, c3 has been split into cl3 and ch3, where l stands for low and h for high.

Routing Up or Down

The key do deadlock free routing is to adopt the following rules when sending packets:

Routing Rule

When Node i must send a packet to Node j, it uses a high channel if j is greater than i, and a low channel if j is less than i.

For example, assume that Node 2 running Task T2 must send a packet to Node 4. Since 2 is less than 4, Node 2 uses the high channel ch2 to send the packet first to Node 3. Node 3 compares its Id to 4, the destination, and it too chooses the high channel: ch3.

The most important result of the application of this rule is that it creates a total order of the channels. For example, let's take a look at Node 3. Packets arriving to it on cl2 from Node 2 can continue only on cl3. A packet from Node 1 would have to take this route. Therefore cl2 > cl3. A packet arriving on ch2, on the other hand, would have to continue on ch3. The reason is that if it arrived on ch2, its destination must have been greater than 2, and thus, if it does not stay at Node 3 it must use ch3 to continue: ch2 > ch3. Going through a similar analysis for the other node we get the ordering cl1 > cl2 > cl3 > cl4 > ch1 > ch2 > ch3 > ch4. (cl1 and ch4 are never used, but we still incorporate them here). The result is that the channel dependency graph does not contain a cycle anymore, as Figure 7-1d shows.

This solution is elegant and easy to implement. Its performance may suffer slightly from the overhead of the virtual channel manager, although the use of virtual channels with associated queues has been shown by Dally to increase throughput in some cases [DALL92]. In addition, it may require more buffer space than the standard approach, but it is a price worth paying for avoiding deadlocks.


EXERCISES


 
7-1

Implement the four-node example shown above in LS C. Use virtual channels linking four transputers connected in a ring network. Create parallel tasks for managing the channel input and output operations. Test your program by having each node compute the average delay of a round trip packet sent to one of its neighbors. For example, Node 2 will compute the average delay to send a packet to Node 4 by sending N packets to Node 4 containing the time at which the packet left. Node 4, upon receiving the packets, will forward them back to Node 2. Upon receipt, Node 2 will compute the difference between the present time and the time stamp of the packet, and add it to a running sum. After the N packets have been exchanged, Node 2 will divide the sum by 2*N to compute the average delay. Write your code so that each node will be carrying out its own experiment while relaying packets sent by other nodes.

Verify that no deadlocks occur during the execution of your program. .


 


Prevention under adaptive routing

When the routing is adaptive, it is not always easy to predict the path of packets. Typically adaptive routing uses deterministic rules as long as the traffic is low, but switches to nondeterminisc path selection as the traffic augments.

Son and Paker [SON91] propose an adaptive routing scheme where the main idea is extremely simple: Assuming that each output channel is associated with a queue, no node in the network will accept a packet unless it has room for at least one packet in one of its output queues. This scheme will work even if the channel dependency graph contains a cycle, and is an example of an algorithm that restricts the use of buffers rather than the use of channels, as was Dally and Seitz's scheme.

Son and Paker's full algorithm is a collection of six rules:

Rule 1: A node can introduce a new packet on the network only if it has room for that packet in one of its output queues.

Rule 2: Input channels active simultaneously are treated fairly by the routing scheme.

Rule 3: A node will give higher priority to a packet that must be relayed than to a packet it needs to introduce on the network.

Rule 4: If the node is not ready to absorb a packet, it will forward it to the output channel with the smallest queue.

Rule 5: If the output queue selected by the routing function for a given packet is full, then that packet is enqueued in the output queue of minimal size.

Rule 6: If more than one such queue exist, then the router selects one at random.

It is interesting to note that this is a very different solution than Dally's, but one that also provides deadlock-free routing. Its major concern is to keep packets going over the network until the nodes can absorb them, and to never allow all output queues to fill-up, so that nodes can be given a chance to eventually absorb their packets.




[Previous] [HOME] [NEXT]