| 6-1 |
Modify the prime finding program used in this section to illustrate the inner workings of Parascope, so that it runs on two transputers. The Print task will run on Transputer 1, the Root, while the Compute Task will reside on the second transputer. This time the two tasks will communicate over a hard channel. Run the same experiment implemented in this section, and stop the output of primes by pressing Ctrl-C before the last prime is printed. Then use Parascope (or your debugger) to find the status of the Compute Task residing on Transputer 2. In particular, display the contents of its local variables. . |
| 6-2 |
When starting several tasks with ProcPar(Task1P, Task2P...TasknP), which
task starts first? Task1 or Taskn? To find out, write a one-transputer
program that will start three tasks. Write the tasks in such a way that you
can safely abort the cio server once the program is started and analyze its
status with Parascope to find out which Task started first. (Be careful about
using a printf statement. Having several parallel tasks use the same resource
(the link to the Host) at the same time will likely result in the program
crashing!)
|
Defensive debugging means that we should start coding in a manner that will ease our debugging effort. In fact, we should start thinking about debugging before we start coding at all. Given a machine structure where the different parts of the program reside on different processors using hard channels to communicate, and where only one of them, the Root, is connected to the host computer, it becomes very difficult to follow the execution of any code segment that does not reside in the Root. For example, examine the following simple code section.
main()
{
int A;
if (_node_number==1)
{
...
ChanIn(LINK0IN, &A, INTSIZE);
printf("Root: A = %d\n",A);
...
}
else if (_node_number==2)
{
...
printf("Node 2: A = %d\n",A);
...
}
}
The two different code sections are running on two transputers, and both are
using a variable named A. At a particular point in the computation, we
want to check the contents of A in each task. Why not use a printf
statement? It looks simple and should work. Unfortunately, it will not. Can
we see why? Indeed, the printf statement of the second processor,
Transputer 2, cannot execute correctly because all traffic is handled by the
virtual channel router and the ChanIn and ChanOut operations will corrupt that
sent by printf and will relay it directly to cio running on the host. However
tempting, the implementation of such a relay task is not trivial, for the
following reasons:
These are requirements that require serious thinking and extensive programming. Definitely not worth the time and effort when all we want is to find the status of a variable.
As we discussed earlier, the idea behind root debugging is to first develop the parallel program so that it runs on the root transputer only. Three fundamental properties of the transputer allow us to take such an approach:
The second, more elusive property, is that the transputer treats hard and soft channels identically in C[1]. The difference between the two types of channels only appears at the microcode level. This gives the transputer invaluable flexibility in dealing with channels. The result is that all functions accepting channels as parameters will work equally well with hard channels as with soft channels. For example, assume that Ti and Tj are two tasks intended to run on two neighboring transputers. Assume furthermore that Ti and Tj exchange an integer in the following fashion:
{---- Task Ti, running on _node_number i ----}
void Taski(Process *P, some argument definitions )
{
...
ChanOut(LINK0OUT, &i, (int) sizeof(int));
...
}
{---Task Tj, running on _node_number j ---}
void Taskj(Process *P, some argument definitions )
{
...
ChanOut(LINK1IN, &j, (int) sizeof(int));
...
}
Then
we can easily make these two tasks run on the same transputer and "talk" over a
soft channel. The only difference in the code above will be in the channels
used.
/*---- Task Ti, running on _node_number i ----*/
void Taski(Process *P, Channel *softc, some argument definitions )
{
...
ChanOut(softc, &i, (int) sizeof(int));
...
}
/*---Task Tj, running on _node_number j ---*/
void Taskj(Process *P, Channel *softc, some argument definitions )
{
...
ChanOut(softc, &j, (int) sizeof(int));
...
}
Note that the code is exactly the same, except for the additional soft channel parameter that is passed to both tasks. We will have to make sure that the same channel is passed to both tasks initially.
Are the two properties seen so far sufficient for us to ensure that an application will run identically on one as well as on N transputers? They are sufficient to ensure that the tasks can run and exchange information on one transputer. Will the time-dependency of events, in particular the scheduling of data exchange be maintained? Yes, because communication over channels is blocking and unbuffered. This third property, forcing channel communication to block until both sender and receiver are ready to participate in the exchange, insures that communication events will occur in the same relative order when the tasks run on one transputer, as when the same tasks run on N transputers. These three properties are thus the key to root-debugging.

Node 2: LINK1 Node 1
Node 2:
Send integer to Node 1
Let's define the two tasks running on Node 1 and on Node 2.
int Task1( arguments to be defined later )You will notice that they use LINK1IN and LINK0OUT, as would have been expected if the two tasks had been intended to run on two neighboring transputers. The example of the previous section showed that the tasks must communicate via soft channels if we want to be able to map them to one transputer. In such a case, should we use LINK0OUT and LINK1IN? To answer this question, we simply need to look up the definitions of these terms in the conc.h header file:
{
int i;
printf("Task 1 started\n");
ChanIn(LINK1IN, &i, (int) sizeof(i));
printf("received %d\n",i);
}
int Task2( arguments to be defined later )
{
int i = 105;
ChanOut(LINK0OUT, &i, (int) sizeof(i));
}
#define LINK0OUT ((Channel *)(NOPROCESS | (_WORD_SIZE * 0))) #define LINK1OUT ((Channel *)(NOPROCESS | (_WORD_SIZE * 1))) #define LINK2OUT ((Channel *)(NOPROCESS | (_WORD_SIZE * 2))) #define LINK3OUT ((Channel *)(NOPROCESS | (_WORD_SIZE * 3))) #define LINK0IN ((Channel *)(NOPROCESS | (_WORD_SIZE * 4))) #define LINK1IN ((Channel *)(NOPROCESS | (_WORD_SIZE * 5))) #define LINK2IN ((Channel *)(NOPROCESS | (_WORD_SIZE * 6))) #define LINK3IN ((Channel *)(NOPROCESS | (_WORD_SIZE * 7)))
/* map.h Remapping the hard channels... */ #ifndef map_h #define map_h #include "conc.h" #define RUN_ON_ROOT /*define to map all tasks to root*/ #define NO_XPUTERS 2 #ifdef RUN_ON_ROOT Channel *Link0In[NO_XPUTERS+1]; Channel *Link0Out[NO_XPUTERS+1]; Channel *Link1In[NO_XPUTERS+1]; Channel *Link1Out[NO_XPUTERS+1]; #undef LINK0IN #define LINK0IN Link0In[_logical_node_number] #undef LINK0OUT #define LINK0OUT Link0Out[_logical_node_number] #undef LINK1IN #define LINK1IN Link1In[_logical_node_number] #undef LINK1OUT #define LINK1OUT Link1Out[_logical_node_number] #endif #endif
The macro RUN_ON_ROOT controls how the mapping is done. If RUN_ON_ROOT is defined, then the whole application is mapped to the root. If it is not defined, then the application is allowed to run in its originally intended mode, using as many transputers as are required.
If RUN_ON_ROOT is defined, then the LINK macros refer to array elements. The idea here is to associate with each node a series of eight soft channels acting as the hard channels, and whose names are closely associated with the hard channels they replace. The logical number of the node is used as an index to select the channel. This selection will allow us to very easily "attach" nodes to each other via the soft channels. But before we do so, let's come back to these channel arrays, and observe how the ChanOut statement of our example translates, depending on whether we define RUN_ON_ROOT or not. Here is the original statement:
ChanOut(LINK0OUT, &i, (int) sizeof(i));When RUN_ON_ROOT is not defined, LINK0OUT refers to the hard channel, and the statement is unchanged by the preprocessor. When the RUN_ON_ROOT is defined, however, the preprocessor pp[2] will change the statement as follows:
ChanOut(Link0Out[_logical_node_number], &i, (int) sizeof(i));
In this case, Task2, which is sending the integer i to Task1, will use a soft channel in the array Link0Out to communicate with Task1. If we take a look at the corresponding ChanIn statement in Task1, we will find that pp will have transformed it into:
ChanIn(LINK1IN, &i, (int) sizeof(i));You probably have caught on now with the role of _logical_node_number. Its function is similar to that of _node_number, but because our task may now be forced to run on the root, _node_number cannot be used anymore to make the code aware of the node it is working on. The solution is to pass to each parallel task a variable identifying the node number with which it is associated. This is what _logical_node_number does. Task1 will be passed the number 1. Task2, the number 2. This way, independently of the mapping, Task1 and Task2 will be know that, logically, they are working for Node 1 and Node 2, respectively.
int InitializeMapping()
{
#ifdef RUN_ON_ROOT
int i;
/*--- use hard channels between Root and host ---*/
Link0In[1] = LINK0IN;
Link0Out[1]= LINK0OUT;
AbortIf(!(Link1In[1]=ChanAlloc()),"Out of memory");
AbortIf(!(Link1Out[1]=ChanAlloc()),"Out of memory");
/*--- others nodes use soft channels ---*/
for (i=2; i<NO_XPUTERS; i++)
{
Link0In[i] = Link1Out[i-1];
Link0Out[i] = Link1In[i-1];
AbortIf(!(Link1In[i] =ChanAlloc()),"Out of memory");
AbortIf(!(Link1Out[i]=ChanAlloc()),"Out of memory");
}
/*--- Last node uses hard channels too ---*/
Link0In[NO_XPUTERS] = Link1Out[NO_XPUTERS-1];
Link0Out[NO_XPUTERS] = Link1In[NO_XPUTERS-1];
Link1In[NO_XPUTERS] = LINK1IN;
Link1Out[NO_XPUTERS] = LINK1OUT;
#endif
return 0;
}
The function abortif, whose code is in the full listing of map.h simply exits if the condition it is testing is NULL. The function InitializeMapping is also part of the map.h file, and creates soft channels that it assigns to the Link0In, Link0Out, Link1In, and Link1Out arrays in order to emulate a chain of transputers. Figure 6-2 illustrates how this operation works. Observe that the physical layout used to connect the nodes influences directly how the arrays are initialized. The array elements here are represented separated from each other, and array cells with identical fill patterns contain identical channel pointers.
The for (i=2; i< NO_XPUTERS; i++) loop is a direct translation of this assignment of matching patterns in the figure. The code directly preceding and directly following the loop ensures that the first Node in the chain uses real hard channels to connect the first node of the chain to the Host, and, by the same token, makes sure the last Node in the chain uses hard channels for the "ending" links of the chain. Although this may seem bizarre, this approach simply reinforces the symmetry of our emulation, and allows this method to be used more broadly, when only parts of the nodes are mapped to the root, and when other applications, possibly fully debugged, are running on the other transputers of the chain.