Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



3-3 The Linking Process

The linking process is handled by the tlnk program. Its function is to take the relocatable object code created by the assembler and to link it with library files, that is, modules containing the code of the library functions called in the user program, such as printf(), scanf(), strcpy(), or atoi(). The output of tlnk is a file ready to be downloaded to the transputer. The extension of the file is .tld (for transputer download). To operate, the linker requires several pieces of information, including the input file, the entry point, the library files, and the load and stack addresses.

Interactive session

There are two ways to provide the above information to the linker. The first, rather cumbersome way is simply to call the linker and provide the information interactively, as shown in Listing 3-4.

Command File

The second method involves writing a command file containing the same information as above and having the linker read this file. This file typically has a .lnk extension. The contents of first.lnk are shown below. With a command file, linking becomes an effortless task that can easily be tailored to batch processing or to the use of the make utility[1].

	FLAG 
	LIST first.map  
	INPUT  first.trl  
	ENTRY  _main  
	LIB  t8lib  
	LOAD 0x80000400  
	STACK  0x80100000
Listing 3-3: Contents of the command file for the linking of first.c

	C:\tlnk

	TLNK Linker, Version 93.1, Copyright (c) 1986-1993 by Logical Systems

	Flags (A|a|C|c|S|s)[none]:
	Temp file path [.]       :
	Listing file [NUL]       : first.map
	Input file(s) [.trl]     : first.trl
	Entry point [_main]      :
	Library file(s) [.tll]   : t8lib
	Output file [first.tld]  :
	Load address [0x80001000]: 0x80000400
	Stack address[0x80001000]: 0x80100000

	External symbols: 119   External symbol table usage: 5%
	Total symbols:    441   Total symbol table usage:    8%

	No errors detected
	C:\

Listing 3-4: Example of interactive use of the linker (user input is shown in italics).
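Load and stack addresses are long hexadecimal numbers, and dropping a digit is an easy mistake to make when typing them. As a minimal sketch, and not a feature of tlnk, the hypothetical helper below checks that a string has the form 0x followed by exactly eight hexadecimal digits and that the value lies at or above 0x80000000, which is where the addresses used in this section fall. The function name and the acceptance rule are assumptions made for illustration only.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of tlnk: returns 1 if text is written as
   0x followed by exactly eight hex digits and its value is at least
   0x80000000, storing the value in *out; returns 0 otherwise. */
int parse_link_address(const char *text, unsigned long *out)
{
    size_t i;

    if (strlen(text) != 10 || text[0] != '0' ||
        (text[1] != 'x' && text[1] != 'X'))
        return 0;
    for (i = 2; i < 10; i++)
        if (!isxdigit((unsigned char)text[i]))
            return 0;

    *out = strtoul(text, NULL, 16);
    return *out >= 0x80000000UL;
}
```

With this check, a string such as 0x80000400 is accepted, while a seven-digit value such as 0x8000400 is rejected before it ever reaches the linker.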

The program is almost ready to run! If it were a Turbo C program, for example, then we could simply invoke the program by typing its name at the DOS prompt, and DOS would load the program code from disk, store it in memory, and start it. But in our case, DOS does not recognize the transputer as a processor. Remember from Section 2-5 that the transputer is an attached processor, not a co-processor, and is seen by DOS as nothing more than an I/O device. Hence we need a mechanism for loading the transputer code (stored in the .tld file) in the root transputer. Since DOS won't do it for us, we need a program to do this. This program is called ld-net, for load-network [2].

For most simple applications, the compiling, assembling, and linking can be done with the tcc utility supported by LS C. The advantage of tcc is that it automatically determines the appropriate switches to use with the different filters (pp, tcx, tasm, or tlnk). Its syntax is as follows:

tcc <input_file_name> [-options]

The extension of the file name defines which filters tcc will call. If you input a file with a .c extension, tcc will call pp, tcx, tasm, and tlnk successively to process the file. If, on the other hand, you feed it a file with a .tal extension, then tcc is clever enough to recognize an assembly file and to call the assembler directly. The tcc utility will be sufficient for most straightforward applications and systems. In more specialized cases, however, a make file (defined later) will provide the assistance needed to generate ready-to-load modules automatically.

3-4 Loading The Program

Loading the transputer program onto the transputer network and launching it requires ld-net to perform several tasks.

Network information file

The loader gets the information about how it should perform these three steps from a file, the network information file, which typically carries a .nif extension. This file can be created with an editor, or enterprising readers can write a simple information file generator[3]. The contents of the network information file consist of two parts. The first section contains four lines controlling the loading process. The second section contains information describing the graph representing the transputer network. This information is used by ld-net to determine which program or programs must be loaded onto the transputers[4]. The network information file for our program is shown in the listing below.

	buffer_size 	200;  
	host_server 	cio.exe; 
	level_timeout 	1000;  
	decode_timeout	1000;

	1, first, R0, 0, , ,;
	
Listing 3-5: Contents of the nif file first.nif for program first.c.
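For readers who would rather generate this file than type it, here is a minimal sketch of an information file generator for the single-node case. It is our own illustration, not a tool supplied with LS C: the function name and its parameters are hypothetical, and we assume that a tab between a keyword and its value is acceptable to the loader, as the layout of Listing 3-5 suggests.

```c
#include <stdio.h>

/* Hypothetical generator: writes a one-node network information file
   laid out like Listing 3-5.
   Returns 0 on success, -1 if the file cannot be written. */
int write_single_node_nif(const char *path, const char *program,
                          int buffer_size, const char *host_server,
                          int level_timeout, int decode_timeout)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    /* Command part: the four lines controlling the loading process. */
    fprintf(f, "buffer_size\t%d;\n", buffer_size);
    fprintf(f, "host_server\t%s;\n", host_server);
    fprintf(f, "level_timeout\t%d;\n", level_timeout);
    fprintf(f, "decode_timeout\t%d;\n", decode_timeout);
    fprintf(f, "\n");

    /* Node part: node 1 runs the program, its parent is the host (R0),
       Link0 goes to the host, and the other links are left blank. */
    fprintf(f, "1, %s, R0, 0, , ,;\n", program);

    return fclose(f) == 0 ? 0 : -1;
}
```

Calling write_single_node_nif("first.nif", "first", 200, "cio.exe", 1000, 1000) writes the same six lines as the listing above. A generator for larger networks would emit one node line per transputer.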

Commands

The first four lines of the file form the command part of the network information file.

Buffer_size

Buffer_size defines the number of bytes of transputer internal memory used when downloading programs. When a program running on several transputers is loaded, it is first passed to the root transputer, which passes it on to neighboring transputers, which, in turn, pass the program down to their neighbors, and so on. A buffer is used to hold the data transferred. The default value is 255 bytes, and results in the fastest loading time possible. Smaller buffer sizes are possible, but can substantially slow down the loading of programs in large transputer networks.
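To get a feeling for why a small buffer slows loading down, we can count how many buffer-sized transfers a node needs in order to forward a program. The sketch below assumes a deliberately simple model of our own, one transfer per full or partial buffer, and ignores any protocol overhead; the function name is hypothetical.

```c
/* Number of buffer-sized transfers needed to forward program_bytes,
   assuming one transfer per full or partial buffer (a simplification).
   Returns 0 for non-positive arguments. */
long transfers_needed(long program_bytes, long buffer_size)
{
    if (program_bytes <= 0 || buffer_size <= 0)
        return 0;
    /* Integer ceiling of program_bytes / buffer_size. */
    return (program_bytes + buffer_size - 1) / buffer_size;
}
```

Under this model, a program of 10000 bytes (an arbitrary size chosen for the example) takes 40 transfers with the default 255-byte buffer and 50 transfers with a 200-byte buffer, and every node along the path to a distant transputer pays that cost again.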

Host_server

Host_server identifies the program that is to run on the PC host. We will first use the cio.exe program provided by CSA. As we become more experienced with parallel programming, some readers may want to write their own host driver, in which case the network information file host_server command will be changed to reflect the switch to a new driver.

Level_timeout

To understand how level_timeout and decode_timeout (see below) operate, one has to understand the way a transputer network is loaded. The nodes are organized as a tree, with the root transputer coinciding with the root of the tree. Each node is loaded by its parent, and is responsible for passing on the modules to be loaded on its children. The level_timeout quantity is the number of milliseconds allowed for a node to send a message (a program to be loaded) down to its child nodes and to receive an acknowledgment of successful execution. The range is 25 to 1000 milliseconds. If the root transputer or any other transputer in the network does not receive a "transputer loaded" acknowledgment from all its children within the allotted time, a time-out condition occurs and the loader stops.

Decode_timeout

The decode_timeout quantity is similar to level_timeout, except that it defines the maximum amount of time allowed for a single node to receive a message and execute it. An order to clear the memory, for example, requires a non-negligible amount of time. The default value is 1000 milliseconds (one second), and the allowed range is from 25 to 20000. Most programs running on average-size networks will load without problems using the default values. Larger programs or complex networks may require experimenting with different values.
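When experimenting with different values, it helps to check them against the allowed ranges before handing the file to the loader. The following sketch simply encodes the two ranges quoted above; the macro and function names are our own and do not come from ld-net.

```c
/* Ranges quoted in the text, in milliseconds. */
#define LEVEL_TIMEOUT_MIN   25
#define LEVEL_TIMEOUT_MAX   1000
#define DECODE_TIMEOUT_MIN  25
#define DECODE_TIMEOUT_MAX  20000

/* Return 1 if both timeouts fall inside their allowed ranges, else 0. */
int timeouts_in_range(long level_timeout, long decode_timeout)
{
    return level_timeout  >= LEVEL_TIMEOUT_MIN  &&
           level_timeout  <= LEVEL_TIMEOUT_MAX  &&
           decode_timeout >= DECODE_TIMEOUT_MIN &&
           decode_timeout <= DECODE_TIMEOUT_MAX;
}
```

The values used in first.nif, 1000 for both timeouts, pass this check; a level_timeout of 2000 would not, even though the same number is a legal decode_timeout.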

Node Description

The second part of the network information file simply describes the physical configuration of the transputer network, including neighbor-to-neighbor links, and the allocation of programs to transputers. The format for each line is the following:

Node#, Program, Parent, [Link0], [Link1], [Link2], [Link3];

which, in the context of our program first.c, results in:

1, first, R0, 0, , ,;

Node#

Because first.c is written for one transputer only, our network has only one node, and therefore one line is sufficient to describe it[5]. Its first field, 1, indicates that we are using the root transputer. All transputers are assigned a different Id, with the root transputer always assigned Id 1.

Program

The second field, first, indicates that the program to be loaded is first (the .tld extension is implicit). The parent field, R0, indicates that the parent of this transputer is Node 0 (the PC host) and that the parent will reset this transputer through the master reset signal. This makes better sense when the network is a tree with the root transputer at its root: every transputer node then has a parent, and the parent of the root node is the PC host, which is always given Id 0. The letter in front of the Id indicates how the parent resets the current node. The letter R is used when the reset is passed from parent to child through the normal reset channel. The letter S indicates that the reset signal is generated by the parent, rather than passed on from an ancestor.

Finally, the remainder of the line, 0, , ,; defines how Transputer 1 connects to the rest of the network through its links. Here, Link0 is connected to the host, with Id 0. The other links are not connected to any transputer involved in the parallel program, and are left blank.
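The seven fields of a node line map naturally onto a C structure. The sketch below is our own illustration of how such a line could be read, not the parser used by ld-net: the structure, the function names, the 31-character limit on fields, and the use of -1 to mark a blank link are all assumptions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NIF_LINKS    4
#define NIF_FIELDS  (3 + NIF_LINKS)
#define NIF_UNUSED  (-1)     /* our marker for a blank link field */

struct nif_node {
    int  id;                 /* Node#: 1 is the root transputer      */
    char program[32];        /* program name, .tld extension implied */
    char reset;              /* 'R' or 'S', from the Parent field    */
    int  parent;             /* Id of the parent; 0 is the PC host   */
    int  link[NIF_LINKS];    /* neighbor Id per link, or NIF_UNUSED  */
};

/* Copy len characters of src into dst, dropping surrounding blanks. */
static void copy_trimmed(char *dst, size_t cap, const char *src, size_t len)
{
    while (len > 0 && isspace((unsigned char)*src)) { src++; len--; }
    while (len > 0 && isspace((unsigned char)src[len - 1])) len--;
    if (len >= cap) len = cap - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse one node line; returns 1 on success, 0 on a malformed line. */
int nif_parse_node(const char *line, struct nif_node *node)
{
    char field[NIF_FIELDS][32];
    const char *end = strchr(line, ';');
    const char *p = line;
    int n = 0, i;

    if (end == NULL)
        return 0;                       /* missing terminating ';' */

    /* Split the text in front of ';' at each comma. */
    for (;;) {
        const char *comma = memchr(p, ',', (size_t)(end - p));
        const char *stop = comma ? comma : end;
        if (n == NIF_FIELDS)
            return 0;                   /* too many fields */
        copy_trimmed(field[n++], sizeof field[0], p, (size_t)(stop - p));
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    if (n != NIF_FIELDS || field[0][0] == '\0' ||
        (field[2][0] != 'R' && field[2][0] != 'S'))
        return 0;

    node->id = atoi(field[0]);
    strcpy(node->program, field[1]);
    node->reset = field[2][0];
    node->parent = atoi(field[2] + 1);
    for (i = 0; i < NIF_LINKS; i++)
        node->link[i] = field[3 + i][0] ? atoi(field[3 + i]) : NIF_UNUSED;
    return 1;
}
```

Applied to the line 1, first, R0, 0, , ,; the function fills in node 1, program first, reset letter R, parent 0, Link0 connected to node 0, and the three remaining links marked as unused.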

Running the program

We are finally ready to run the program. The command will simply tell the ld-net loader to read the file first.nif.

The result of loading and executing the program is shown in Listing 3-6. As we can see, a lot of information is displayed before our program does anything. The loader, after displaying copyright information, loads a bootstrap into Root Node 1. The purpose of the bootstrap is to receive the contents of first.tld and to start it[6]. Once this phase is completed, the loader starts cio on the host. At this point, the program first can carry out its input/output operations. The result is the display of the sum of the integers entered by the user.

	C:\ld-net first
	LD-NET (Network Loader), Version 93.1 [Link I/O Driver: 'lspcdma']
	Copyright (c) 1986-1993 by Logical Systems

	Loading first phase of bootstrap to root node 1
	Finished loading first phase, awaiting first acknowledge
	Loading second phase of bootstrap to root node 1
	Bootstrap loaded, awaiting acknowledge
	Successfully bootstrapped root node 1
	
	Bootstrapping the remainder of the network:
	Network successfully bootstrapped
	
	Downloading program: first.tld
	Program downloading completed
	
	CIO ('C' I/O Server), Version 93.1 [Link I/O Driver: 'lspcdma']
	Copyright (c) 1986-1993 by Logical Systems
	
	Enter two integers, a and b:
	a: 45
	b: 123
	45 + 123 = 168

Listing 3-6: Loading and execution of first.

Analysis



Hidden parallelism

It is instructive to stop here for a moment and analyze what is going on in the program. If we look at first.c in Listing 3-1 again, we see what looks like a perfectly sequential program, with interleaved printf and scanf statements. There is, however, some hidden parallelism in this picture: the printf and scanf statements hide more than they do in typical C programs. Here, a scanf statement executed on the root transputer results in a message sent over Link0 to the PC host, carrying a request to input some number of bytes. The cio driver running on the PC intercepts the request, decodes it, and performs an integer input from the keyboard. Once the number is collected, cio sends a message back to the transputer with the information requested.

The parallelism exists in space, in the form of two processors (the 80X86 host and the root transputer) executing different parts of the computation. But the parallelism does not exist in time, since the transputer is idle while it is awaiting information from the link (as we saw in Section 2-2). Hence the picture resembles a relay race, in which only the processor holding the baton is active while the other one waits idle. This dependence of the transputers on the host for accessing data will be one reason for us to replace cio later on, and to delegate more computing power to the host.

We are now ready to start exploring the parallel constructs introduced by Logical Systems in their parallel-C library. We will also increase the level of parallelism by programming transputers connected in a linear chain.
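The relay-race picture can be made concrete with a little arithmetic. The two functions below are a toy model of our own, not a measurement of cio: given the time each processor spends working, the first returns the elapsed time when the processors take turns, as they do for first.c, and the second returns what the elapsed time would be if their work could overlap completely.

```c
/* Elapsed time when the host and the transputer take turns: each one
   waits idle while the other works, so the two times simply add up. */
double relay_elapsed(double transputer_ms, double host_ms)
{
    return transputer_ms + host_ms;
}

/* Elapsed time if the two processors could work at the same moment:
   the slower of the two determines the total. */
double overlapped_elapsed(double transputer_ms, double host_ms)
{
    return transputer_ms > host_ms ? transputer_ms : host_ms;
}
```

With 2 milliseconds of work on the transputer and 8 on the host (made-up figures), the relay takes 10 milliseconds where a fully overlapped execution would take 8. The gap between the two numbers is the time one processor spends waiting for the baton.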

