FLAGS [A|a|C|c|S|s]
TEMP <file path>
LIST <file name>
INPUT <main_file.trl> [,<other_file.trl>]
ENTRY <entry_point>
LIB <lib_file> [,<lib_file>]
In our case this will be t8lib, which is one of the standard libraries distributed with LS C. In cases where the transputer kit is not populated with off-chip memory, the ltio library can be used, and is added after t8lib on the LIB command line.
OUTPUT <output_file_name>
LOAD <hex, octal or decimal number>
STACK <hex, octal or decimal number>
There are two ways to provide the above information to the linker. The first, rather cumbersome, way is to call the linker and supply the information interactively, as shown in Listing 3-4.
The second method involves writing a command file containing the same information as above, and having the linker read this file. This file typically has a .lnk extension. The contents of first.lnk are shown below. With a command file, linking becomes an effortless task that can easily be tailored to batch processing or to the use of the make utility[1].
FLAG
LIST first.map
INPUT first.trl
ENTRY _main
LIB t8lib
LOAD 0x80000400
STACK 0x80100000
Listing 3-3: Contents of the command file for the linking of first.c
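The LOAD and STACK commands accept hex, octal, or decimal numbers. The sketch below shows one way to read such numbers in C and to compute the distance between the two addresses used for first.c. It assumes that the linker follows the usual C prefixes (0x for hex, a leading 0 for octal), which the text does not state; parse_address and span_bytes are names made up for this illustration and are not part of LS C.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a number written in hex (0x...), octal (leading 0), or decimal.
   Returns 1 and stores the value in *out on success, 0 on malformed input.
   Assumption: the linker uses the same prefixes as C does. */
int parse_address(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtoul(text, &end, 0);   /* base 0: the prefix selects the radix */
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;
    *out = value;
    return 1;
}

/* Number of bytes between a load address and a stack address.
   Returns 0 if either string cannot be parsed. */
unsigned long span_bytes(const char *load_text, const char *stack_text)
{
    unsigned long load = 0, stack = 0;

    if (!parse_address(load_text, &load) || !parse_address(stack_text, &stack))
        return 0;
    return stack - load;
}
```

With the values of first.lnk, span_bytes("0x80000400", "0x80100000") evaluates to 0xFFC00, that is, 1,047,552 bytes between the load address and the stack address.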
C:\tlnk
TLNK Linker, Version 93.1, Copyright (c) 1986-1993 by Logical Systems
Flags (A|a|C|c|S|s)[none]:
Temp file path [.] :
Listing file [NUL] : first.map
Input file(s) [.trl] : first.trl
Entry point [_main] :
Library file(s) [.tll] : t8lib
Output file [first.tld] :
Load address [0x80001000]: 0x80000400
Stack address[0x80001000]: 0x80100000
External symbols: 119 External symbol table usage: 5%
Total symbols: 441 Total symbol table usage: 8%
No errors detected
C:\
Listing 3-4: Example of interactive use of the linker (user input is shown in italics).
The program is almost ready to run! If it were a Turbo C program, for example, then we could simply invoke the program by typing its name at the DOS prompt, and DOS would load the program code from disk, store it in memory, and start it. But in our case, DOS does not recognize the transputer as a processor. Remember from Section 2-5 that the transputer is an attached processor, not a co-processor, and is seen by DOS as nothing more than an I/O device. Hence we need a mechanism for loading the transputer code (stored in the .tld file) in the root transputer. Since DOS won't do it for us, we need a program to do this. This program is called ld-net, for load-network [2].
For most simple applications, the compiling, assembling and linking can be done with the tcc utility supported by LS C. The advantage of tcc is that it automatically determines the appropriate switches to use with the different filters (pp, tcx, tasm, or tlnk). Its syntax is as follows:
The extension of the file name defines which filters tcc will call. If you input a file with a .c extension, tcc will call pp, tcx, tasm, and tlnk successively to process the file. If, on the other hand, you feed a file with a .tal extension, then tcc is clever enough to recognize an assembly file and to call the linker directly. The tcc utility will be sufficient for most straightforward applications and systems. In more specialized cases, however, a make file (defined later) is the better tool for automatically generating ready-to-load modules.
Loading the transputer program on the transputer network, and launching it requires ld-net to perform several tasks:
The loader gets the information about how it should perform these three steps from a file, the network information file, which typically carries a .nif extension. This file can be created with an editor, or enterprising readers can write a simple information file generator[3]. The contents of the network information file consist of two parts. The first section contains four lines controlling the loading process. The second section contains information describing the graph representing the transputer network. This information is used by ld-net to determine which program or programs must be loaded onto the transputers[4]. The network information file for our program is shown in the listing below.
buffer_size 200;
host_server cio.exe;
level_timeout 1000;
decode_timeout 1000;
1, first, R0, 0, , ,;
Listing 3-5: Contents of the nif file first.nif for program first.c.
The first four lines of the file form the command part of the network information file.
Buffer_size defines the number of bytes of transputer internal memory used when downloading programs. When a program running on several transputers is loaded, it is first passed to the root transputer, which passes it on to neighboring transputers, which, in turn, pass the program down to their neighbors, and so on. A buffer is used to hold the data transferred. The default value is 255 bytes, and results in the fastest loading time possible. Smaller buffer sizes are possible, but can substantially slow down the loading of programs in large transputer networks.
Host_server identifies the program that is to run on the PC host. We will first use the cio.exe program provided by CSA. As we become more experienced with parallel programming, some readers may want to write their own host driver, in which case the network information file host_server command will be changed to reflect the switch to a new driver.
To understand how level_timeout and decode_timeout (see below) operate, one has to understand the way a transputer network is loaded. The nodes are organized as a tree, with the root transputer coinciding with the root of the tree. Each node is loaded by its parent, and is responsible for passing on modules to be loaded on its child(ren). The level_timeout quantity represents the number of milliseconds allowed for a node to send a message (program to be loaded) down to its child nodes, and to receive an acknowledgment of successful execution. The range is 25 to 1000 milliseconds. If the root transputer or any other transputer in the network does not receive a "transputer loaded" acknowledgment from all its children during the allotted time, a time-out condition occurs and the loader stops.
The decode_timeout quantity is similar to the level_timeout, except that it defines the maximum amount of time allowed for a single node to get a message and execute it. An order to clear the memory, for example, is one that requires a non-negligible amount of time. The default value is 1000 (or one second), but the allowed range is from 25 to 20000. Most programs running on average-size networks will load without problems using the default values. Larger programs or complex networks may require experimenting with different values.
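The ranges quoted above lend themselves to a quick sanity check before a network information file is handed to the loader. The sketch below is one possible check, not something ld-net provides. The struct nif_header type and the function nif_header_problems are hypothetical, and treating 1 and 255 as the bounds for buffer_size is an assumption, since the text only gives 255 as the default and says that smaller values are possible.

```c
#include <assert.h>
#include <stdio.h>

/* Limits quoted in the text. The buffer_size bounds are an assumption. */
enum {
    BUFFER_SIZE_MIN = 1,      BUFFER_SIZE_MAX = 255,
    LEVEL_TIMEOUT_MIN = 25,   LEVEL_TIMEOUT_MAX = 1000,
    DECODE_TIMEOUT_MIN = 25,  DECODE_TIMEOUT_MAX = 20000
};

/* Hypothetical in-memory form of the numeric command lines of a .nif file. */
struct nif_header {
    int buffer_size;      /* bytes */
    int level_timeout;    /* milliseconds */
    int decode_timeout;   /* milliseconds */
};

static int in_range(int value, int low, int high)
{
    return value >= low && value <= high;
}

/* Returns the number of settings outside the ranges above and
   writes one line per problem to the report stream. */
int nif_header_problems(const struct nif_header *h, FILE *report)
{
    int problems = 0;

    assert(h != NULL && report != NULL);
    if (!in_range(h->buffer_size, BUFFER_SIZE_MIN, BUFFER_SIZE_MAX)) {
        fprintf(report, "buffer_size %d out of range\n", h->buffer_size);
        problems++;
    }
    if (!in_range(h->level_timeout, LEVEL_TIMEOUT_MIN, LEVEL_TIMEOUT_MAX)) {
        fprintf(report, "level_timeout %d out of range\n", h->level_timeout);
        problems++;
    }
    if (!in_range(h->decode_timeout, DECODE_TIMEOUT_MIN, DECODE_TIMEOUT_MAX)) {
        fprintf(report, "decode_timeout %d out of range\n", h->decode_timeout);
        problems++;
    }
    return problems;
}
```

The values used for first.c (200, 1000, 1000) pass this check, whereas a level_timeout of 5000 would be reported because it exceeds the 1000 millisecond limit.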
The second part of the network information file simply describes the physical configuration of the transputer network, including neighbor-to-neighbor links, and the allocation of programs to transputers. The format for each line is the following:
Node#, Program, Parent, [Link0], [Link1], [Link2], [Link3];
which, in the context of our program first.c, results in:
1, first, R0, 0, , ,;
Because first.c is written for one transputer only, our network has only one node, and therefore one line is sufficient to describe it[5]. Its first field, 1, indicates that we are using the root transputer. All transputers are assigned a different Id, with the root transputer always assigned Id 1.
The second field, first, indicates that the program to be loaded is first (the .tld extension is implicit). The parent field, R0, indicates that the parent of this transputer is Node 0 (the PC host) and that the parent will reset this transputer through the master reset signal. This makes more sense when the network is viewed as a tree with the root transputer at its root: every transputer node then has a parent, and the parent of the root node is the PC host, which is always given Id 0. The letter in front of the parent Id indicates how the parent resets the current node. The letter R is used when the reset is passed from parent to child through the normal reset channel. The letter S indicates that the reset signal is generated by the parent, rather than passed from an ancestor.
Finally, the remainder of the line, 0, , ,; defines how Transputer 1 connects to the rest of the network through its links. Here, Link0 is connected to the host, with Id 0. The other links are not connected to any transputer involved in the parallel program, and are left blank.
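For readers who would rather generate network information files than type them, the node line format described above can be produced by a few lines of C. This is only a sketch under stated assumptions: format_node_line, write_first_nif, and the NO_LINK marker are hypothetical names, and the spacing simply mimics the example line, since the text does not say how strict ld-net is about white space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NO_LINK (-1)   /* hypothetical marker for a link left blank */

/* Builds "Node#, Program, Parent, [Link0], [Link1], [Link2], [Link3];".
   reset is 'R' or 'S'; links[i] is a node Id or NO_LINK.
   Returns 1 if the whole line fits in buf, 0 otherwise. */
int format_node_line(char *buf, size_t size, int node, const char *program,
                     char reset, int parent, const int links[4])
{
    char field[4][16];
    int i, n;

    assert(buf != NULL && program != NULL && links != NULL);
    assert(strlen(program) > 0);
    for (i = 0; i < 4; i++) {
        if (links[i] == NO_LINK)
            field[i][0] = '\0';                      /* blank field */
        else
            snprintf(field[i], sizeof field[i], "%d", links[i]);
    }
    n = snprintf(buf, size, "%d, %s, %c%d, %s, %s, %s,%s%s;",
                 node, program, reset, parent,
                 field[0], field[1], field[2],
                 field[3][0] != '\0' ? " " : "", field[3]);
    return n >= 0 && (size_t)n < size;
}

/* Writes the five lines of first.nif to the given stream. */
int write_first_nif(FILE *out)
{
    static const int links[4] = { 0, NO_LINK, NO_LINK, NO_LINK };
    char line[64];

    assert(out != NULL);
    if (!format_node_line(line, sizeof line, 1, "first", 'R', 0, links))
        return 0;
    fprintf(out, "buffer_size 200;\nhost_server cio.exe;\n"
                 "level_timeout 1000;\ndecode_timeout 1000;\n%s\n", line);
    return 1;
}
```

Calling write_first_nif(stdout) prints the four command lines followed by the node line 1, first, R0, 0, , ,; which matches the single-node description used for first.c.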
We are finally ready to run the program. The command will simply tell the ld-net loader to read the file first.nif.
The result of loading and executing the program is shown in Listing 3-6. As we can see, a lot of information is displayed before our program really does anything. The loader, after displaying copyright information, loads a bootstrap into Root Node 1. The purpose of the bootstrap is to receive the contents of first.tld, and to start it[6]. Once this phase is complete, the loader starts cio on the host. At this point, the program first can carry out its input/output operations. The result is the display of the sum of the integers entered by the user.
C:\ld-net first
LD-NET (Network Loader), Version 93.1 [Link I/O Driver: 'lspcdma']
Copyright (c) 1986-1993 by Logical Systems
Loading first phase of bootstrap to root node 1
Finished loading first phase, awaiting first acknowledge
Loading second phase of bootstrap to root node 1
Bootstrap loaded, awaiting acknowledge
Successfully bootstrapped root node 1
Bootstrapping the remainder of the network:
Network successfully bootstrapped
Downloading program: first.tld
Program downloading completed
CIO ('C' I/O Server), Version 93.1 [Link I/O Driver: 'lspcdma']
Copyright (c) 1986-1993 by Logical Systems
Enter two integers, a and b:
a: 45
b: 123
45 + 123 = 168
It is instructive to stop here for a brief instant and to analyze what is going
on with the program. If we look at first.c in Listing 3-1 again, we see what
looks like a perfectly sequential program, with interleaved printf and scanf
statements. There is however some hidden parallelism in this picture. The
printf and scanf statements hide more than what we are used to in typical C
programs. Here, the scanf statement executed on the root transputer results in
a message sent over Link0 to the PC host, and a request to input some number
of bytes. The cio driver running on the PC intercepts the request, decodes it,
and performs an integer input from the keyboard. Once the number is collected,
cio sends a message back to the transputer with the information requested.
The parallelism exists in space, in the form of two processors (the 80X86 host
and the root transputer) executing different parts of the computation. But
the parallelism does not exist in time, since the transputer is idle while it
is awaiting information from the link (as we saw in Section 2.2). Hence the
picture resembles a relay race, in which only the processor holding the baton
is active while the other one waits idle.
This dependence between transputers and the host for accessing data will be one
reason for us to replace cio later on, and to delegate more computing power to
the host. We are now ready to start exploring the parallel constructs
introduced by Logical Systems in their parallel-C library. We will also
increase the level of parallelism by programming transputers connected in a
linear chain.