Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



2-3 Memory access

The transputer can access a linear address space of 4 Gbytes (2^32 bytes), corresponding to its 32-bit address registers. Of these 4 Gbytes, 4 Kbytes are built into the T805 transputer circuit (2 Kbytes in the T400) and correspond to the lower part of the memory address space. Since the registers are 32 bits in length, the transputer accesses 4 bytes at a time when accessing memory. The memory maps of the T805 and T400 transputers are shown in Figure 2-3.



Figure 2-3: Memory maps of the T805 and T400 transputers.

Note that Inmos chose the mapping such that the lowest memory address is negative, 0x80000000, while the highest memory address available is the largest positive 32-bit integer, 0x7FFFFFFF. The range 0x80000000 to 0x80000FFC (0x800007FC for the T400) corresponds to the internal memory. The lowest address in the external memory of the T800, 0x80001000, is referred to as MemStart in the Inmos terminology.
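The signed view of the address space can be checked with a few lines of C. The sketch below is only an illustration meant to run on any host with a C99 compiler. The function names signed_address and span_bytes are ours and are not part of any Inmos library.

```c
#include <assert.h>
#include <stdint.h>

/* Return the signed value of a 32-bit address, as the transputer
   interprets it. Written so that it does not rely on
   implementation-defined unsigned-to-signed conversion. */
int32_t signed_address(uint32_t addr)
{
    if (addr & UINT32_C(0x80000000)) {
        /* Top bit set: two's complement negative value. */
        return -(int32_t)(~addr & UINT32_C(0x7FFFFFFF)) - 1;
    }
    return (int32_t)addr;   /* Top bit clear: positive value. */
}

/* Number of bytes covered by the word addresses first_word..last_word,
   both inclusive (each word is 4 bytes). */
uint32_t span_bytes(uint32_t first_word, uint32_t last_word)
{
    assert(last_word >= first_word);
    return last_word - first_word + 4u;
}
```

Calling signed_address on 0x80000000 yields the most negative 32-bit integer, and span_bytes(0x80000000, 0x80000FFC) yields 4096, the 4 Kbytes of internal memory quoted for the T805.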


Wrap-around memory

In most systems, not all of the memory address space will actually be populated with semiconductor memory. In a CSA Educational Kit, for example, only 1 Mbyte of external memory is implemented, and only the lower 20 bits of the address bus are used. This results in a "wrap-around" of the memory address space. This is an important detail to keep in mind, as it is fairly easy to set up programs with a stack space or heap space outside the boundaries of the physical memory, with unpredictable results when the wrap-around brings accesses to the stack into the middle of program code!

The wrap-around concept is easily illustrated by a simple example. Assume that we have a processor with an 8-bit address bus, and that we are using the lower 4 bits of this bus to access a 16-word memory (2^4 = 16). Assume furthermore that the upper 4 bits of the address bus are not used at all. Since the selection of a word in memory is insensitive to the upper four address bits, the word accessed when the processor generates the address 0x00 is the same as the one accessed by the addresses 0x10, 0x20, 0x30, ... or 0xF0. If the processor scans its whole memory map by generating all 256 successive addresses between 0x00 and 0xFF, the 16-word memory sees the series of addresses 0, 1, 2, 3, ..., E, F, 0, 1, 2, 3, ... repeating over and over, "wrapping around" to 0 after F.
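The same behavior can be modeled in C by masking off the address bits that the memory does not decode. This is a minimal sketch of the idea, not a description of any particular board. The names decoded_address and alias_count are hypothetical helpers introduced for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address actually seen by a memory that decodes only the lowest
   decoded_bits bits of the address bus. */
uint32_t decoded_address(uint32_t address, unsigned decoded_bits)
{
    uint32_t mask;

    assert(decoded_bits >= 1 && decoded_bits <= 32);
    mask = (decoded_bits == 32)
         ? UINT32_MAX
         : (UINT32_C(1) << decoded_bits) - 1u;
    return address & mask;   /* the upper bits are simply ignored */
}

/* Count how many of the 256 addresses of an 8-bit bus select the same
   word of a 16-word memory (4 decoded bits). */
unsigned alias_count(uint32_t word)
{
    unsigned count = 0;
    uint32_t a;

    for (a = 0x00; a <= 0xFF; a++) {
        if (decoded_address(a, 4) == word) {
            count++;
        }
    }
    return count;
}
```

In the toy system, alias_count(0) returns 16, one alias for each of 0x00, 0x10, ..., 0xF0. With 20 decoded bits, as on a 1-Mbyte board, decoded_address(0x80100000, 20) and decoded_address(0x80000000, 20) both yield 0: a stack placed just past the first megabyte lands on the bottom of memory.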

Fast internal memory

For the transputer, accessing its internal static memory is approximately three times faster than accessing its external memory. At a 30 MHz clock speed, for example, the T800 supports a memory bandwidth of 120 Mbytes/sec when accessing internal memory, but at most 40 Mbytes/sec with external memory. The internal memory should hence be used to hold code or data that are accessed often, such as a user stack.
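These two figures can be reproduced with simple arithmetic. The sketch below assumes that a 4-byte word is transferred in one processor cycle from internal memory and in three cycles, at best, from external memory. These cycle counts are our assumption, chosen because they are consistent with the numbers quoted above. The function name memory_bandwidth_mbytes is hypothetical.

```c
#include <assert.h>

/* Peak memory bandwidth in Mbytes/sec for a given clock rate (MHz),
   word size (bytes) and number of processor cycles per access. */
unsigned memory_bandwidth_mbytes(unsigned clock_mhz,
                                 unsigned bytes_per_access,
                                 unsigned cycles_per_access)
{
    assert(cycles_per_access > 0);
    return clock_mhz * bytes_per_access / cycles_per_access;
}
```

With a 30 MHz clock and 4-byte words, one cycle per access gives 120 Mbytes/sec and three cycles per access give 40 Mbytes/sec.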

Another aspect of the internal memory that directly affects the performance of a transputer system is the data transmission rate through the I/O ports. The performance data published in the Inmos literature assume that data transfers through the I/O ports always originate from, or are directed to, internal memory. In real applications, when the data cannot reside in on-chip memory and must be stored off-chip, the performance of the I/O ports will degrade by a factor directly proportional to the off-chip memory access time.

2-4 The Serial I/O Ports

The combination of the architecture of the serial I/O ports and the way the transputer manages them contributes to making the transputer a unique circuit, especially well tailored for multiprocessing. In the remainder of this book, we will use the terms links and I/O ports interchangeably.

The transputer supports four bi-directional serial links (two in the case of the T400), each link being a set of two wires. When two transputers are connected together, they exchange information through one of their links. The processes (or tasks) running on each of the transputers are then free to communicate by exchanging data or messages over the link. We will use the term link to refer to the physical connection between two transputers, and the term channel to describe the software connection between the two processes. The transfer of data over a serial link is synchronized and unbuffered.

During the communication, the processes that initiated the transfer are blocked. Each process is placed at the rear of the list of inactive tasks. Because the processor and the links operate independently, the processor is free to run another process when one is blocked by communication. If eight such processes require transfers in each direction of the four serial links, all links can be active simultaneously, yielding a total throughput equal to eight times the maximum throughput of one link [1]. Because the transfers are synchronized and the communicating processes remain blocked until the transfer completes, the channels require no message queues or message buffers.

Data transfer rates

Data are transferred one byte at a time, each byte generating an acknowledge from the receiving transputer. The typical transfer speed is 10 Mbits/sec, although speeds of 5 and 20 Mbits/sec are also supported. When two transputers exchange data in both directions, the bandwidth on the link can reach a maximum of 2.35 Mbytes/sec.

To get an idea of the speed involved, assume that a 100 Kbyte transputer program must be loaded into the transputer memory, and that a 64 by 64 32-bit integer matrix (16 Kbytes) generated by that program must be sent to the host upon completion of the program. At the typical speed of 10 Mbits/sec (1.25 Mbytes/sec, ignoring protocol overhead), loading the program will take about 0.08 seconds, while sending the matrix back to the host will take about 0.01 seconds.
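The estimate can be reproduced with a short calculation. The sketch below assumes that the two times follow from the raw 10 Mbits/sec link speed, with 1 Kbyte taken as 1024 bytes and with the acknowledge and framing overhead ignored, so a real link will be somewhat slower. The function name transfer_seconds is ours.

```c
#include <assert.h>

/* Time in seconds needed to move a message of the given size over a
   link running at the given raw speed, ignoring protocol overhead. */
double transfer_seconds(unsigned long bytes, double link_mbits_per_sec)
{
    assert(link_mbits_per_sec > 0.0);
    return (double)bytes * 8.0 / (link_mbits_per_sec * 1.0e6);
}
```

For the 100 Kbyte program, transfer_seconds(100UL * 1024UL, 10.0) gives roughly 0.082 seconds. For the 64 by 64 matrix of 4-byte integers, transfer_seconds(64UL * 64UL * 4UL, 10.0) gives roughly 0.013 seconds.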

Transparent ports

One of the most remarkable architectural aspects of the I/O ports is that they are memory mapped. This means that programming a port and passing it the address and length of a message is done by writing these numbers to memory locations that are mapped to the registers of the port, hence the term memory-mapped I/O.

As a result, the instructions that are used to program the I/O port are all memory load and store instructions. What makes these instructions access the link is that they are using addresses that map to the link registers. But what if the addresses did actually map to physical memory? The answer is that two tasks running on the same transputer, both using the same addresses, but one initiating an output and the other an input, would be able to communicate.

This is the ingenious way used by Inmos to break the physical barriers imposed on the geographical distribution of tasks. Tasks residing on the same transputer or on neighboring transputers will be able to communicate using the exact same software, independently of whether the medium of communication is a physical link, or one simulated in memory.

We will thus distinguish between hard channels (communication between remote processes) and soft channels [2] (communication between processes local to the same transputer). Only the microcode will know the difference between a hard channel and a soft channel. Assembly language instructions do not make the distinction, and for this reason, neither will high-level language code!
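The decision made by the microcode can be pictured as a simple test on the channel address. The sketch below is a host-side model only, not transputer code. It assumes the link channel words occupy the eight word addresses from 0x80000000 (output of link 0) to 0x8000001C (input of link 3), as we understand the Inmos memory map. These addresses, and the names LINK0_OUT, LINK3_IN and is_hard_channel, are not taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed addresses of the first and last link channel words. */
#define LINK0_OUT UINT32_C(0x80000000)
#define LINK3_IN  UINT32_C(0x8000001C)

/* Return true when a channel word lies in the range reserved for the
   physical links (a hard channel), and false when it is an ordinary
   memory word (a soft channel). */
bool is_hard_channel(uint32_t channel_addr)
{
    assert((channel_addr & 3u) == 0);   /* channel words are word-aligned */
    return channel_addr >= LINK0_OUT && channel_addr <= LINK3_IN;
}
```

In this model, a program performs the same input and output operations on a channel at 0x80000010 and on a channel at 0x80001000. Only the test above, hidden from the programmer, separates the link transfer from the memory-to-memory copy.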

This means that a parallel program containing several concurrent processes can be adapted with very few changes to run on a multi-transputer system, where the processes are distributed among the transputers and communicate over hard channels. This feature allows tasks to communicate with each other through channels in such a way that the physical location of the tasks relative to each other is transparent to the code.
