Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



2
The Transputer

In this chapter we study the architecture of the transputer. Although this book is about software, and essentially about writing parallel programs in C for the transputer, understanding how the hardware manages parallel tasks and how information is exchanged between the transputers will provide tremendous help when we start writing parallel programs in Chapter 3.

We will first explore the methods used by the transputers to exchange information with each other, how transputers access the memory, and how they support multitasking.

Although the transputers are supported by occam, a language that directly matches their architecture, we will use C for all our programming. Logical Systems' extension of C is powerful enough that we will be able to tackle all our programming projects without resorting to another language, and in very little time (if you are already familiar with C, of course). An important benefit is that knowing C will allow us to concentrate on the concepts of parallel programming without the nuisance of having to learn a new language at the same time.

In Section 2-1 we give a general introduction to the transputer family and describe its general architecture.

In Section 2-2 we briefly look at the processor part of the transputer and how it is designed to support multitasking. Understanding how the transputer allocates its time among several tasks of varying priorities will become important when we start writing parallel programs.

Section 2-3 deals with the memory and how the transputer manages it. Typically, a transputer contains a few Kilobytes of on-chip random access memory (RAM), and can access several Gigabytes of external memory. Programs and data can occupy both, but on-chip memory is faster, making it the prime target for improving the efficiency of our programs.

The last section covers the Input/Output (I/O) ports. Each transputer has four I/O ports that connect directly with I/O ports of other transputers. The characteristics of the ports, the protocol followed for exchanging information, and the speed at which information is transferred are the main ingredients controlling the performance of data communication in a network of transputers. They will directly influence how we decompose programs, and will be key players in the performance we harness from our programs.

2-1 A processor with memory and I/O ports

In 1985 Inmos introduced a new concept in VLSI (Very Large Scale Integration): that of a single circuit containing a processor, some local memory, and four Input/Output ports.

The circuit was a computer in its own right, containing a processor, some memory to store programs and data, and several ports for exchanging information with other transputers or with the outside world. Inmos designed these circuits so that they could be connected together as simply as transistors are connected inside a computer, and the transputer was born.

Combining these entities on a single silicon chip was not a technological breakthrough in itself. Other companies, such as Intel and Motorola, had introduced versions of their popular 8-bit processors with local memory and interfacing hardware. The novelty was the combination of several factors. One of the most important was the introduction of a high-level language, occam [MAY83], whose features were directly supported by the transputer hardware and which made the transputer a building block for parallel computers. The second prominent factor was the ease with which transputers could be connected to each other with as little as a few electrical wires.

The four bi-directional input/output (I/O) ports of the transputer are designed to interface directly with the ports of other transputers, very much like the pegs on top of Lego blocks fit directly in their bottom cavities. This feature allows several transputers to fit in a small footprint with very little extra logic circuitry, making it easy to fit four transputers with some memory on a PC daughter board (ISA bus) or on a Micro Channel board.

This hardware versatility, augmented by a processor designed to support the parallel constructs of the occam language, makes the transputer a powerful building block for multiprocessor systems.


Figure 2-1: Transputer block diagram and examples of interconnection networks.

Figure 2-1 shows the basic block diagram of a transputer, along with some interconnection schemes: a linear chain (a), offering the simplest of connections, a ring (b), and a mesh (c).

We will use the linear chain often, as it is the simplest interconnection network possible, supported by all transputer-based systems, including CSA's Educational kit. In later chapters we also investigate other networks, such as rectangular meshes, toroidal meshes, or rings.
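
To make the three interconnection schemes concrete, the following is a minimal sketch in plain C that computes which nodes a given transputer is wired to in a linear chain, a ring, and a rectangular mesh. It is only an illustration: the node numbering (0 to n-1, row by row in the mesh) and the function names chain_neighbors, ring_neighbors, and mesh_neighbors are assumptions made for this example, not part of Logical Systems C or of any transputer library.

```c
#include <assert.h>

#define MAX_LINKS 4   /* a transputer has four I/O ports */

/* Linear chain of n nodes: end nodes have one neighbor, the others two.
   Stores the neighbor numbers in out[] and returns how many there are. */
int chain_neighbors(int i, int n, int out[MAX_LINKS])
{
    int count = 0;
    assert(n >= 1 && i >= 0 && i < n);
    if (i > 0)     out[count++] = i - 1;   /* left neighbor  */
    if (i < n - 1) out[count++] = i + 1;   /* right neighbor */
    return count;
}

/* Ring of n nodes: the chain closed on itself, so every node has two
   neighbors. We assume n >= 3 so that the two neighbors are distinct. */
int ring_neighbors(int i, int n, int out[MAX_LINKS])
{
    assert(n >= 3 && i >= 0 && i < n);
    out[0] = (i + n - 1) % n;              /* previous node on the ring */
    out[1] = (i + 1) % n;                  /* next node on the ring     */
    return 2;
}

/* Rectangular mesh of rows x cols nodes, numbered row by row.
   Interior nodes use all four ports; edge and corner nodes use fewer. */
int mesh_neighbors(int i, int rows, int cols, int out[MAX_LINKS])
{
    int r, c, count = 0;
    assert(rows >= 1 && cols >= 1 && i >= 0 && i < rows * cols);
    r = i / cols;
    c = i % cols;
    if (r > 0)        out[count++] = i - cols;  /* node above    */
    if (r < rows - 1) out[count++] = i + cols;  /* node below    */
    if (c > 0)        out[count++] = i - 1;     /* node to left  */
    if (c < cols - 1) out[count++] = i + 1;     /* node to right */
    return count;
}
```

For instance, calling mesh_neighbors(4, 3, 3, nb) on the center node of a 3 x 3 mesh reports four neighbors (nodes 1, 7, 3, and 5), which uses all four ports, whereas no node of a chain or a ring ever needs more than two.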

The term transputer really covers a family of circuits [INMO88a]: Some with a 16-bit word length (T2xx series), others with a 32-bit word length (T805, T800, T425, T414, and the T400). The T4xx and T8xx series differ from each other in several factors:

The amount of on-chip memory. The T414 and T400 have only 2 KBytes of internal memory, while the T8xx series boasts 4 KBytes.

Whether they contain a floating-point processor. The T800 and T805 do, while the others do not and have to rely on software to implement floating-point operations, at a slower speed.

The number of I/O ports supported. The T4xx series has two, while the T8xx series has four, allowing more versatile networks [1].
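
The differences listed above can be summarized as a small lookup table. The sketch below records only two representative circuits, the T400 and the T800, with the values given in the text, and checks whether a circuit has enough ports for a network in which each node needs a given number of connections. The structure and the names variant, find_variant, and supports_degree are hypothetical, introduced here for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* One entry per circuit; values are those quoted in the text above. */
struct variant {
    const char *name;
    int word_bits;       /* word length in bits             */
    int onchip_kbytes;   /* on-chip RAM in KBytes           */
    int has_fpu;         /* 1 if floating point is on chip  */
    int links;           /* number of I/O ports             */
};

static const struct variant variants[] = {
    { "T400", 32, 2, 0, 2 },
    { "T800", 32, 4, 1, 4 },
};

/* Look up a circuit by name; returns NULL if it is not in the table. */
const struct variant *find_variant(const char *name)
{
    size_t k;
    for (k = 0; k < sizeof variants / sizeof variants[0]; k++) {
        if (strcmp(variants[k].name, name) == 0)
            return &variants[k];
    }
    return NULL;
}

/* A node that must connect to `degree` other nodes needs at least
   that many I/O ports. Returns 1 if the circuit qualifies, else 0. */
int supports_degree(const struct variant *v, int degree)
{
    return v != NULL && v->links >= degree;
}
```

With this table, supports_degree(find_variant("T400"), 4) returns 0 and supports_degree(find_variant("T800"), 4) returns 1: a chain or a ring needs only two ports per node, while the interior nodes of a mesh need four.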

An interesting property of the T4xx and T8xx circuits is that they are not only link-compatible (a T800 and a T400 can be connected together without any special adapter), but also pin-to-pin compatible, allowing one to upgrade from a T400-based system to a more powerful T800-based system with a simple circuit swap.
