IMMD-IV, Thomas Thiel, 26/03/95

Architecture of the interconnection network

This document describes the interconnection network and its subcomponents. Some of the information given here was derived from [DHD+94] (full document available in PostScript).

Interconnection network

In the MEMSY topology, each node has access to its own communication memory and to the communication memories of its four neighbouring nodes. Additionally, every node of the B-level has access to the communication memories of its four assigned A-level nodes.

A static implementation of this topology requires up to 9 ports at each node and up to 6 ports at each communication memory. To reduce this complexity and the number of interconnections, a dynamic network component, called the coupling unit, has been developed. Coupling units allow the topology to be implemented virtually and reduce the number of ports needed at each node's memory interface and at each communication memory to three, only two of which are used for connections within one level.
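The port counts of a static implementation follow directly from the access rules above. The sketch below is a hypothetical model, not MEMSY software: it builds each node's set of accessible communication memories on a torus and counts the point-to-point ports that would be needed. It assumes that the B-level is an (n/2) x (n/2) torus over an n x n A-level and that B-node (i, j) is assigned the 2x2 block of A-nodes starting at (2i, 2j); the text does not specify this mapping, and all function names are invented for illustration.

```python
from collections import defaultdict


def torus_neighbours(i, j, n):
    """Position (i, j) itself plus its four nearest neighbours on an n x n torus."""
    return {(i, j), ((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)}


def access_sets(n_a):
    """Map every node to the set of communication memories it can access.

    Hypothetical layout: n_a x n_a A-level, (n_a // 2) x (n_a // 2) B-level,
    B-node (i, j) assigned to A-nodes (2i + di, 2j + dj) with di, dj in {0, 1}.
    """
    n_b = n_a // 2
    access = {}
    for i in range(n_a):
        for j in range(n_a):
            access[("A", i, j)] = {("A", p, q) for p, q in torus_neighbours(i, j, n_a)}
    for i in range(n_b):
        for j in range(n_b):
            same_level = {("B", p, q) for p, q in torus_neighbours(i, j, n_b)}
            assigned_a = {("A", 2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1)}
            access[("B", i, j)] = same_level | assigned_a
    return access


def static_port_counts(access):
    """Return (max ports per node, max ports per memory) for point-to-point links."""
    node_ports = max(len(memories) for memories in access.values())
    users = defaultdict(int)  # memory -> number of nodes that access it
    for memories in access.values():
        for memory in memories:
            users[memory] += 1
    return node_ports, max(users.values())
```

With an 8 x 8 A-level (so that the 4 x 4 B-level torus has five distinct neighbours per node), `static_port_counts(access_sets(8))` gives `(9, 6)`: a B-node needs 5 ports within its level plus 4 to the A-level, and an A-level memory is used by 5 A-nodes plus one B-node.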

Each coupling unit is a blocking, multistage, dynamic network of fixed size that provides logically complete interconnection between its 4 input ports and 4 output ports. The interconnection structure of MEMSY is thus a hybrid network with global static and local dynamic network properties.

The torus topology of a single MEMSY level is implemented by the arrangement of nodes, communication memories, and coupling units as shown below.

For clarity, the local dynamic network component is not shown in this figure; it is described in more detail in the next section.

Each node and each memory module is connected to two coupling units, so the nearest-neighbour torus topology can easily be established. A square torus network with N = n^2 nodes requires N/2 coupling units. The connections from the nodes of the B-level to the four corresponding communication memories of the A-level are also implemented with coupling units, which are connected to the third port of each memory interface and communication memory.
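The figure of N/2 coupling units per level follows from counting ports: each of the N nodes and each of the N memory modules occupies two coupling-unit ports, and a coupling unit offers 4 p-ports (towards nodes) and 4 m-ports (towards memories). The helper below is a hypothetical illustration of that count. It assumes that every port of every coupling unit is used, in which case N/2 is a whole number only when n is even.

```python
P_PORTS_PER_CU = 4       # input (p-) ports per coupling unit
M_PORTS_PER_CU = 4       # output (m-) ports per coupling unit
CU_LINKS_PER_NODE = 2    # each node is connected to two coupling units
CU_LINKS_PER_MEMORY = 2  # and so is each memory module


def coupling_units_per_level(n):
    """Coupling units needed for one n x n torus level (N = n * n nodes and memories).

    Assumes every p-port and m-port of every coupling unit is in use.
    """
    N = n * n
    p_links = N * CU_LINKS_PER_NODE    # node-side port connections
    m_links = N * CU_LINKS_PER_MEMORY  # memory-side port connections
    if p_links % P_PORTS_PER_CU or m_links % M_PORTS_PER_CU:
        raise ValueError("ports cannot be filled exactly; n must be even")
    units = p_links // P_PORTS_PER_CU
    assert units == m_links // M_PORTS_PER_CU  # both sides must agree
    return units  # equals N // 2
```

For example, a 4 x 4 level needs 8 coupling units and an 8 x 8 level needs 32.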

In our implementation of the interconnection network, accesses to the communication memories via coupling units are executed with a simple memory access protocol. The interconnection network operates in circuit-switching mode: for each memory access, a direct path is built up between a node and a communication memory.

Coupling Unit

The coupling unit, the hardware component used to implement the multiprocessor system described above, is shown below. It consists of the following subcomponents:

The p-ports and m-ports are structurally almost identical to a memory interface with a multiplexed 32-bit address/data bus; they differ in the direction of the control flow. An activity (a memory access) can only be initiated at a p-port.

The control unit is a central component within the coupling unit. It always holds complete information about the current settings of all switching elements. When a new request is recognized by the arrival of a valid address, the control unit can decide immediately whether the requested access can be performed or has to be delayed. For any access pattern, the addressed memory port and all necessary internal subpaths are available if every switching element on the communication path to be built up is either inactive or already has exactly the switch setting required to establish the connection.

The required switch settings of all switching elements involved are fixed a priori for every possible access pattern. Whether a requested access can be performed is therefore decided simply by comparing the required switch settings with the current ones.
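The decision rule can be sketched as a table lookup followed by a comparison. The example below is a toy model under stated assumptions: it uses a single hypothetical 2x2 switching element named `SE` instead of the real 4x4 multistage structure, and the table, class, and method names are invented. An element may be shared by two paths when they need the same setting (for example, "straight" routes p0 to m0 and p1 to m1 at the same time), so a usage count decides when the element becomes inactive again.

```python
from collections import Counter

# Toy a-priori table for ONE hypothetical 2x2 switching element "SE".
# "straight" routes p0->m0 and p1->m1 simultaneously; "cross" routes p0->m1 and p1->m0.
REQUIRED_SETTINGS = {
    (0, 0): {"SE": "straight"},
    (1, 1): {"SE": "straight"},
    (0, 1): {"SE": "cross"},
    (1, 0): {"SE": "cross"},
}


class ControlUnit:
    """Tracks current switch settings and decides whether a request can proceed."""

    def __init__(self):
        self.setting = {}       # element -> current setting; absent means inactive
        self.users = Counter()  # element -> number of established paths using it

    def can_perform(self, p_port, m_port):
        # Available iff every required element is inactive or already set as required.
        required = REQUIRED_SETTINGS[(p_port, m_port)]
        return all(self.setting.get(se, want) == want for se, want in required.items())

    def establish(self, p_port, m_port):
        """Build up the path if possible; False means the access must be delayed."""
        if not self.can_perform(p_port, m_port):
            return False
        for se, want in REQUIRED_SETTINGS[(p_port, m_port)].items():
            self.setting[se] = want
            self.users[se] += 1
        return True

    def release(self, p_port, m_port):
        """Tear the path down; an element becomes inactive when no path uses it."""
        for se in REQUIRED_SETTINGS[(p_port, m_port)]:
            self.users[se] -= 1
            if self.users[se] == 0:
                del self.setting[se]
```

After `establish(0, 0)`, a request for (1, 1) succeeds because it needs the same "straight" setting, while (1, 0) is delayed until both straight paths have been released. Because the table is fixed in advance, the check costs one lookup and one comparison per element, which matches the "decide at once" behaviour described above.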

The coupling unit used as a building block of the MEMSY interconnection network provides hardware-level mechanisms that support the efficient use of alternative communication paths in case of faults. This fault tolerance is based on the ring structure of the internal subpaths: alternative communication paths consist of disjoint sets of internal subpaths, so the permanent failure of one internal subpath within a network unit can be tolerated. The result is reduced bandwidth, but full interconnection is still guaranteed (graceful degradation).
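One way to see why a ring yields disjoint alternative paths: on a ring, the clockwise and counter-clockwise routes between two attachment points share no segment, so a single failed segment can block at most one of them. The sketch below assumes a hypothetical ring of 4 attachment points in which subpath i links point i to point i + 1; the actual internal layout of the coupling unit is not given here, and the function names are invented.

```python
RING_SIZE = 4  # hypothetical: 4 attachment points, subpath i links point i and point i + 1


def clockwise(a, b):
    """Subpaths traversed from point a to point b in increasing index order."""
    return {(a + k) % RING_SIZE for k in range((b - a) % RING_SIZE)}


def counter_clockwise(a, b):
    """Subpaths traversed from a to b in decreasing index order (disjoint from clockwise)."""
    return {(b + k) % RING_SIZE for k in range((a - b) % RING_SIZE)}


def route(a, b, failed=frozenset()):
    """Pick a path that avoids failed subpaths, preferring the shorter direction."""
    for path in sorted((clockwise(a, b), counter_clockwise(a, b)), key=len):
        if not path & failed:
            return path
    return None  # both directions broken: a and b are cut off
```

With no faults, `route(0, 1)` uses only subpath 0; if subpath 0 fails, the detour `{1, 2, 3}` still connects the two points. The detour occupies more subpaths, which is where the reduced bandwidth comes from; only a second failure on the other side of the ring can disconnect a pair.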

Performance

An access to shared data in the communication memories takes significantly longer than an access within the node. Caches can generally no longer be used to reduce the access time, and the longer transfer paths and the execution of control mechanisms cause further delays. Conflicts in the communication memories or in the coupling units may force accesses to be serialized, which adds waiting time. In our implementation, a memory access normally takes 1 µs, and up to 1.3 µs if blocking occurs due to quasi-simultaneous accesses.

Since the communication memories hold only data shared between nodes, such as the boundary values of subarrays, the increased access time has only a small influence on the overall computing time. Measurements with INES, a test system developed specifically to measure the performance of the coupling hardware, show that high efficiency can be achieved under realistic conditions. Reducing the complexity of the network by using coupling units therefore costs only a small amount of performance compared with a static point-to-point network.
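The argument can be made concrete with a simple weighted average. The sketch below is an illustrative model, not a measurement: the fraction of accesses going to communication memory (`f_shared`) and the local access time (`t_local_us`) are hypothetical parameters not given in the text. Only the 1 µs normal and 1.3 µs blocked access times come from the description above. The model ignores computation time, so it overstates the effect on total run time.

```python
def mean_access_time(f_shared, t_local_us, t_comm_us=1.0):
    """Mean time per memory access (µs) when a fraction f_shared of all accesses
    go to communication memories and the rest stay within the node."""
    if not 0.0 <= f_shared <= 1.0:
        raise ValueError("f_shared must lie in [0, 1]")
    return (1.0 - f_shared) * t_local_us + f_shared * t_comm_us


def slowdown(f_shared, t_local_us, t_comm_us=1.0):
    """Ratio of the mean access time to the purely local case."""
    return mean_access_time(f_shared, t_local_us, t_comm_us) / t_local_us
```

Evaluating `slowdown` once with `t_comm_us=1.0` and once with `t_comm_us=1.3` brackets the normal and fully blocked cases; when `f_shared` is small, as it is when only boundary values are shared, both results stay close to 1.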

Communication memory interface integrated in each node

This unit provides the necessary connections to the interconnection network and is connected directly to the M-bus. Towards the node, the interface behaves like a memory board. Memory accesses to the interface are mapped to one of the three available ports depending on the address, and timers detect timeouts on these mapped accesses.
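Address-based port selection with a timeout can be sketched as follows. The window base address, window size, and timeout limit are hypothetical values chosen for illustration; the real address map and timer settings of the interface are not given in this document. `ack_cycle` is a simulation stand-in for the cycle in which the remote side acknowledges.

```python
COMM_BASE = 0x8000_0000    # hypothetical start of the communication-memory address window
WINDOW_SIZE = 0x0100_0000  # hypothetical 16 MiB window per port
NUM_PORTS = 3              # three ports, as in the text
TIMEOUT_CYCLES = 64        # hypothetical timer limit


def decode(addr):
    """Map a 32-bit M-bus address to (port, offset within that port's window)."""
    rel = addr - COMM_BASE
    if not 0 <= rel < NUM_PORTS * WINDOW_SIZE:
        raise ValueError(f"address {addr:#010x} is outside the communication-memory window")
    return rel // WINDOW_SIZE, rel % WINDOW_SIZE


def mapped_access(addr, ack_cycle):
    """Forward an access to its port; raise TimeoutError if no acknowledge arrives in time.

    ack_cycle: cycle in which the remote side answers, or None if it never does.
    """
    port, offset = decode(addr)
    if ack_cycle is None or ack_cycle >= TIMEOUT_CYCLES:
        raise TimeoutError(f"no acknowledge on port {port} within {TIMEOUT_CYCLES} cycles")
    return port, offset
```

With this layout, `decode(0x8100_0004)` selects port 1 at offset 4. Without the timer, an access to a failed or unreachable memory would stall the node's M-bus indefinitely.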

A simple measurement interface is also located on the communication memory interface. A write access to a specific register is mapped to the measurement port and a strobe signal is generated.
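The measurement interface can be pictured as an ordinary register write that is intercepted and turned into an event for an external monitor. In the sketch below, the register address and the class are hypothetical, and the timestamp stands for the time at which an external monitor would latch the value on the strobe; none of these details are specified in the text.

```python
MEASURE_REG = 0x7FFF_FFF0  # hypothetical address of the measurement register


class MeasurementPort:
    """Turns writes to the measurement register into (timestamp, value) events."""

    def __init__(self):
        self.events = []  # what an external monitor would record on each strobe

    def bus_write(self, addr, value, timestamp):
        """Return True if the write was captured by the measurement port."""
        if addr == MEASURE_REG:
            # The value is placed on the measurement port and a strobe is generated;
            # here the strobe is modelled as appending one event.
            self.events.append((timestamp, value & 0xFFFF_FFFF))
            return True
        return False  # not the measurement register: handled by normal decoding
```

Instrumented software only needs to write an event code to the register at interesting points; the monitor then sees a time-stamped trace without any extra bus protocol.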

Communication memory

Standard components are used for the memory itself. Like the communication memory interface of a node, the memory's interface to the interconnection network provides three ports. Accesses arriving at these ports are serialized by special arbiter logic.
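The arbitration policy is not described here; a round-robin scheme is one common choice for serializing a few requesters fairly, and the sketch below assumes it purely for illustration. Each call to the hypothetical `grant` method represents one arbitration decision among the ports currently requesting the memory.

```python
class RoundRobinArbiter:
    """Serializes accesses from several ports, rotating priority after each grant."""

    def __init__(self, n_ports=3):
        self.n_ports = n_ports
        self.last = n_ports - 1  # start so that port 0 has the highest priority

    def grant(self, requests):
        """Return the port granted next among `requests`, or None if nobody requests."""
        pending = set(requests)
        for k in range(1, self.n_ports + 1):
            port = (self.last + k) % self.n_ports
            if port in pending:
                self.last = port  # the granted port gets the lowest priority next time
                return port
        return None
```

If all three ports request continuously, grants cycle 0, 1, 2, 0, ..., so no port can be starved. A fixed-priority arbiter would be simpler but could let one busy port lock out the others.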


Thomas Thiel (thiel@informatik.uni-erlangen.de)