The Remington Rand Univac LARC
Two interesting IO devices on the LARC--the drums and the page recorder
by Norman Hardy
The LARC Drums
Twelve drums were delivered to Livermore. Each stored 250,000 words. Each was about the size of a washing machine, somewhat more than a cubic meter. The drum rotated about a horizontal axis, and the recording surface was a cylinder concentric with that axis. The read-write heads traveled along a path parallel to the axis. Head stepping time was tens of milliseconds. I think the rotation speed was about 1000 rpm. The IO processor could move the heads on several drums at once and also transfer data over two or three channels simultaneously. A transfer could begin as soon as the head reached the right track; there was no need to wait for the beginning of the block to come around. Blocks had a fixed, built-in size of about 100 words, I recall.
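Not having to wait for the start of a block matters most when a whole track is transferred. The sketch below is a hypothetical model, not a description of the actual LARC drum controller. It takes the recalled 1000 rpm at face value, assumes the head is already on the track and arrives at a random angle, and uses an illustrative `BLOCKS_PER_TRACK` value, since the real track layout is not given.

```python
# Hypothetical model of drum rotational latency for a whole-track transfer.
# Assumptions: ~1000 rpm (as recalled), head already on track, random arrival
# angle, and an illustrative number of blocks per track.

RPM = 1000
REV_MS = 60_000 / RPM  # one revolution: 60 ms at 1000 rpm


def whole_track_ms(blocks_per_track: int, wait_for_first_block: bool) -> float:
    """Expected time (ms) to transfer every block on one track."""
    block_ms = REV_MS / blocks_per_track
    if wait_for_first_block:
        # Wait half a revolution on average for block 0, then read a full revolution.
        return REV_MS / 2 + REV_MS
    # Start at whichever block boundary arrives next (half a block on average);
    # one full revolution then brings every block under the head.
    return block_ms / 2 + REV_MS


if __name__ == "__main__":
    BLOCKS_PER_TRACK = 10  # hypothetical; not stated in the text
    for wait in (True, False):
        print(f"wait_for_first_block={wait}: "
              f"{whole_track_ms(BLOCKS_PER_TRACK, wait):.1f} ms")
```

Under these assumptions, starting at the next block boundary saves close to half a revolution per whole-track transfer.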
Sperry devised a benchmark that showed the advantages of concurrent head motion. The benchmark was a neutron diffusion calculation simulating a reactor, a problem that was easy to split into concurrent tasks. IBM's Stretch had a bigger, faster disk, but with only a single access mechanism it could achieve only a few accesses per second.
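The advantage of independent access mechanisms can be put as a simple upper bound. This is a minimal sketch with made-up timing parameters, not a reconstruction of the Sperry benchmark: each mechanism completes one access per seek, rotational wait and transfer, and each channel carries one transfer at a time.

```python
# Hypothetical throughput bound: several independent access mechanisms vs. one.
# All timing parameters used below are illustrative assumptions, not measurements.

def accesses_per_second(mechanisms: int, channels: int,
                        seek_ms: float, rotate_ms: float, transfer_ms: float) -> float:
    """Upper bound on random accesses per second.

    Each mechanism finishes one access per (seek + rotational wait + transfer);
    each channel carries only one transfer at a time.
    """
    per_mechanism = 1000.0 / (seek_ms + rotate_ms + transfer_ms)
    channel_limit = channels * 1000.0 / transfer_ms
    return min(mechanisms * per_mechanism, channel_limit)


if __name__ == "__main__":
    # Assumed: 50 ms seek, 30 ms average rotational wait, 20 ms transfer.
    print(accesses_per_second(1, 1, 50, 30, 20))   # one access mechanism
    print(accesses_per_second(12, 3, 50, 30, 20))  # twelve drums, three channels
```

With these assumed numbers the twelve-drum configuration is limited by its mechanisms rather than its three channels, which is the situation in which overlapping head motion pays off.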
The LARC Page Recorders
Most of this information is as George Michael recounted it in 1999. The page recorder consisted of a sprocketed film magazine and a "characteron" CRT. The LARC IO processor controlled the film motion explicitly, frame by frame, and recorded an image of textual data on each frame by having it painted once on the CRT. The lab already had quick-turnaround film processing for a variety of purposes, so overnight computing runs produced developed film the next morning.
The characteron appeared in several products at Livermore. It was a CRT with extra beam-steering capabilities and a second focal plane near the electron gun, in addition to the one at the screen. In this extra plane was a grid of about 64 stenciled characters. The first deflection circuits guided the beam through one of these characters; the second deflection circuits then steered the shaped beam to the phosphor, where it was focused again. The selected character thus appeared at the specified place on the screen. At one characteron demonstration, not the LARC's, I saw a large circular display showing a few thousand characters. The owner handed me a magnifying glass and invited me to examine a period, which looked well proportioned. Under the glass, the period read "This is a period". The small text was very well formed.
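The two-stage deflection and the frame-by-frame film control can be modeled as a command stream. This is a hypothetical sketch: the 8 x 8 stencil layout, the character set, the command names `PAINT` and `ADVANCE`, and the choice to paint before advancing are all assumptions for illustration, not the real plate or IO processor orders.

```python
# Hypothetical model of a characteron page recorder. For each page, every
# character is painted once using a two-stage deflection, then the film
# advances one frame. The stencil layout below is an assumption.
import string

GRID = 8  # assume an 8 x 8 stencil grid (about 64 characters)
STENCIL = string.ascii_uppercase + string.digits + " .,;:()+-*/=$'\"?!<>%&#@[]_^~"
assert len(STENCIL) == GRID * GRID


def stencil_address(ch: str) -> tuple[int, int]:
    """First deflection: row and column of the character in the stencil."""
    return divmod(STENCIL.index(ch), GRID)


def record_pages(pages: list[list[str]]) -> list[tuple]:
    """Command stream for a sequence of pages, each a list of text lines."""
    commands: list[tuple] = []
    for page in pages:
        for row, line in enumerate(page):
            for col, ch in enumerate(line.upper()):
                if ch == " ":
                    continue  # blanks need no exposure
                # Stencil (row, col) selects the glyph; (row, col) on screen places it.
                commands.append(("PAINT", *stencil_address(ch), row, col))
        commands.append(("ADVANCE",))  # move the film to the next frame
    return commands


if __name__ == "__main__":
    for cmd in record_pages([["HI"]]):
        print(cmd)
```

Each character costs one exposure regardless of its shape, which is what made stencil-based tubes fast for text.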
A few years later, Livermore acquired a machine from Stromberg Carlson that used a characteron to form an image on a xerographic drum; the image was then transferred to paper.
Notes on the LARC CPU design
I arrived at Livermore in 1955 as a programmer, and one of my early assignments was to visit Philadelphia to learn about the logic design of the not-yet-delivered LARC. It was a fascinating experience.
Our LARC had 12 non-interleaved core memory boxes of 2500 words each, 30,000 words in all. A full memory cycle of one box took 4 microseconds. The machine ran on a global 4-microsecond cycle divided into eight 500-nanosecond slots, and each circuit in the machine carried a new boolean value every 500 nanoseconds. Memory latency was 5 or 6 slots, I recall. Every major unit except the core boxes was tasked according to the identity of the current slot. Thus the 8 slots on the memory bus were preallocated to these 8 distinct functions:
- Processor 1 data access
- Processor 2 data access
- IO processor instruction fetch or data access
- Processor 1 instruction fetch
- Processor 2 instruction fetch
- DMA 1 access
- DMA 2 access
- DMA 3 access
I am not sure that there were in fact three DMA units. I believe the LARCs that were delivered had only one processor. Every LARC had a specialized IO processor that handled the real-time aspects of IO.
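Because each function owns a fixed slot, no arbitration is needed: the bus owner is simply a function of time. This minimal sketch assumes the slots occur in the order listed above, which the text does not state; `bus_owner` and `next_grant_ns` are hypothetical helper names.

```python
# Sketch of a fixed 8-slot memory-bus schedule (time-division allocation).
# Assumption: slot numbers follow the order in which the functions are listed.

SLOT_NS = 500
SLOTS_PER_CYCLE = 8
CYCLE_NS = SLOT_NS * SLOTS_PER_CYCLE  # 4000 ns, the global 4-microsecond cycle

SLOT_OWNER = (
    "Processor 1 data access",
    "Processor 2 data access",
    "IO processor instruction fetch or data access",
    "Processor 1 instruction fetch",
    "Processor 2 instruction fetch",
    "DMA 1 access",
    "DMA 2 access",
    "DMA 3 access",
)


def bus_owner(t_ns: int) -> str:
    """Function that owns the memory bus at time t_ns."""
    return SLOT_OWNER[(t_ns // SLOT_NS) % SLOTS_PER_CYCLE]


def next_grant_ns(slot: int, t_ns: int) -> int:
    """Earliest start of the given slot at or after t_ns."""
    start = (t_ns // CYCLE_NS) * CYCLE_NS + slot * SLOT_NS
    return start if start >= t_ns else start + CYCLE_NS


if __name__ == "__main__":
    for t in range(0, CYCLE_NS, SLOT_NS):
        print(t, bus_owner(t))
```

A requester that just misses its slot waits up to a full 4-microsecond cycle, but it never competes with anyone else for the bus.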
Twenty-six general-purpose registers served as index registers, fixed-point registers and floating-point registers. They were hand-wound cores with a one-microsecond cycle time and one-microsecond latency.
Unlike current RISC machines, the LARC had few adders. The main adder, allocated according to the 8-slot schedule, calculated an effective address in one slot, an instruction address in another, and a fixed-point sum or the mantissa of a floating add in yet another. The adder was not, however, shared between processors, as was the case with the PPUs of the 6600.
The multiplier used a decimal version of carry-save addition. The designer told me that the idea was already ancient; I wonder whether it had been used in mechanical calculators. A chain of sequentially dependent floating adds proceeded at 4 microseconds per add, and at most one instruction could be issued every 4 microseconds. If an instruction modified an index register and the next instruction used that register in its effective-address calculation, there was a 4-microsecond penalty.
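Decimal carry-save addition keeps a running total as a digit vector plus a separate carry vector, so each addition is local to each digit position and only the final result needs a rippling carry. The sketch below illustrates that general technique in a shift-and-add decimal multiply; it is not the LARC's actual multiplier, whose circuit details are not given here, and all function names are hypothetical.

```python
# Sketch of decimal carry-save accumulation in a shift-and-add multiply.
# Illustrates the general technique only, not the LARC multiplier circuit.

def to_digits(n: int, width: int) -> list[int]:
    """Little-endian decimal digits of n, padded to width."""
    return [(n // 10**i) % 10 for i in range(width)]


def csa_step(sums: list[int], carries: list[int], addend: list[int]):
    """Add one decimal number into (sums, carries) without propagating carries.

    Each position sees at most 9 + 1 + 9 = 19, so its outgoing carry is 0 or 1
    and no position waits on any other.
    """
    width = len(sums)
    new_sums, new_carries = [0] * width, [0] * width
    for i in range(width):
        t = sums[i] + carries[i] + addend[i]
        new_sums[i] = t % 10
        if i + 1 < width:
            new_carries[i + 1] = t // 10
    return new_sums, new_carries


def propagate(sums: list[int], carries: list[int]) -> int:
    """Final carry-propagate (ripple) add that resolves the redundant form."""
    value, carry = 0, 0
    for i, (s, c) in enumerate(zip(sums, carries)):
        t = s + c + carry
        value += (t % 10) * 10**i
        carry = t // 10
    return value


def multiply(a: int, b: int) -> int:
    """For each multiplier digit d, add the shifted multiplicand d times."""
    width = len(str(a)) + len(str(b)) + 1
    sums, carries = [0] * width, [0] * width
    for k, d in enumerate(reversed(str(b))):
        shifted = [0] * k + to_digits(a, width - k)  # decimal shift by k places
        for _ in range(int(d)):
            sums, carries = csa_step(sums, carries, shifted)
    return propagate(sums, carries)


if __name__ == "__main__":
    print(multiply(1234, 5678))
```

The width of one extra digit beyond the product's maximum length guarantees that no carry is ever dropped off the top.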
Given the above, we can reconstruct a rough picture of the degree of LARC pipelining. Here is the schedule of events for one floating add instruction in a stream of sequentially dependent floating operations, with times in 500-nanosecond slots counted from the start of the instruction:
- 0: Calculate the address of the instruction.
- 1: Summon the instruction.
- 7: Decode instruction and summon index register value.
- 9: Calculate the effective address.
- 10: Summon the core operand.
- 14: Summon register operand.
- 16: Initiate floating add sequence.
- 23: Send floating sum to registers.
The machine did register forwarding for sequentially dependent floating and fixed operations, and such a stream issued an instruction every 4 microseconds. (Floating multiply took 8 microseconds and floating divide took 28 microseconds.) One instruction spans 24 slots (12 microseconds) from address calculation to result, while a new one starts every 8 slots, so about three instructions are in progress at once. You might therefore describe the degree of pipelining as three. The 7090, a contemporary IBM machine, was slightly more than one, as I recall, and the Stretch was rather more than three.
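The arithmetic behind "three" can be checked directly from the slot schedule, along with the issue rule and the index-register penalty. This is a rough sketch: treating the index-register rule as the only hazard is a simplifying assumption, and `Instr` and `issue_slots` are hypothetical names.

```python
# Rough model of LARC issue timing, in 500 ns slots, from the schedule above.
# Assumption: the index-register penalty is the only hazard modeled.
from typing import NamedTuple, Optional

ISSUE_INTERVAL = 8  # slots: at most one instruction every 4 microseconds
INDEX_PENALTY = 8   # slots: the 4-microsecond index-register penalty

# Starting slot of each step of one floating add.
FLOAT_ADD_SCHEDULE = {
    0: "calculate instruction address",
    1: "summon instruction",
    7: "decode instruction, summon index register",
    9: "calculate effective address",
    10: "summon core operand",
    14: "summon register operand",
    16: "initiate floating add",
    23: "send floating sum to registers",
}
LATENCY = max(FLOAT_ADD_SCHEDULE) + 1  # slots 0..23: 24 slots = 12 microseconds


def degree_of_pipelining() -> float:
    """Instructions in progress at once: latency divided by issue interval."""
    return LATENCY / ISSUE_INTERVAL


class Instr(NamedTuple):
    sets_index: Optional[int] = None  # index register this instruction modifies
    uses_index: Optional[int] = None  # index register used in its effective address


def issue_slots(stream: list[Instr]) -> list[int]:
    """Slot at which each instruction of a straight-line stream issues."""
    slots: list[int] = []
    prev: Optional[Instr] = None
    t = 0
    for ins in stream:
        if prev is not None:
            t += ISSUE_INTERVAL
            if prev.sets_index is not None and prev.sets_index == ins.uses_index:
                t += INDEX_PENALTY
        slots.append(t)
        prev = ins
    return slots


if __name__ == "__main__":
    print(degree_of_pipelining())  # 3.0
    print(issue_slots([Instr(sets_index=3), Instr(uses_index=3), Instr()]))  # [0, 16, 24]
```

The second instruction in the example loses a full 4-microsecond cycle because its effective address depends on the index register just modified.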
Comparing Stretch and LARC strategies leads me to the following points regarding allocation of hardware units to logical tasks:
- The LARC benefited from clever engineers who decided at hardware-design time how to allocate hardware. They were in a position to modify the design of other units so that the schedules meshed cleverly. The Stretch benefited from late binding of hardware to tasks.
- The LARC's timing rules were easy to understand and were indeed well understood two years before the machine shipped. The Stretch's timing rules were hard to understand, and few people understood them well; that understanding came only a year after the machine shipped. Although the Stretch missed its speed goals (and the LARC did not), it was less late and was also a rather faster machine.
- Both machines had random-access mechanical memory (moving-head disks and drums), and their performance characteristics were another saga.