Early Archival Storage at LLNL
by Sam Coleman
The Lawrence Livermore National Laboratory developed the Octopus network in the 1960s to interconnect the supercomputers and peripherals of that day. Octopus innovations included time-sharing operating systems, remote terminal access, and archival storage. I participated in the development of the MASS and ATL archival systems. The architecture of these systems is shown in Figure 1. I will focus on MASS and ATL, but I will mention some of the storage systems that preceded and followed them. I will also discuss some aspects of the Octopus network that related to storage.
Before the central archives were built, worker machines, like the Control Data 6600s and 7600s, included their own tape farms. Users were assigned individual tapes and were responsible for keeping track of which files were on which tapes. User applications requested tapes by number. The tapes typically remained on the tape drives while those applications ran. This was typical of most computer centers, but was inefficient:
- User tapes were usually mostly blank. It was easier for users to manage tapes containing only a few files, and certainly less bother not having to worry about overflowing tapes. As a result, the vault contained thousands of tapes that were 95% or more empty.
- Tapes were usually not loaded ("hung") by operators until the application started executing, introducing inevitable delays in the operation of the worker machines. The operators had lists of upcoming requests and usually pulled the tapes from the vault before they were requested by applications, but there were still delays when applications were ready to run.
- Applications typically read data from the tapes when they started and wrote their results to tape, but the tapes typically sat idle on the drive for long periods. Thus, the utilization of the tape drives was poor, and user jobs were queued waiting for drives to become available.
Users did cite some advantages of this approach:
- They had the warm fuzzy feeling of knowing exactly where their files were stored.
- Users could retrieve their tapes from the vault, to transport data somewhere else or to store them in their offices (office environments were less safe than vault storage, of course, but provided fuzzier feelings).
- Tapes were not handled unless the owner requested them, reducing the chance of file loss if a tape was damaged while accessing someone else's data.
Because the efficiency disadvantages of user-owned tapes seemed to outweigh the advantages, LLNL developed a centrally-managed storage system named Elephant (because elephants never forget; in later years, we were also reminded that elephants can be slow and clumsy). The Control Data 7600 computers were the last worker machines to which tape drives were directly connected--beginning with the Cray-1 machines, all worker machines accessed centralized storage facilities over the Octopus network.
The IBM Datacell
The first automated archival device in the Elephant system was the Datacell. The Datacell resembled a large bucket, with hundreds of strips of magnetic tape hung vertically in cartridges arranged around the inside of the bucket. When a file was requested, the Datacell rotated the bucket to position the cartridge holding the required strip of tape. The Datacell then pulled the tape strip out of the bucket, wrapped it around a cylinder, and accessed it like a magnetic drum. The total capacity of the Datacell was 400 megabytes. Files recorded on it were intended to be temporary. The Laboratory installed several Datacell devices and provided a number of file-lifetime options, all measured in days. The Datacells were on line from 1966 until 1976.
The IBM Photodigital Store
The first massive storage system was the trillion-bit Photostore. It is said that our Photostore, when it went on line in 1969, doubled the on-line computer capacity of the planet earth. (The Berkeley Lab and Los Alamos also used the Photostore for a while, and the National Security Agency had two, each twice the capacity of ours. However, rumor has it that none of these organizations made good use of the device--NSA never got theirs into production.) The Photostore capacity was so massive, compared to other storage devices, that the Lab programmers assumed that the device would be retired before it filled up.
However, the Photostore was immensely popular, and its users filled it to capacity in about four years. Since the Photostore's optical media was write-once, older files had to be removed from the device and stored off-line to make room for new data. Retrieving the off-line files was a time-consuming, manual operation in which operators retrieved the Photostore cells holding the requested files and returned them to the Photostore. The programmers working on the project had to scurry to implement the software to handle the off-line cells.
For more information on the Photostore, see John Fletcher's article.
The Evolving Octopus Architecture
The Octopus network interconnected the Laboratory's supercomputers and various peripheral services. The original Octopus architecture included a pair of DEC PDP-6 computers, later upgraded to PDP-10s, as the network's central node (the "head" of the octopus). "Tentacles" from the PDP-10s provided interactive terminal service, central printing, a high-speed display system (TMDS), file transfers between worker machines, and access to the Photostore. While conceptually clean, this architecture had several drawbacks:
- With all of the network services controlled by the PDP-10 systems, when those computers failed, the entire network was disabled. Even though the PDP-10s were configured redundantly, with either system able to serve the network, development and maintenance periods and hardware failures prevented continuous service. Garret Boer recalls that
- "I spent many, many hours traipsing out to the lab after hours and working with the technicians to get things working. Software problems were almost non-existent. The challenge was to navigate the myriad channels of home-grown hardware and multiple vendor products to isolate hardware failures. I once replaced a card in the PDP-6 myself to get the machine working when the technicians were slow to arrive."
- The data-transfer requirements grew faster than the PDP-10 systems could support.
- Problems with one sub-system, e.g., central printing, could prevent access to other services, like terminal access.
- To replace the PDP-10s with newer machines, all of the network services would have to be converted at the same time, an enormous effort with inevitable transition problems. All of the PDP-10 software, including the operating system itself, was written in assembly code, so porting the services to new machines was difficult--"portability" was an unknown concept in those days. To make matters worse, Lab engineers designed and implemented several new, useful PDP-10 processor instructions. The system programmers used them eagerly, but this made upgrading the PDP-10s even more difficult.
To alleviate these problems, the Octopus philosophy was changed to provide separate server machines for different services, an approach later referred to as a "client-server" architecture. Terminal access and password checking were among the first services moved off of the PDP-10s (see John Fletcher's article on the Ostrich and Kiwi systems). The MASS system was part of this new Octopus architecture.
The PDP-10s maintained the Elephant directory structure. The Elephant system received requests for file transfers, looked to see where the files were stored, and forwarded the requests to the appropriate sub-system: The Photostore, the Datacell, caching disks controlled by the PDP-10, MASS, or the ATL.
The Elephant system was an early capability-based system, designed after the Multics system developed at MIT. Garret Boer attended a conference where the Multics ideas were presented (capabilities and directories were the backbone) and reported them back to the Storage Group. The LLNL Storage Group had a system implemented before MIT did.
The Elephant system used directories from the beginning--it was a long, long time before local systems on the workers followed suit. When some worker-system programmers were tasked with generating an interface to the storage system, they proposed a system which hid the directory structure, so that users would have the same, primitive file list that was implemented on the worker machines. There was a bit of unpleasantness before they yielded and permitted users to use the central directories as intended.
Garret describes an interesting device used in the storage system for a while: A four-foot diameter head-per-track disk. It was not used for file storage, but provided fast access to directories (backed up on the Datacell). After it was taken down for repair, it was restarted--brought gradually up to speed--with a technician holding a stethoscope to the cabinet. Upon hearing anything to indicate that the read-write heads were misbehaving, the start-up would be aborted.
The Multi-Access Storage System (MASS)
Recognizing the need for shorter-term storage with faster access than the Photostore could provide, the Laboratory embarked on the MASS project to provide storage on Control Data Corporation's model 38500 Cartridge Store. MASS had a capacity equal to that of the Photostore.
The MASS system defined where files were located on its storage medium, but was not concerned about how the user referred to those files. When a file was stored, MASS created a pointer, or capability, for the file and returned it to the PDP-10, where it was stored in the user's Elephant directory structure. The capability identified the volume where the file was stored and included track and block information to locate the file in the volume. When the user requested the file, the directory structure retrieved the capability and sent it to MASS, providing the information MASS needed to access the file.
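The capability mechanism described above can be sketched as follows. This is a minimal illustration; the field names, types, and layout are assumptions, not the actual MASS capability format.

```python
from dataclasses import dataclass

# Hypothetical sketch of a MASS file capability. MASS created one of
# these when a file was stored and handed it to the PDP-10, which kept
# it in the user's Elephant directory; MASS itself kept no directory.
@dataclass(frozen=True)
class Capability:
    volume: int   # cartridge (volume) holding the file
    track: int    # track within the volume
    block: int    # starting block on that track
    file_id: int  # identifier for the stored file

def locate(cap: Capability) -> tuple[int, int, int]:
    """Return the physical address MASS needs to fetch the file."""
    return (cap.volume, cap.track, cap.block)

cap = Capability(volume=1042, track=5, block=17, file_id=9001)
print(locate(cap))  # (1042, 5, 17)
```

The point of the split is that the directory system (on the PDP-10) owns naming, while MASS owns placement; the capability is the only thing that crosses the boundary.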
To access MASS, laboratory engineers built channels to transfer data between the worker machines and a buffer memory. Each channel matched the transfer speed of the peripheral processing units (PPUs) on the Control Data 7600 worker machines--40 million bits per second (pretty fast for the mid 1970s!). Each worker had two channels to MASS. The buffer memory had a total throughput of 280 million bits per second. With direct channels to every worker, MASS also took over the PDP-10's job of transferring files between worker machines.
The Control Data Cartridge Store devices were also connected to the buffer memory through Lab-designed interfaces. A pair of TI-980 minicomputers (TI-980A to be precise) orchestrated the file transfers.
No user data flowed through the TI-980s. To set up the data transfers, code on the TI-980s placed messages in the buffer memory and then set hardware registers to communicate with the worker machines through the high-speed channels. The data also bypassed the worker central processors, flowing from the worker disks, through the PPUs and the MASS channels to the buffer memory and, from there, directly to a storage device or to another worker machine.
The TI-980 software was called "Gopher", continuing the convention of naming systems after animals, because the MASS system would "go-fer" one file and then "go-fer" another. Also, the honeycomb cartridge-storage cells of the 38500s were compared to gopher holes. The diagnostic code, which ran on the alternate TI-980, was called MOLE, an acronym for MASS Off-Line Environment.
I wrote the Gopher and MOLE software. Texas Instruments delivered a primitive TI-980 operating system on punched cards with the computers, but we never successfully ran it. I wrote TI-980 code on the PDP-10s in TI-980 assembly language. I used a cross-assembler on the PDP-10, written by the Storage Group, to translate the source code to a TI-980 binary image. I punched the first TI-980 codes onto paper tape on the PDP-10, walked the tape to the TI-980s (an early implementation of a "sneakernet"), and read it into the TI-980 for execution. Each paper tape contained a small bootstrap loader which was executed by the TI-980's microcode. That loader then read the rest of the binary image and stored it onto a local disk drive. The disk drive also contained a bootstrap loader, read by the TI-980's micro-code and invoked by a front-panel switch, which loaded the code into the TI-980 RAM memory and executed it.
One of my first TI-980 codes transferred data from the PDP-10s to the TI-980s over the Octopus network, the network used later to submit file-transfer requests to MASS. This allowed me to assemble software on the PDP-10s and transmit it to MASS over the network, eliminating the need for paper tape.
The MASS control architecture was redundant to improve reliability. There were two independent TI-980 computers. One computer operated the MASS system while the other ran the MOLE diagnostic system. The disk drive I referred to above was actually two removable hard disks on each TI-980. Each removable disk held one megabyte. These disks were driven in parallel, with everything written on one also written to the same sector on the other. LLNL engineers built hardware switches to swap the connections from the operator terminals, displays, the buffer memory, storage devices, and the network between the two TI-980 computers. To switch the control of MASS from one TI-980 to the other, an operator idled both machines, using console commands, spun down all four disk drives, swapped the disk cartridges between the TI-980s, threw the switch to re-configure the peripheral connections, and then re-booted the TI-980s via front-panel switches. The process, which took about a minute, was performed once a week, even if everything was functioning properly, to exercise both TI-980s. The TI-980 not being used to run the production system was used to test new software and to troubleshoot many of the MASS hardware components.
The size of the MASS code was comically small by modern measures. The entire production code consisted of 19,359 source lines, including comments. Some sample code, the routine for looking up operator commands in a table, is shown in Figure 2.
The system was resident in the TI-980's 64 kilobytes of RAM, 8 kilobytes of which was used by the computer's microcode. (The machine contained two 32KB memory cards.) Gopher ran in the 56 remaining kilobytes and included the following features:
- The operating system was entirely interrupt-driven, with eight levels of hardware interrupts. Software modules were assigned levels so that higher-priority tasks, like processing a file transfer request from the PDP-10, took priority over lower-priority tasks, like updating a status display or servicing an interrupt from the operator console.
- The hardware devices were also managed using multiple-level interrupts. A high-priority interrupt from, for example, a tape drive could be serviced and a new command issued to the drive within a few microseconds (typical TI-980 instructions took between 0.75 and 1.25 microseconds).
- The code was reentrant. For example, sixteen tasks, running the same code, managed the sixteen media drives in the storage devices concurrently. The TI-980 base register pointed to the RAM storage needed by a task while it was executing.
- The 19,359 source lines, in addition to the operating system itself, included all of the code necessary to drive the Control Data robotic devices managing the media cartridges, including error recovery; to operate the media drives to load and unload cartridges and to send data to and from the buffer memory and from the buffer memory to the workers; to maintain a status display on the Laboratory's TMDS television display system, so that users could monitor the length of the MASS queues and the activity on the system; to maintain more detailed status displays for the operators; to communicate with the operators via hard-copy terminals; to communicate with the PDP-10s using a protocol similar to, but predating, today's IP protocol; to maintain queues of file requests and to process those requests; to maintain logs and error records on the local hard drives and to display the error records on demand; to maintain statistics of the activity on each of the media drives; and to run exercise codes on media cartridges and worker channels. You probably get the idea: We provided a lot of functionality in very few lines of code; writing compact code to fit into the available memory was a major goal of the project.
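The priority scheme in the list above can be sketched with a software priority queue standing in for the hardware interrupt levels. The level assignments below are invented for illustration; Gopher's actual levels are not documented here.

```python
import heapq

# Hypothetical interrupt levels: lower number = higher priority.
TRANSFER_REQUEST, STATUS_UPDATE, CONSOLE = 1, 6, 7

pending = []  # software stand-in for the eight hardware levels

def raise_interrupt(level, name):
    heapq.heappush(pending, (level, name))

raise_interrupt(STATUS_UPDATE, "refresh TMDS display")
raise_interrupt(TRANSFER_REQUEST, "file request from PDP-10")
raise_interrupt(CONSOLE, "operator keystroke")

# Dispatch always services the highest-priority pending interrupt,
# so the file-transfer request runs first and the console last.
while pending:
    level, name = heapq.heappop(pending)
    print(f"level {level}: {name}")
```

The hardware version differs in that a higher-priority interrupt could preempt a running lower-priority task rather than waiting for it to finish, but the ordering discipline is the same.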
The Cartridge Store
We had eight Control Data robotic devices, each of which had two tape drives and held 2,000 cartridges. Each cartridge held eight megabytes on 150 inches of three-inch-wide magnetic tape. Of the 150 inches, 100 inches were writable, the rest being leader. To access a file, the robot positioned itself in front of the desired cartridge and blew a jet of air over the top of the cartridge, which pushed the cartridge into the robot's "picker". The robot then positioned itself at the receiving port of the tape drive and mechanically shoved the cartridge onto a small conveyor belt. The conveyor belt moved the cartridge to the back of the drive where it was "shuttled" sideways into the load position. The drive then popped open the door of the cartridge and sucked all 150" of tape into the vacuum columns. (Both ends of the tape were connected to the cartridge--one end to the door and the other end to the spindle.) To read or write the cartridge, the drive "shoe-shined" the tape back and forth, in the vacuum columns, past the head. The tape was written in both directions, and the head was positioned to one of eight positions (nine data tracks were written for each head position). Thus, to read or write the entire cartridge, the tape was shoe-shined past the head sixteen times, one for each head position in each direction. When access to the cartridge was complete, the drive wound the tape back into the cartridge, pulling the door closed as the last of the tape was wound up. The drive then shuttled the cartridge onto an output conveyor belt, which moved the cartridge to the output port where the robot picked it up and returned it to its storage cell.
Five cartridges could be queued in the tape drive, with cartridges on either end of each conveyor belt and one loaded in the vacuum columns. The tape drive could load a cartridge, position the tape to an arbitrary position, read a couple of blocks, and then eject the cartridge every seven seconds. The robot could supply and put away cartridges at the same rate. Since most files stored on MASS were small (a few kilobytes), the system could transfer a file every seven seconds from each drive, for a total of 16 files every seven seconds, or 137 files per minute.
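The throughput figures above follow from a little arithmetic over the numbers in the text:

```python
# Worked arithmetic for the figures in the text: 8 units, 2 drives
# each, and one small-file access cycle every 7 seconds per drive.
units = 8
drives_per_unit = 2
cycle_seconds = 7

drives = units * drives_per_unit              # 16 drives total
files_per_minute = drives * 60 / cycle_seconds

print(drives, round(files_per_minute))  # 16 137
```

That is, 16 files every seven seconds works out to roughly 137 files per minute across the whole system, matching the rate quoted above.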
Unless the incoming flow of data was heavy, MASS wrote one cartridge at a time. Files were written in the order that the transfer requests were received from the PDP-10s, potentially putting files from many users on a single cartridge. (Since MASS didn't manage the directory structure, it wasn't aware of the ownership of the files it was transferring.) To protect against files being sent to the wrong user, if a tape was mis-positioned, for example, MASS wrote the cartridge number, track number, block number, and a file identifier onto every block of tape. When a block was transferred to the buffer memory, the TI-980 checked these values to be sure that the intended block was read.
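The per-block label check described above might be sketched like this. The dictionary field names are assumptions for illustration, not the actual MASS block format.

```python
# Sketch of the safety check: every block written to tape carried the
# cartridge number, track, block number, and a file identifier, and
# the TI-980 compared them against the request on every read.
LABEL_FIELDS = ("cartridge", "track", "block", "file_id")

def block_ok(block: dict, expected: dict) -> bool:
    """True only if the block's recorded label matches the request,
    so a mis-positioned tape cannot deliver the wrong user's file."""
    return all(block[k] == expected[k] for k in LABEL_FIELDS)

expected = {"cartridge": 711, "track": 3, "block": 42, "file_id": 88}
good  = dict(expected, data=b"...")
stale = dict(expected, block=41, data=b"...")  # tape mis-positioned

print(block_ok(good, expected))   # True
print(block_ok(stale, expected))  # False
```

Because MASS mixed many users' files on one cartridge, this label check was the last line of defense against delivering one user's data to another.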
The design and construction of the 38500 device at CDC, and the channel interfaces at LLNL, started in 1974. MASS began transferring production files between worker machines in 1978. Our CDC units were prototypes, and we participated in extensive testing. One issue, for example, was the number of times a tape could pass the read/write heads before the tape became unreadable. Our specification was 30,000 passes. This was considerably higher than what was expected for conventional magnetic tape--reading or writing a full MASS cartridge meant a minimum of 16 passes past the head, and considerably more if error recovery was required to read marginal blocks. One version of the tape heads wore out cartridges much too quickly and generated oxide debris in the drives that clogged the heads and contaminated other cartridges. Control Data and 3M (the tape manufacturer) made the tape coating harder and the recording heads softer, resulting in rapid head wear. Several iterations were required to balance tape wear against head wear. The 38500 units went into production at the Lab in 1979.
The TI-980s contained extensive error-recovery code to handle tape read and write errors. The tape drive included a second set of read heads which read blocks immediately after they were written. Failures to read what had just been written were reported to the TI-980s. Gopher re-tried writing the block several times. After continued failures, Gopher wrote empty blocks ahead of the data. If the empty spaces exceeded the length of the data block being written, meaning that the block could not be written onto two separate pieces of tape, the cartridge was moved to a different drive, and the failing drive was taken off-line for human intervention. Write failures were usually caused by dirty heads. The heads were manually wet-cleaned during each shift but, sometimes, additional cleanings were necessary. We were concerned that defective cartridges, like defective half-inch tapes, might contaminate multiple drives every time they were used, but this did not turn out to be a serious problem.
The read recovery was also extensive. The drive contained CRC information which allowed it to correct many errors (the CRC was disabled on the read-back check after a block was written). Gopher re-tried failing blocks several times. If that didn't work, it reversed the tape and tried reading the block in the reverse direction. If that didn't work, it "shoe-shined" the tape a couple of times to dislodge particles that might be stuck to the head or the tape, re-positioned the tape, and then tried again. If that didn't work, it unloaded the tape and had the robot load it into the other tape drive on that unit. If the read also failed on that drive, the robot placed the cartridge into an output port and notified the operator. The operator might try the cartridge in another unit or, if reads of several cartridges were failing, clean the heads, or call a Control Data technician. With all of this, unrecoverable read failures were very rare.
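The escalating recovery ladder can be sketched as follows. The step order follows the text, but the retry counts are invented, and the real Gopher code of course issued drive commands rather than calling a Python function.

```python
# Sketch of Gopher's read-recovery ladder: each step is tried in turn,
# escalating from cheap retries to moving the cartridge to the other
# drive, before giving up and calling the operator.
RECOVERY_STEPS = [
    ("re-read block", 3),
    ("read in reverse direction", 1),
    ("shoe-shine tape, re-position, re-read", 2),
    ("move cartridge to other drive, re-read", 1),
]

def try_read(read_block):
    """read_block() returns the data or raises IOError; escalate
    through the recovery steps before declaring the read unrecoverable."""
    for step, attempts in RECOVERY_STEPS:
        for _ in range(attempts):
            try:
                return read_block()
            except IOError:
                pass  # escalate to the next attempt or step
    raise IOError("unrecoverable: eject cartridge, notify operator")

# A flaky block that succeeds on the third attempt:
outcomes = iter([IOError, IOError, b"payload"])
def read_block():
    result = next(outcomes)
    if result is IOError:
        raise IOError("read failed")
    return result

print(try_read(read_block))  # b'payload'
```

With each rung of the ladder cheap relative to the next, most marginal blocks were recovered early, which is why unrecoverable read failures were so rare.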
The MASS cartridge system was in production for a decade, until 1988. It survived the Livermore earthquakes in 1980, which took the units off-line as their doors flew open and physically moved all of the equipment a couple of inches east on the computer room floor (had the units been positioned perpendicularly to the tremors, we would have had 16,000 cartridges knocked out of their cells). MASS was retired, reluctantly, when Control Data was no longer willing to maintain the hardware. LLNL's was the first system put on line and the last to be retired. Roughly 75 other sites, mostly commercial organizations, used the Cartridge Store. The space occupied by MASS was used to house the next generation archive, the StorageTek silos.
Retiring the Photostore
The Photostore was also retired reluctantly, when IBM was no longer willing to maintain it. A lack of parts was blamed, but a more important reason was that Jim Dimmick, IBM's last Photostore engineer, also wanted to retire.
Retiring an archival storage device is more complicated than replacing one's personal computer with a new model. Files recorded over many years onto thousands of volumes must be discarded or moved to another medium. For a centralized archive, the transition to new devices must be transparent to the user, except that improved performance from the new storage system is acceptable, and files must be accessible throughout the transition.
It was well known that only a small number of files sent to the archive were ever retrieved, but identifying the still-useful files was the problem. We laboriously printed every user's directory structure and asked him or her to identify or delete the files that were no longer needed. This effort was mildly successful, with a small percentage of the archival files deleted. Some users simply announced that all of their files needed to be saved or declared that it was not worth their time to clean out the unneeded files. The Photostore had been recorded four times over by that time, and much of that data had to be transferred before the Photostore could be shut down. (We are often concerned about the mean time to failure of archival devices, and the potential lifetime of media, but we rarely consider the mean time to product retirement or the mean time to bankruptcy by the manufacturer.) Thus, we had a decade's worth of data to move.
Lots of companies made big promises about new storage devices. We were first promised holographic storage in 1975, a technology which is discussed as a future possibility today. Similarly, optical tape has been promised for years. When the end of Photostore maintenance was announced, however, the only practical alternative was conventional, half-inch, 6250-bpi magnetic tape on drives connected to the PDP-10s. Thus, we switched the stream of incoming archival data to magnetic tape and wrote code to transfer the remaining Photostore files to tape.
Transferring files by user or by directory, with the resulting random Photostore accesses, was impractical. Instead, the Storage Group wrote code to load each Photostore chip and transfer all of its files that were still referenced by the directory structure. It took over two years to copy the still-active Photostore files. Some files could not be transferred to tape--the Photostore recorder tended to "drift" over time, and readers were "tuned" periodically when they had difficulty reading the chips. Jim Dimmick could usually adjust a reader to read any particular chip in an emergency, but no reader settings were adequate to deal with all of the chips.
Also, chips were occasionally lost. They would be found lying around somewhere in the system with no hint as to where they belonged. Using a microscope, Jim Dimmick would then look at the actual bits encoded on the chip. The information at the head of each chip would enable us to locate the box where it had resided.
As the Photostore files were transferred, more and more user file requests required tape mounts. The number of tapes that one could carry on one's arm from the vault to the tape farm became a matter of pride among the operators. The users were not amused, since manual access to vault tapes was much slower than the Photostore access to which they had become accustomed. Even though the MASS system provided good access to short-term storage, a faster way to access archive files was needed.
The Automated Tape Library (ATL)
About the time that we started to off-load the Photostore, Xytex Corporation announced their Automated Tape Library. However, in 1975, before the product was finished, Xytex was purchased by Calcomp. Calcomp told the Xytex folks in Boulder, Colorado, that their operation would not be disrupted. Two weeks later, Calcomp announced that they were moving the Xytex operation to Anaheim. The entire development team in Boulder resigned rather than move to Anaheim, halting product development. Nevertheless, our need was great, so we bought an ATL. The device was a rectangular box with magnetic tapes stored along its interior walls, with a single robot which traversed the length of the unit, delivering tapes to eight conventional tape drives attached to the outside of the box.
I wrote the code to drive the ATL. The ATL was designed to be driven by IBM mainframe machines running the MVS operating system. Such mainframes, however, were beyond our budget, and we had no experience with MVS or the PL-1 language it used. Therefore, we chose to drive the device with a mini-computer. The favored machines at that time were DEC computers, so I was given an LSI-11 to drive the ATL.
The ATL software, named Antelope (AnTeLope), was similar to the MASS robotic code, but was smaller, just 8,781 source lines. The Antelope lookup routine appears in Figure 3. The LSI-11 drove the ATL's robot and loaded and unloaded tapes, but the data channels remained connected to the PDP-10, which handled the data transfer. I modeled the ATL code after the MASS code, writing it in assembly code for the LSI-11.
Like Gopher, I assembled Antelope on the PDP-10 and transferred it over the network to the LSI-11. (Gale Marshall wrote all of the cross-assemblers on the PDP-10, using a common code base, which is why the codes in Figures 2 and 3 look similar.) The LSI-11 was also a 64KB machine. It was the first machine that we used that did not include console lights and switches to display and set memory, look at registers, etc. All of that was done through the microcode and the operator display. Nevertheless, it was unnerving, at first, to deal with a machine with a single power switch and no lights!
The ATL was a nightmare to control. As I mentioned, the development staff declined to move to Anaheim, so bugs that resided in the ATL's internal control software when Calcomp bought the company remained until the device was retired. (The Braegan Corporation bought the ATL from Calcomp and was the company most often associated with the product.) What made writing the ATL software interesting was the lack of protective code in the ATL itself. The ATL was the only device I worked with capable of physically damaging itself if given the wrong sequence of commands. For example, if one gave the robot a command to move to a location, say, ten feet beyond the end of its track, it would try its best to get there, hitting the end of the rail at high speed. Calcomp apparently recognized this quirk from other installations, since our device was delivered with large rubber bumpers at each end of the track. As another example, to mount a tape on a drive, the robotic "arm" for that drive moved up to fetch the tape from its input port where the robot had placed it, moved back down, and then moved the tape through a hole in the ATL to mount it on the drive. However, issuing a command for that move was dangerous--if the arm happened to be extended out of the ATL at the time, it would attempt to move diagonally up toward the input port, crunch against the wall of the ATL, and damage a set of cables running along the top of the arm. Aggravating this situation was the fact that it was impossible to determine the position of the arm from its "sense bytes". The sense bytes included position information, but it was not reliable. I found that the only way to reliably retract the arm into the ATL was to issue a "reset" command to the device, normally recommended only for initializing the device after powering it up. Thus, every "mount" command was preceded by a "reset".
The manufacturer's operator instructions included advice on what to do if one was caught inside the ATL when the robot started to move. (Operators had to enter the ATL often to retrieve tapes dropped by the robot, hopefully before the robot ran over them. Normally, power to the robot was removed when the door was opened, but the interlock was easily over-ridden for maintenance.) The advice was to jump onto the robot as it approached and ride it until it stopped, not to try to out-run it. To possibly help the hapless operator, I always delayed issuing the first robot movement and then issued a command to move the robot to the center of the ATL when it first came up. This was effective because there was an audible click when the power first came on. Our operators never had to ride the robot but, more than once, an operator exited the ATL in a hurry when the power came on. We had a large sign printed which I could put on the ATL's doors, saying "Do Not Enter--Software Development in Progress", when strange robot movements might occur.
The ATL served as a cache for the vault, holding the 2,000 most recently accessed tapes. Antelope maintained an operator display of requested tapes. Operators could put incoming tapes into two input ports or they could mount the tapes directly onto drives. Since tapes contained no external machine-readable identification, like the bar codes used on later devices, Antelope first loaded tapes from the input ports onto drives, and then asked the PDP-10 to identify the tape from the internal tape header. The operators soon learned that, since the tapes would be loaded anyway, they might as well put the tapes directly onto empty drives to begin with. After the PDP-10 identified a tape, it looked in its queue and processed any file-transfer requests waiting for that tape. When the PDP-10 was finished reading or writing a tape, it sent a message to Antelope, which assigned a slot in the ATL and put the tape away. Antelope maintained a small number of empty tape slots for incoming tapes by placing the least-recently-accessed tapes into four output ports to be returned to the vault by the operators.
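Antelope's slot management amounted to an LRU cache over the ATL's tape slots. A minimal sketch of that policy (not Antelope's actual code; names and the 2,000-slot figure are taken from the description above):

```python
from collections import OrderedDict

class TapeCache:
    """LRU cache of tape slots, as Antelope managed the ATL: the most
    recently accessed tapes stay inside; the least-recently-accessed
    tapes go to the output ports for return to the vault."""

    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.slots = OrderedDict()  # tape_id -> slot number, in LRU order

    def access(self, tape_id, slot):
        """Record an access; return any tapes evicted to the output ports."""
        if tape_id in self.slots:
            self.slots.move_to_end(tape_id)   # now most recently used
            return []
        self.slots[tape_id] = slot
        evicted = []
        while len(self.slots) > self.capacity:
            old_id, _ = self.slots.popitem(last=False)  # least recently used
            evicted.append(old_id)
        return evicted
```

In the real system the "eviction" was physical: Antelope placed the victim tapes in the four output ports, and the operators carried them back to the vault.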
Calcomp's MVS driver for the ATL specified the drive on which each tape was to be mounted and required the operator to enter a console command for each tape mounted. The MVS code also assigned permanent locations in the ATL for each tape and required the operator to issue console commands to remove tapes. I saw no reason for this complexity. With Antelope, operators could plunk a requested tape onto any drive, which triggered an interrupt to Antelope, which then asked the PDP-10 to identify the tape. Because the ATL robot was slower than MASS, Antelope assigned the closest empty slots to tapes being put away. Thus, it was likely that a tape stored in the ATL would be returned to a different slot after an access. During normal operation, the operators monitored the display of requested tapes (an audible sound alerted them to new requests), mounted tapes fetched from the vault, and returned tapes to the vault, all without having to enter console commands.
The ATL was also in production for a decade, from 1979 to 1989. In addition to the flow of new data, all of the remaining Photostore files were recorded onto ATL tapes. Since new data was more likely to be accessed than the Photostore files, we maintained separate queues of files being written to tape. Like MASS, volumes usually contained files from many users. If the physical end of a tape was encountered while writing a short file or the beginning of a long file, the end of the tape was abandoned and the file was re-started on a new tape. Long files could, however, span from one tape to another. All of the tapes were thus almost filled to capacity, with little wasted space.
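That packing rule can be sketched as follows. The real system decided based on whether the tape ran out near the beginning of the file; the fixed size threshold used here is an assumed simplification, and the interface is illustrative:

```python
def place_file(file_size, tape_remaining, tape_capacity, span_threshold):
    """Decide how to write a file given the space left on the current tape.

    Simplified sketch of the rule described above; span_threshold (the
    size above which a file is allowed to span tapes) is an assumption.
    Returns a list of (tape_offset, bytes) segments: offset 0 is the
    current tape, 1 the next fresh tape, and so on.
    """
    if file_size <= tape_remaining:
        return [(0, file_size)]              # fits on the current tape
    if file_size < span_threshold or tape_remaining == 0:
        # Short file: abandon the tail of this tape and restart the
        # whole file on a fresh tape.
        segments, offset, left = [], 1, file_size
    else:
        # Long file: begin on the current tape and span onto new tapes.
        segments = [(0, tape_remaining)]
        offset, left = 1, file_size - tape_remaining
    while left > 0:
        chunk = min(left, tape_capacity)
        segments.append((offset, chunk))
        offset, left = offset + 1, left - chunk
    return segments
```

The wasted space per tape is thus bounded by roughly the threshold size, which is why the tapes ended up almost full.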
The Amdahl Storage System
The MASS and ATL systems served LLNL for a decade, storing short-term and long-term files, respectively. However, the demand for archival storage steadily increased, and we were soon looking for more capacity and more performance. By the mid-80s, the PDP-10 systems were becoming long in the tooth. The PDP-10 line of computers was obsolescent, and porting the assembly-coded software to other machines was not practical. It was time to start from scratch, with new hardware and with software written in higher-level languages. In 1985, we went out for bid for a computer system to replace the storage functionality of the PDP-10s.
The most interesting aspect of our search for a new computer system was benchmarking the three finalists: Amdahl, IBM, and Control Data. Two of our Unix programmers, Mark Gary and Rich Ruef, and I traveled to the three vendors to test the proposed systems. We brought with us a set of benchmark routines, written in C and based on algorithms that we would use in the new archival system. The vendors wanted advance copies of the tests, of course, but we declined, thinking that the process required to get them running on the proposed computers would be of interest. The benchmark routines ran without difficulty on Amdahl's Unix system. Toward the end of the day, we requested that they power down the system, and then bring it up again, so that we could see what we would face after Livermore's not infrequent power failures. The Amdahl people were perplexed and probably a little worried--no prospective customer had ever made such a request, and they didn't know where the circuit breakers for the test machine were located. They ultimately decided to press the emergency power-off button on the console. With a little trepidation and a bit of humor, with the potential customers looking over his shoulder, an engineer pushed the button and the room went quiet. Then we watched them bring the system back up. After seven minutes, we could log into the machine and, after ten minutes, we were running the benchmarks again. The power-up procedures seemed straightforward.
IBM was next, having proposed an MVS system. We still had little experience with MVS, but LLNL had installed an MVS system to run some high-speed printers, so we figured that we could deal with it. The benchmark codes ran well after a number of JCL (Job Control Language) instructions were modified. My two Unix experts, not long out of college, got a big kick out of the JCL error messages, which talked about errors in "column" such-and-such of a JCL "card". The IBM engineers took a great deal of ribbing about "columns" and "cards". They were also good sports about our comments concerning IBM's "clean-office" policy and the requirement that they wear suits and ties until 4:00 PM, even while stringing cables under the machine room floor. (Being prospective customers, we couldn't be required to dress up, and we took full advantage of the fact.) You could set your watch by when the ties came off at 4:00.
We were unable to run a couple of tests successfully and we never determined why some of our disk seeks took fifteen seconds to complete in IBM's C environment. Nevertheless, the benchmark went fairly well until we asked for the power-failure test. Again, the solution was to hit the emergency power-off button. This time, a disk unit failed and we had to wait a few hours for a replacement. Bringing up MVS took something over an hour, with screen after screen of apparent mumbo-jumbo, but the IBM people were quite proficient with the JCL cards and columns, and we figured that it was something we could learn.
Our last stop was at Control Data in Minneapolis. CDC had proposed their relatively new operating system, the name of which escapes me, that provided a C environment. This time, the benchmarks failed to run. Mark and Rich dove into their debugging mode. With the C code doing very strange things, they analyzed the output of the C compiler (this was the first time they had seen the CDC instruction set) and found that the assembly output of the compiler was simply wrong. Many phone calls to CDC experts ensued, including calls to the compiler group in Canada. CDC's recommendation was classic: "We suggest that you tweak the source code and try it again." So, we "tweaked" the C code and, sure enough, it worked. After several such tweaks, we were able to complete most of the benchmarks. We couldn't get the rest to work. We thanked our lucky stars that we had not supplied the tests in advance. Had CDC been able to "tweak" the compiler and the system to successfully run our tests, we would not have seen the true state of the system software.
The power-down test was last, with CDC also determining that the only known way to bring the system down was to hit the emergency-off button. This time, a printer, a system disk, and one other device failed. While the CDC folks struggled throughout the night to get the system up, the three of us practiced tossing write-protect rings from the magnetic tape reels onto the chair-back posts from across the room. Rich got pretty good at it.
Not surprisingly, Amdahl won the benchmark competition and the procurement. We began developing a new archival system, modeled after the NLTSS distributed operating system being developed at the Laboratory and the IEEE Mass Storage Reference Model. The first services to come up on the Amdahl system were the Directory and Bitfile servers. Garret Boer wrote code on the PDP-10 to transfer the Elephant directory structure to the Amdahl Directory Server. The Bitfile Server, managing a large complement of Amdahl disks, replaced the short-term storage function of MASS. In 1988, MASS was powered down for the last time, after the last of its files had expired. Because MASS files had limited lifetimes, we did not have to transfer them as we had transferred the Photostore files.
The Storage Technology Silos
Also during 1988, the Laboratory acquired some of the first Storage Technology "silos". They were installed on the computer room floor where MASS had been. Each of the five silos held 6,000 cartridges. With 200 megabytes on each cartridge, each silo's capacity was 1.2 terabytes. The five silos, then, had a combined capacity of six terabytes, 48 times that of the Photostore or MASS.
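A quick check of the arithmetic (the 0.125-terabyte, i.e. one-terabit, figure for the Photostore and MASS is an assumption here, back-derived from the 48x ratio):

```python
cartridges_per_silo = 6_000
cartridge_mb = 200            # megabytes per cartridge
silos = 5

silo_tb = cartridges_per_silo * cartridge_mb / 1_000_000   # MB -> TB
total_tb = silos * silo_tb

# Assumed baseline: the Photostore (and MASS) held roughly one terabit,
# or 0.125 terabytes -- consistent with the 48x ratio quoted above.
photostore_tb = 0.125
ratio = total_tb / photostore_tb
```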
Like the ATL, the STK devices were designed to be attached to MVS systems. One of the sticking points in the procurement was our obtaining the specifications for the interface to the robotic devices. STK valued the software interface more than the design of the hardware, thinking that anyone could build the robot using parts from the local hardware store. But we, and a non-disclosure agreement, finally convinced STK to give us the interface. They insisted on a lucrative consulting contract, assuming that we would require a lot of their help to develop the software to drive the device (later, they confided to us that they really didn't think we could do it at all). They underestimated the members of the Storage Group. Mark Gary, Jack Kordas, and Loellyn Cassell had the robots humming after just three months of work and one consulting question to STK, the answer to which was that the hardware didn't function as it was described in the documentation. The team's software was superior to the Unix drivers that STK released years later. (I enjoyed demonstrating, particularly to STK personnel, Mark's software's ability to recover after I opened the door to a silo and manually moved cartridges hither and yond amongst the cells.)
The STK robots provided grist for the Storage Group folklore. As with the ATL, operators sometimes had to enter the silos and preferred not to be run over by the robot (being caught by the STK circular robot was referred to as being "cuisinarted"). Unlike the ATL, the STK silo contained a large red power-off button inside the silo for use by a hapless operator trapped inside. During the acceptance tests of the silos, the question of testing the emergency button was raised. The members of the Storage Group readily volunteered their group leader (me) to test the button. Not wanting to appear cowardly, I entered the silo and allowed the door to be locked and the power applied. Lights came on as the device whined to life. I pushed the button. Nothing happened. Seeing my frantic waving through the small viewing port, the engineers mercifully powered down the silo from the outside. It turned out that the wire going to the panic button was not connected. We rather strongly suggested to STK that the emergency circuit should contain a normally-closed, not a normally-open, switch so that any wiring error would make it appear that the button had been pressed. Thus was born the legend of Cuisinart-Sam.
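The reasoning behind the normally-closed recommendation is easy to model (a toy model, not STK's actual circuit):

```python
def emergency_active(switch_type, button_pressed, wire_connected):
    """Toy model of an emergency-stop circuit.

    With a normally-open (NO) switch, current flows only when the button
    is pressed -- so a disconnected wire silently disables the button.
    With a normally-closed (NC) switch, current flows until the button is
    pressed -- so a break anywhere in the wiring looks like a press and
    stops the robot.
    """
    if switch_type == "NO":
        return wire_connected and button_pressed
    if switch_type == "NC":
        return (not wire_connected) or button_pressed
    raise ValueError(switch_type)
```

With the normally-closed design, the wiring fault that trapped me would have shut the silo down the instant power was applied, failing safe instead of failing silent.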
Each robot "hand" (two per silo) used the image from a television camera to position itself and to read the cartridge bar codes. We connected the cameras to sixteen monitors in the operator's room to monitor the operation of the system. As with the MASS and ATL systems, I found it very helpful to simply sit in the machine room and watch the robots make their rounds--inefficiencies and anomalous situations, not easily identified by the programmer at his desk, were easily spotted in the machine room.
The STK robotic system went on line in 1988. Once again, we transferred all of the remaining ATL archival files to the STK cartridges, including many files that had first been recorded on the Photostore.
The STK robots, drives, and cartridges have been upgraded several times since then and, as far as I know, are still in use at LLNL. The software system developed on the Amdahl machines was licensed to General Atomics and re-named UniTree. The UniTree system, after being bought by several companies in succession, is still being sold as a commercial product.
Dealing with Users
Throughout the years, the relationship between us system folks and the users of the storage system was cordial and productive. We always joked that users were really nice people but we wouldn't want our kids to marry one. The users no doubt felt the same about us. However, we hardly had time to relax after putting a higher-performing system on line before the users saturated it. We talked about and tried a number of ways to limit the use of the storage system to what it could handle. For every limiting algorithm, the users countered with ways to get around it. For example, to maintain space on the local worker disks and on MASS and the Datacells we, and the systems people maintaining the worker systems, implemented code to delete the least-recently used files. The users simply wrote code to access their files artificially, rendering our statistics useless. If we limited the size of files, the users wrote code to break up larger files into chunks smaller than our limit. When users discovered the algorithms used to store files on the various central storage devices, they tuned their accesses so that their files resided on the most desirable device. When the Photostore filled and older chips being accessed had to be fetched from the vault manually, users wrote code to look for files about to be removed from the Photostore, read those files up to the worker machines, and then send them back to central storage to be put onto fresh chips. This, of course, accelerated the transfer of older chips to the vault. Later attempts to automatically archive worker files were similarly abused. The program that implemented all of these shenanigans on the worker machines was called CYA, for "cover your backside". For many years, CYA was the most frequently run program at the computer center. I wouldn't be surprised if it still is.
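The cat-and-mouse over the purge code is easy to see in miniature: any purge keyed to last-access time is defeated by a program that simply touches every file now and then (a toy model; CYA's actual mechanics were, of course, the users' own):

```python
import time

def select_for_purge(files, max_age):
    """Purge candidates: files not accessed within max_age seconds.
    `files` maps file name -> last-access timestamp."""
    now = time.time()
    return [name for name, atime in files.items() if now - atime > max_age]

def cya_touch(files):
    """The users' countermeasure: artificially access every file,
    resetting its last-access time and emptying the purge list."""
    now = time.time()
    for name in files:
        files[name] = now
```

Run the touch pass more often than the purge pass, and the least-recently-used statistics become useless, exactly as described above.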
We also advocated charging for storage accesses, using an economic model that would allow users to trade CPU cycles for storage, or vice versa, and we advocated implementing quotas to regulate the use of central storage. The response was always that the larger users had plenty of money and clout to render the limits useless, and that such policies would only hurt the smaller users.
Garret Boer summed up the situation perfectly:
- "Any attempt to curb user abuse will only result in more expensive abuse."
The Not-Invented-Here Syndrome
All of the systems described here were developed by the Computation Department at LLNL. Similar systems were developed at the other DOE weapon labs, NERSC, NASA, NCAR, and elsewhere. Representatives from these organizations tried valiantly, but unsuccessfully, to collaborate. Every organization concluded that it was preferable to go it alone. It was our aversion to MVS that discouraged us from adopting the Los Alamos CFS system, and it was probably LANL's aversion to Unix that prevented them from helping us with UniTree. I had hoped that the development of the IEEE Mass Storage Reference Model might draw us closer. I started the IEEE Storage System Standards Working Group in 1990, hoping that developing standards would increase the synergy amongst the organizations. I also suggested what became known as the National Storage Laboratory (NSL) to test the new notion of network-attached storage. The NSL was successful to a degree. The effort was managed by IBM, located at NERSC, and used the General Atomics UniTree software. Nevertheless, when NERSC needed a new archival system, they installed CFS. In hindsight, I believe that, with effectively infinite tax money to spend, it was simply too much fun writing stuff from scratch, and rivalry between the organizations precluded cooperation. These problems were finally overcome when the NSL work evolved into the High Performance Storage System (HPSS) project. The DOE labs and IBM finally began to cooperate, although it took a tremendous amount of work, spearheaded by Dick Watson, to maintain the collaborative effort. That work, still managed by IBM, has been highly successful and is ongoing today.
 A reluctance to move to Anaheim is an attitude that any Boulder resident can understand. Dale Zalewski, who worked for Xytex in Boulder, recalls offering to move if CalComp included ten acres of land within a ten-minute drive to work. CalComp declined.
Dale also provided this anecdote: "Xytex was founded by a group of renegade engineers from IBM. While at IBM, they presented an idea for an automated tape library. IBM rejected the idea. About two years after the Xytex tape library made its debut, IBM announced its Model 3850 mass storage system, which used IBM's proprietary data cartridge. We regarded IBM's naming their product the Model 3850 as a left-handed salute to us--the street address of Xytex was 3850 Frontier Avenue, Boulder, Colorado!" (It was similarly less than coincidence that Control Data assigned the model number 38500 to the Cartridge Store.)