This document provides a brief overview of a little known but
very rich thread of computing history at Lawrence Livermore
Laboratory (now LLNL), that of "Capability Computing." It focuses largely on the
development of two operating systems, RATS (the
RISOS ARPAnet Terminal System) [2] and NLTSS (Network LTSS)
[10]. However,
it also touches on some other historical threads, some network
protocol development (mostly LINCS - the Livermore Interactive Network
Communication System), and provides some basic background on the
technology involved.
Since this document focuses on
"Capability" computing, and since
(sadly) the vast majority of information technology (IT) workers and even
the majority of computer scientists are unfamiliar with what the
term "Capability" means in the context of access control in computing systems, I'll
define it here. A "Capability" is a computer representation of the
authority to access a computer resource. A capability both identifies a
resource and defines some authority for accessing
the resource. For most people in the IT area
the simplest example to think of as a resource is a file, though
it's important that the capability concept is generalized to
be the authority to access any resource, e.g. a process, directory, access
to a network port, the authority to print to a printer, the authority
to write to a screen or read from a keyboard, etc., etc.
For the case of a file, one capability might grant the authority
to read the file and another capability might grant the authority
to both read and write the file.
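To make the idea concrete, here is a minimal sketch (in Python, purely illustrative - none of these names come from RATS, NLTSS, or any other real system) of a capability as the pairing of a resource's identity with a set of rights:

    # A minimal, hypothetical sketch of a capability as "identity + authority".
    # Names (Capability, Right, "file:42") are illustrative only.
    from dataclasses import dataclass
    from enum import Flag, auto

    class Right(Flag):
        READ = auto()
        WRITE = auto()

    @dataclass(frozen=True)
    class Capability:
        resource_id: str   # which resource (e.g. a file) this capability names
        rights: Right      # what the holder may do with it

    # Two distinct capabilities to the same file:
    read_only  = Capability("file:42", Right.READ)
    read_write = Capability("file:42", Right.READ | Right.WRITE)

    def can_write(cap: Capability) -> bool:
        return bool(cap.rights & Right.WRITE)

    assert not can_write(read_only)
    assert can_write(read_write)

The point is only that the two capabilities name the same file but carry different authority.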
For most information technology people this concept is unfamiliar
because it doesn't appear in most operating systems.
Most operating systems and all of today's commercially significant
systems (systems like
Unix, Windows, VMS, TOPS-10 and Tenex/TOPS-20, OS/360,
etc., etc.) use an "ambient authority" model for granting
authorities to running processes. In such ambient authority
systems when a program runs in a process,
the process almost always has the authorities of a "user" - that
is the authorities of a person who has access to the system (an
account on the system). In such systems programs run
in some sense as a surrogate of the user
with all the authorities the user has. If the process runs
as user Jed then it has user Jed's authorities (access to user
Jed's files, directories, network access, etc.). If the
process runs as user Root or user Administrator then
the program has all authorities available on the
system.
Capability based operating systems are quite different.
In a capability based operating system processes only
run with the authorities they are explicitly granted - as
capabilities. For example, if a person wants to edit
a file they run an editor program. In a traditional
operating system the editor program has access to all the user's
files. When the user commands the editor to begin
editing a specific file, the editor can open the
file because, in its position as proxy for the user, it
has the authority to open any of the user's files. In a capability based system,
when the editor runs it has access only to the file capability
and the keyboard and display capabilities. It doesn't
run as the user, but on behalf of the user with more
limited authorities.
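A hypothetical sketch of the difference in how the editor gets launched (the functions and authority names are made up for illustration; they are not real RATS, NLTSS, Unix, or Windows interfaces):

    # Hypothetical contrast between an ambient-authority launch and a POLA launch.
    def launch_ambient(program, user_authorities):
        # Traditional model: the program runs with everything the user can touch.
        return program(user_authorities)

    def launch_pola(program, granted_capabilities):
        # Capability model: the program runs with only what it was explicitly handed.
        return program(granted_capabilities)

    def editor(authorities):
        return f"editor started with: {sorted(authorities)}"

    user_authorities = {"file:42", "file:43", "network", "keyboard", "display"}
    print(launch_ambient(editor, user_authorities))
    print(launch_pola(editor, {"file:42", "keyboard", "display"}))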
Why is this distinction important? The easiest way to
understand the importance of the difference between having
processes run as a user and having them run with a specific
set of capabilities (authorities) is to consider the concerns
about computer viruses. It is fairly common for people
to receive programs either from email sources or pick
them up off the Web. A program might be a computer game,
or a funny interactive Christmas card. It might be
a video rendering or graphical display program. It
might help people sort/manage databases of pictures
or audio/music files, etc., etc.
When a person runs such a program in a traditional
operating system (e.g. Unix or Windows), the program
runs with all the authorities of the user. It might actually
be a "Trojan horse" and do more than was intended. It
might destroy all the user's files or look for sensitive
information and send it out on the Internet, or do any
number of other things that it has the authority to do by virtue
of running as the proxy of the user.
In a capability based operating system any program
(e.g. one picked up in email or off the Web) can run
in a process with just the authorities it needs to do what
is requested by the user. This ability to restrict
authorities to what is needed is commonly referred to as
the Principle Of Least Authority, POLA. With programs
running under POLA the damage that they might be able
to do if they are untrustworthy is limited.
While the case of process initialization is important,
the most general case that capability based operating
systems deal with is that of communication between
mutually suspicious processes. The way this works is
commonly described these days with an ABC or Alice, Bob,
and Carol diagram like the one shown on
this Web page. Alice and Bob are processes
that are mutually suspicious. Alice has an authority to a resource
managed by Carol. Alice wishes to send a message to Bob
(e.g. make a request of Bob) that grants Bob the authority
to the resource managed by Carol that Alice had. For
example, Carol could be a file server and Bob could be
a remote editor. Alice sends Bob her "capability" to the
file resource served by Carol in order to make use
of Bob's editing service.
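The pattern can be sketched in a few lines of illustrative Python (Carol as a toy file server, Bob as a toy editing service; none of this is drawn from any real capability system):

    class Carol:
        """A file server: honors requests only from holders of a valid capability."""
        def __init__(self):
            self.files = {"report.txt": "draft text"}
            self.valid_caps = set()

        def mint_capability(self, filename):
            cap = ("carol", filename)            # identifies server and resource
            self.valid_caps.add(cap)
            return cap

        def read(self, cap):
            if cap not in self.valid_caps:
                raise PermissionError("no such capability")
            return self.files[cap[1]]

    class Bob:
        """An editing service: works only on capabilities passed to it."""
        def edit(self, server, cap):
            return server.read(cap).upper()      # a trivial "edit"

    carol = Carol()
    alice_cap = carol.mint_capability("report.txt")   # Alice's authority to the file
    bob = Bob()
    # Alice sends her capability to Bob in a request; Bob now holds the same authority.
    print(bob.edit(carol, alice_cap))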
A common metaphor used to clarify the capability concept
is to compare a capability to a physical key. If I want to have
my car serviced, I take it to a repair shop and give
the service person the key to the car. This gives the
service person the authority to operate my car. When I
ask for such a service I don't give the service person
all my authorities as a person, e.g. my house key, keys to
my safe deposit boxes, bank pins and credit cards, etc.
Yet this is what traditional operating systems (all of
today's commercially significant operating systems)
demand whenever a user requests a service. Capability
based operating systems not only allow compartmentalized
(POLA) running of applications, but also allow most parts
of the system, all but the base capability communication
mechanism, to be broken into domains of mutual suspicion.
This naturally forms what's been referred to as a
microkernel architecture. With a capability based system
programming is object oriented at the interfaces between
processes - domains with separate authorities.
With that introduction it's time to pick up the historical
thread of capability computing and how it passed through
LLNL. In 1965 a paper was published
by Jack Dennis and Earl Van Horn titled, "Programming
Semantics for Multiprogrammed Computations" [1]. That paper is typically considered
the seminal paper describing capabilities, capability
computing, and capability communication. It describes
what is now considered a traditional capability-list
operating system where each process has an associated
protected list of descriptors that constitute its authorities,
referred to as its capability list or c-list.
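In rough outline (a hedged sketch only - the names and interfaces here are invented, not taken from [1]), the c-list model looks something like this: the kernel keeps the capabilities, and a process refers to them only by index, much as a Unix program refers to an open file by descriptor.

    class Kernel:
        def __init__(self):
            self.clists = {}                     # process id -> list of capabilities

        def grant(self, pid, capability):
            clist = self.clists.setdefault(pid, [])
            clist.append(capability)
            return len(clist) - 1                # the process sees only this index

        def invoke(self, pid, index, operation):
            capability = self.clists[pid][index] # protected: the process can't forge this
            return f"{operation} on {capability}"

    kernel = Kernel()
    fd = kernel.grant(pid=1, capability=("file:42", "read"))
    print(kernel.invoke(pid=1, index=fd, operation="read"))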
After that paper was written an operating system was
designed and implemented following the design principles
in that paper for a PDP-1 computer at the Massachusetts
Institute of Technology. One of the many people who
did programming work on that system was Charles
Landau. Charlie and I both came to Livermore
Laboratory in 1972 and joined an ARPA funded project
called RISOS, Research Into the Security of Operating
Systems.
The RISOS work is discussed some in my interview and elsewhere, e.g. [4]. Basically the idea of the RISOS
project was to find security problems in operating systems and in doing
so improve OS security. One aspect of this work was that RISOS had a
computer to use for communicating with other computers. The computer that
was purchased was a PDP-11/45. One thing a bit unusual about that system
was that it was installed in a large trailer with the idea that it could
be trucked into a secure installation and connected to systems in such
a physically secured facility in order to use the RISOS computer for
testing. In addition to supporting telephone
dial out/in and communication, this system needed to be connected to the
early ARPA network. Charlie implemented the operating system for this
computer, RATS [2]. It was natural that in doing so he
would implement a system that was capability based and similar to the system
on the PDP-1 at MIT. Beyond being of value in its role in communicating
with other systems there was the hope that this system could be an
illustrative example of secure system design and implementation.
I was the technical liaison to the
ARPA network community for our ARPA network node. In that role I
was involved in the development of the early ARPA network protocols
such as Telnet (for remote connection of interactive logon sessions),
the File Transfer Protocol (FTP) and others. It also happened that I
was involved in the security analysis of the Tenex operating system
developed by Bolt Beranek and Newman (BBN). BBN was the prime
contractor to ARPA for the development of the ARPA network
interface nodes called Interface Message Processors or IMPs.
The Tenex system had the most advanced network support
in that time period, though Tenex fell into the category
of a traditional "ambient authority" operating system (all
user processes ran as a proxy for a human user with all
that person's authorities).
One project that BBN worked on for Tenex was called the
Resource Sharing Executive or RSExec [5].
The idea of the RSExec was to allow resources available on
one Tenex system to be utilized by other Tenex systems
across the ARPA network. My exposure to the ARPA network
resource sharing protocols like Telnet, FTP, and RSExec
combined with my capability experience with RATS started
me thinking about a capability sharing protocol. The
advantage that I saw to such a mechanism is that one could
develop a protocol to share a generic resource as
represented by a capability and then have that one protocol
suffice to share all resources on a system, since all
such resources are accessed via capabilities. Doing so would
eliminate the need for developing or supporting resource specific
protocols like Telnet, FTP, and all the mechanisms in RSExec.
When I started working on what eventually became the
Distributed Capability Computing System [6]
it was one of the few times in my life when I experienced
what might be referred to as a "brain storm". That was
a situation where the more I worked on the DCCS protocol
and implementation design the more it seemed the obviously
"right" basis for network resource sharing. I remember
spending a long sleepless night where I put together all the
basic mechanisms of the DCCS e.g. as pictured in this
diagram from that period:
The basic idea of the DCCS is to extend what has more
recently been referred to as a transparent proxying mechanism
(creating a new capability that works just like an existing
capability but is supported by another "server" process)
across a network. When it comes time to pass a capability
from one process to another process on another system on
the network, the extension mechanism saves the capability
being passed locally and forwards identifying information
to the remote system. The remote system then emulates the
capability by accepting any requests on it
and bundling each request up to send across the network
to the system that kept the real capability in a service
list maintained just for such requests. There are some interesting
issues that must be solved as authorities to resources (network
capabilities) are passed around between systems on the
network [5], but it all seems to work
out well.
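A toy sketch of this proxying idea, with all names invented for illustration (it is not the actual DCCS protocol or its encoding), might look like:

    class Host:
        def __init__(self, name):
            self.name = name
            self.service_list = {}               # id -> real capability kept locally

        def export(self, real_capability):
            cap_id = len(self.service_list)
            self.service_list[cap_id] = real_capability
            return {"host": self, "id": cap_id}  # identifying info sent over the net

        def serve(self, cap_id, request):
            real = self.service_list[cap_id]
            return real(request)                 # apply the request to the real capability

    class ProxyCapability:
        """Remote stand-in that behaves like the original capability."""
        def __init__(self, remote_info):
            self.remote_info = remote_info

        def __call__(self, request):
            # Bundle the request and "send it across the network".
            return self.remote_info["host"].serve(self.remote_info["id"], request)

    host_a = Host("A")
    real_cap = lambda req: f"file contents for {req}"    # the real capability on host A
    info = host_a.export(real_cap)                       # passed in a message to host B
    proxy_on_b = ProxyCapability(info)                   # B emulates the capability
    print(proxy_on_b("read block 0"))

The essential property is that the holder of the proxy cannot tell it apart from the original capability: every invocation is forwarded back to the host that kept the real one.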
After the DCCS mechanism was designed and published naturally we
considered implementing it using RATS as the base capability
system and the ARPA network as the communication infrastructure.
There was even an ARPA network RFC written and registered to
this end [6]. Unfortunately when we looked
more carefully at the base RATS capabilities we found that one
fundamentally important capability, the file capability,
could not be transparently proxied and therefore could not
be extended across the network using the DCCS mechanism.
This was unfortunate in some respects, but it clarified for me
how important the ability to do transparent proxying is
and prompted me to consider network extension as fundamental
to any mechanism for communicating capability authorities, what I
now refer to as "network discipline."
The RISOS work including the RATS and DCCS development was done
in relative isolation at the Livermore Laboratory. Independent
of the RISOS work a rich tradition and technical base of
computer network (Octopus) and operating system (LTSS)
development had been ongoing at LLL since at least the
middle 1960s. By the time the RATS/DCCS work was winding
down in the middle 1970s there was some thought that Livermore
could benefit by an update to the operating systems and
networking protocols that had been developed to that time.
An early effort to update the Livermore operating systems
was an OS design effort led by Charlie Landau. Not surprisingly
the programmer's manual that he produced for this MUTT
(LTSS + (1,1,1,1)) system looked a lot like that for the RATS
system. However, about that time Charlie left Livermore
for Tymshare. There he and Norm Hardy and Bill Frantz
and others continued what might be considered a fork of this
capability development thread and developed the GNOSIS
and later the KeyKOS operating systems that were descriptor
based capability systems along the lines of RATS. Perhaps
the history of those developments will be described elsewhere.
At Livermore the idea of revamping the operating system and
networking protocols emerged again in about 1978. At that
time I had just finished some local area networking research
[7] and found myself involved in these
discussions.
After some preliminary discussions it fell out that
John Fletcher, a fairly newly hired Dick Watson,
and I ended up with the lead design roles for
this effort.
One aspect that I remember most about these early design
discussions was the clash between John Fletcher and me over
the capabilities as descriptors model. I think it relevant
to note that there was an independent "capability"
thread that had been injected into Livermore some 10 years
earlier. John Fletcher had read some of the early Multics
[9] papers out of MIT (also Project MAC
like the Dennis and Van Horn paper) and had picked up, adopted,
and implemented a descriptor based directed graph directory
structure for the Elephant storage system at Livermore.
I still consider this architecture to have the most
effective sharing mechanism that I've seen. To simplify
a bit, each user had a "home" directory and a "Take"
directory for receiving objects (files and directories
in the Elephant system) from others. Every user also had
access to what amounted to a large directory containing
insert only access to the "Take" directories of all users.
With this scheme any user could create a new "group"
directory for the purpose of sharing with others. By
placing the new directory into the "Take" directories
of a group of other users (e.g. placing a "group project"
directory into the "Take" directories of Alice, Bob,
and Carol), the creator ensured that anybody given the directory could
share any of its contents (e.g. shared access to
files or other directories). One aspect of this
system is that anybody that has access to such a shared
directory can give it to anybody else - effectively
expanding the group. This is very unlike the group
mechanisms in systems like Unix or Windows or other
"third generation" operating systems where administrative
(root) intervention is needed to create and support
resource sharing groups.
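For concreteness, here is a hypothetical sketch of the "Take" directory scheme (the structures and names are illustrative, not the actual Elephant interfaces):

    class User:
        def __init__(self, name):
            self.name = name
            self.home = {}                       # name -> object (file or directory)
            self.take = {}                       # insert-only drop box for others

    def share_group_directory(creator, members, dirname):
        group_dir = {}                           # a new shared directory object
        creator.home[dirname] = group_dir
        for member in members:
            member.take[dirname] = group_dir     # insert into each member's Take
        return group_dir

    alice, bob, carol = User("alice"), User("bob"), User("carol")
    project = share_group_directory(alice, [bob, carol], "group-project")
    project["notes.txt"] = "shared notes"        # anyone holding the directory shares it
    print(bob.take["group-project"]["notes.txt"])

No administrator is involved: whoever holds the shared directory can hand it on, which is exactly the group-expansion property described above.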
The debate that John Fletcher and I had would today be
described as a debate over whether to implement capabilities
(resource authorities) as system protected descriptors
(think Unix file descriptor) or as data (more like
a password, but with resource identifying and server
address information encoded in the data). John argued
persuasively that even if one implemented a descriptor
based system for one computer, sharing resources on
the network it was attached to (the network was
the computer, after all) required a mechanism (e.g. like
the DCCS) for essentially flattening the capabilities
into some sort of "data" form. If such a mechanism
could work for the network, why not for local capabilities
within a single computer system using an inter process
communication mechanism?
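A hedged sketch of the "capability as data" alternative, using a modern keyed hash as the protection against forgery (an assumption made purely for illustration; it is not how NLTSS or any of the systems named below actually protected their capabilities):

    # Capability as data: resource identity, server address, and rights are
    # encoded in ordinary bytes, protected by a secret check field so they
    # can't be guessed or altered (much like a password).
    import hmac, hashlib, json

    SERVER_SECRET = b"known only to the file server"

    def mint(server_addr, resource_id, rights):
        body = {"server": server_addr, "resource": resource_id, "rights": rights}
        data = json.dumps(body, sort_keys=True).encode()
        check = hmac.new(SERVER_SECRET, data, hashlib.sha256).hexdigest()
        return {"body": body, "check": check}    # plain data: can be sent anywhere

    def validate(cap):
        data = json.dumps(cap["body"], sort_keys=True).encode()
        expected = hmac.new(SERVER_SECRET, data, hashlib.sha256).hexdigest()
        return hmac.compare_digest(cap["check"], expected)

    cap = mint("10.0.0.7:file-server", "file:42", ["read"])
    assert validate(cap)                          # server accepts a genuine capability
    cap["body"]["rights"] = ["read", "write"]     # forgery attempt...
    assert not validate(cap)                      # ...is rejected

Because such a capability is just data, it can be passed in any message, across any network, without the kernel having to know what it is.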
We were not alone in having such a debate. There were a
number of systems developed that used each type of
capability. Capability as descriptor systems include
the Dennis and Van Horn system, RATS, CMU Hydra, GNOSIS
and KeyKOS, the more recent EROS, the LANL developed
Demos system, CalTSS, Mach and others. This represents
a rich heritage that is still ongoing to some extent
today. Capability as data systems include NLTSS,
Amoeba, and the Australian "Monash" systems and one
or two others. Not quite as rich a tradition, but still
with significant representation. Developments along
these lines also continue today in mechanisms like
YURLs and others.
In some respects in looking back I feel that I failed
to make the case effectively for capabilities as
descriptors. One aspect of least authority that
capabilities as data don't capture is the authority
to communicate. That is, if a process has a capability
to some resource (e.g. a file) then to exercise that
authority it must be able to communicate to the file
server. Following POLA reasoning, a process should only
be able to communicate to processes that service
capabilities that the subject process owns. This
makes the capability system essentially support what
today one might call a local firewall mechanism.
However, with capabilities the restrictions on
communication are much more natural and direct because
they evolve out of the need to exercise the authority
to access resources that are explicitly granted under
the POLA policy.
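The communication restriction can be sketched as a simple check (with purely illustrative names and structures): a process may only send to a server that backs one of the capabilities it holds, so its capability list doubles as a local firewall rule set.

    def may_communicate(process_capabilities, destination_server):
        return any(cap["server"] == destination_server for cap in process_capabilities)

    held = [{"server": "file-server", "resource": "file:42", "rights": ["read"]}]
    assert may_communicate(held, "file-server")       # allowed: backs a held capability
    assert not may_communicate(held, "mail-server")   # denied: no capability points there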
While both John Fletcher and I understood this sort
of restriction on communication from the descriptor
based capability systems, such restrictions seemed
to fly in the face of the nominal opportunity to
communicate anywhere in computer networks. If
computers on a network could communicate anywhere
on the network, what sense did it make to restrict
processes (virtual processors) within our operating
system to only be able to communicate where they had
explicit authorities to communicate? Today I think it
is much easier to see the value in naturally restricting
communication by processes within an operating system
to POLA access just as their resource access is
restricted.
In some ways it was lucky that John Fletcher won
this argument in that it greatly simplified the
design of the NLTSS [10] system that we developed at
LLNL. In fact it was that
simplification that partly sold the pure message
passing model to me. NLTSS had only one system
call - a call that said "communicate" where there
could be any number of send and receive buffers
specified in the call. This simplicity at the lowest
level of the system made possible some quite elegant
mechanisms at higher levels (e.g. the separation of
the transport and application level services, the
support for mixing multiprocessing with thread
services that uniquely in NLTSS allowed blocking
at the thread level - not the process level -
for I/O, and other aspects of the system).
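In spirit (this is an invented signature, not the real NLTSS call or its buffer format), the single entry point might be sketched as:

    def communicate(send_buffers, receive_buffers):
        """The lone system call: post any number of sends and receives in one request."""
        completed = []
        for destination, payload in send_buffers:
            completed.append(("sent", destination, len(payload)))
        for source, max_length in receive_buffers:
            completed.append(("receive posted", source, max_length))
        return completed    # in a real kernel, completion would be asynchronous

    status = communicate(
        send_buffers=[("file-server", b"read file:42")],
        receive_buffers=[("file-server", 4096)],
    )
    print(status)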
I now believe that this base simplification could
have been made even if the underlying message
passing kernel knew just enough about capabilities
to identify their network addresses and to validate
the authority to communicate that they carried.
Still, in those early days we didn't know this and
trying to incorporate such a mechanism into NLTSS at that time
would have effectively blocked development. It
wasn't until we later developed a public key model
for protecting capabilities as data [8] that I came to
believe such authority to communicate
could have effectively been bundled with our
capabilities as data and easily moved
across the network.
Still, the weakness of the NLTSS system from a
user perspective had nothing to do with any of
these underlying issues. The problem it had
was that the only user libraries available to it,
and therefore the ones it had to support, were the libraries developed
around the earlier LTSS system. Ultimately NLTSS,
which had a much richer and more capable base
resource model, ended up having to effectively emulate
the older LTSS system - even to the extent of having
processes run with the authorities of users (ambient
authority). A glaring example of this restriction
is that of the process model for the two systems.
In the NLTSS system processes were simply resources
protected with capabilities. Such capabilities
could be stored in directories, communicated in
messages, etc. However, the only user libraries
supported at LLNL at the time were built around
a simple "controller - controllee" model of processes
where there was a linear chain of processes (not
even as rich as the Unix tree structure for processes)
that could send signals up and down the chain in a rather
complex way. Naturally an NLTSS emulation of this
structure was no more capable than the LTSS base
implementation of the mechanism, and it was
necessarily somewhat less efficient. The more flexible resource
sharing that NLTSS offered (e.g. of processes, directories,
files, etc.), even over a network,
was not available in the older LTSS system and so
didn't show up in the support libraries, making these
facilities effectively invisible. NLTSS ended up looking
to users like a somewhat inefficiently implemented
port of the older LTSS system.
This aspect of adapting to a set of existing user
libraries might be referred to as the "tyranny of
the API" and is seen quite clearly in the dominance
of the Unix and Windows APIs and even in their relative
similarities today. It's effectively impossible to
introduce explicit resource sharing (POLA) into these
APIs, so one instead sees awkward add-ons such as
the mechanisms of SELinux (who knows what Microsoft
will come up with) to try to address POLA
concerns. Backward compatibility is the key, but
it also makes any innovative POLA resource sharing
mechanisms nearly impossible to introduce. I personally
believe that the only route to acceptance for such
mechanisms is to introduce them in an area (e.g. such
as network resource sharing) outside the realm of
the typical OS API and then have libraries gradually
develop to incorporate these more flexible and
powerful (e.g. for protection/security) facilities.
It was because of this "tyranny of the API" that the
capability thread at LLNL was effectively submerged
and disappeared when NLTSS (and LTSS) were displaced by Unix.
[1] Jack B. Dennis, Earl C. Van Horn, "Programming Semantics for Multiprogrammed Computations,"
Association For Computing Machinery Conference on Programming Languages and Pragmatics, San Dimas, California, August 1965.
[2] C. R. Landau, "The RATS Operating System,"
Lawrence Livermore National Laboratory, Report UCRL-77378 (1975).
[3] J. E. Donnelley, "DCAS - A Distributed Capability
Access System," Lawrence Livermore Laboratory Report UCID-16903, August 1975.
[4] R. P. Abbott, J. E. Donnelley, et al., "Security
Analysis and Enhancement of Computer Operating Systems," National Bureau of Standards Report NBSIR 76-1041, April 1976.
[5] B. P. Cosell, P. R. Johnson, et al., "An Operating System
for Computer Resource Sharing,"
Proceedings of the Fifth Symposium on Operating System Principles, November 19-21, 1975.
Also available online from the ACM.
[6] J. E. Donnelley, "A Distributed Capability Computing
System," Proceedings of the Third International Conference on Computer
Communication, August 1976, pp. 432-440. Also available at: http://www.webstart.com/jed/papers/DCCS/.
[9] R. C. Daley and J. B. Dennis, "Virtual Memory, Processes, and Sharing in Multics," Commun. ACM 11, pp. 306-312, May 1968.
[10] An amusing aspect of the NLTSS saga is the
name - Network LTSS, or Network Livermore Time Sharing System.
This name started being used on a temporary basis while
early design was underway with the thought that we would
choose a more permanent name if it looked like the system
was going to be coded. When it became clear that we were going
to put the system into production we had a meeting
(the whole LINCS/NLTSS design was very meeting heavy,
with infamous "Friday Afternoon meetings" that were knock-down,
drag-out affairs) and chose the name "LINOS" for
LINCS Operating System. LINCS was itself an acronym
for the Livermore Interactive Network Communication System - the protocol
set for the Livermore network and NLTSS. Unfortunately,
by the time we got around to choosing this name, some budget
documents had already found their way to Washington with
the "NLTSS" name on them. We were told that it was too
late to change the name.