Preserving Software:
Why and How
John G. Zabolitzky
ICOS Vision Systems GmbH
Date
published: 13 September 2002
Information
technology is the largest industry worldwide today, and is
slated to be even more dominant in the future. The roots of
this industry are hardware and software developments over
the past 50 years. Future generations will find it hard to
understand if these roots are not preserved today.
While a lot of hardware artifacts have been well preserved,
I strongly suspect that some past software artifacts have
already been destroyed irrecoverably. (Where are the early
UNIVAC operating systems? The first FORTRAN/COBOL/ ALGOL compilers?)
There is a grave danger that in the near future large bodies
of historically relevant software will be destroyed. Now is
the time, therefore, to take action and preserve these objects
for future study.
Why
It has been argued that no working hardware of bygone times
is available or will be available, and therefore the preservation
of software does not make any sense. This statement is incorrect
in every respect, as I will show.
First, it is simply not true that no historic working hardware
is available. There are several examples, even of the rather
fragile type:
- IBM 650 at IBM Haus zur Geschichte der Datenverarbeitung,
Sindelfingen, Germany, and several later IBM machines at
the same location.
- IBM 700 series machines in the collection of Paul Pierce,
in Oregon, USA.
- Control Data and Cray machines in my own workshop (see
www.cray-cyber.org).
[end of page 1]
- Hundreds of early Digital Equipment machines in the possession
of smaller organizations and individual collectors (true
for other minicomputer manufacturers as well).
- Thousands of first-generation personal computers in various
private collections and at smaller organizations.
- The Computer History Museum at Moffet Field, Mountain
View, California, is working on restoring a number of systems;
there is no doubt about their future success.
- There are also commercial operations known to operate
20+ year old equipment, because it is in some cases cheaper
than rewriting the software.
Conspicuously, the large, renowned public museums of the
world are absent from this list. This is simply a matter of
allocation of effort, financially as well as creatively. Restoration
and continued operation of historic computing equipment requires
special knowledge and skills unique to this task. In all of
the above cases volunteers in possession of such knowledge
and skills have done the actual work, and are currently training
younger interested people so that knowledge and skills will
be preserved. It is nothing but a matter of decision making
to establish similar operations at any of the large institutions;
with their clout they will easily attract volunteers to do
the work, if it is only well publicized. The present state,
however, is a more distributed effort by specialists, see
list above.
I must conclude that hardware will be preserved and operational
in the futureif not at the large institutions then at
distributed and specialized locations. It will be possible
to run historic software on these machines. The meaningful
operation of these machines will require the possession of
an operating system, programming system, and application software,
at least in binary object code format.
Second, the task of hardware preservation is significantly
simplified if operating system software is available in source
code format.
The major complication for preserving, restoring and operating
historic computers results from the moving parts subject to
mechanical wearNOT from the electronic parts (with perhaps
the exception of short-lived vacuum tubes). The encapsulated
semiconductors, passive components, circuit boards, connectors,
and wiring making up the processing units are essentially
free of natural degradation, assuming that the atmosphere
is non-corrosive. Stocking of spare parts is possible; in
many cases, parts of modern manufacture may be used.
The problems arise from magnetic disks, tapes, mechanical
card/tape readers/punches, and generally all peripheral subsystems
with mechanical parts. Magnetic surfaces, positioning mechanisms,
transport mechanisms are susceptible to degradation due to
normal operation. However, the processing of informationwhich
is the core of preserving a historical operationmay
be kept up by substituting modern peripherals for the historic
ones. [end of page 2] For example, a historic disk controller
and disk drive could be replaced by a specially designed new
disk controller and modern hard disk drives. This has been
done in some cases, and is a limited volume of work. This
can be done one hundred percent software compat-ible with the original
disk subsystem, and moves disk reliability up to modern standards,
very significantly reducing the maintenance effort on the
historic machine. Since the historic drives still may be powered
up and will provide the historic look-feel-and-sound, this
replacement is neutral to the observer.
This kind of operational replacement is simplified if it
is not transparent to the operating system software. If that
software is accessible in source code format, and not only
in binary object code format, it could be modified for the
convenience of the new hardware add-on designer.
Logical operation, i.e., information processing on essentially
historic machines, will be possible on an even larger scale
than for the case of pure historic machine configurations.
There is a mutual reinforcement: The more software that is
available, the more hardware can be kept running, and the
more hardware that is running, the more software is worthwhile
to preserve.
This route has in fact been followed at some commercial installations.
Some companies have chosen to keep their software investment
valid, avoiding the writing of new software for an existing
application by upgrading old computers with new hardware.
Third, simulators for a number of machines have been built,
and can be and will be built in the future. Historic software
can be run on such simulators. On the user level, assuming
some dialog on some terminal equipment, there need not be
any perceptible difference between simulated operation and
software running on historic hardware.
This method of operation can be carried further: a simulator
can be interfaced to any piece of historical hardware desired
to be revitalized. For example, the operator's console of
a historic machine could be physically interfaced to the simulator.
The same is true for any peripheral. Simulators will run the
binary object code. However, interfacing to newer peripherals
will be significantly simplified if the source code is available.
Anything from pure simulation to full complement of historic
peripheral operation can be realized with the use of simulators.
The experience of the original user sitting at a terminal
or console can be one hundred percent recreated in this way, provided
the software is available.
This approach is in fact similar to and related to emulation
of the IBM 1401 on the more modern IBM System/360s and beyond,
which has been, and maybe still is, used widely. [end of page
3]
Fourth, the evolution of software methods, techniques, styles,
etc., is described in many books and articles. However, all
of that is essentially hearsay: what actually has been
done (and what may be different from what the active players
in this area may report since they might have wished to do
something different) can only be discerned and proven by examining
the source code. The source code of any piece of software
is the only original, the only artifact containing the full
information. Everything else is an inferior copy.
The binary object code format of software is of course much
more abundant than the source code format. Any installation
of any computer system requires the possession of the binary
object code for the operating system, the programming system,
and the applications. Only the manu-facturers of the software,
be it the computer manufacturer or some software company,
will hold the source code format. Unfortunately, the process
of generating the object from the source obliterates an enormous
amount of information present in the source code. How was
this actually accomplished? What methods and techniques were
used? Why could Company X be so many times more efficient
than Company Y? These questions can reliably be answered only
if one is in possession of the source code, the only proof
of what actually was done.
There can be no doubt that the distributed community of smaller
organizations and private collectors is in collective possession
of a very large amount of object code. However, source code
of sensitive and historically important software cannot be
expected to be found there (e.g., the IBM System 360 Operating
System and Digital Equipment VAX system software).
The preservation of source code format of historically important
software, which is mostly owned by large corporations, can
be dealt with only by renowned organizations carrying the
clout to impress these companies.
This is important because only the simpler, lower level functionality
of any information processing system is implemented in hardware.
The more complex features making up the functionality as well
as the look-and-feel of such systems are implemented in software.
Preserving such software is the only means of preserving the
experience of twentieth-century information processing for
future generations.
While the object code in conjunction with actual hardware
or simulators can recreate the look-and-feel, insight into
the actual implementation work can be gained only by looking
at the source code. The source code can provide the understanding
of "how did they do that in those days?" Furthermore,
possession of the source code simplifies the task of keeping
old hardware running.
Fifth, we must consider the audience. It has been asked, "Who
will look at past software in the future?" There are
several answers to this question: [end of page 4]
- The IT industry being the largest today and expected to
grow further will force a growing fraction of the population
to be not only computer-literate, but also software-literate.
The number of software developers capable of reading source
code is constantly growing, not only in absolute terms but
also as a percentage of the entire population. Some of these
individuals in the future will want to know how things were
done in the past.
- There is a lot of arbitrariness in software; most things
can be done in many different ways. Future historians might
want to follow the evolutionary paths of software methods
or techniques: Who has used what technique at which point
in time? Who was influenced by what? The source code is
the only proof.
- Large and small organizations as well as private individuals
operate historic hardware or simulators. It is a growing
community.
- How can we know what kind of questions future generations
will ask? We should provide the basic artifact such that
these people will be able to find out whatever they may
want.
Finally, why should software be preserved in object code
as well as in source code format? This is the only way to
allow future generations access to two dimensions of past
and current information processing:
- The sensual experience of walking into a mainframe computer
room, the combination of air movement and smell, vibrations
through acoustics as well as the floor, noise of working
peripherals, and optical perception of console lights and
screens; only with software can computers be shown with
lively personalities as opposed to dead corpses.
- The intellectual study of software evolution as documented
within the source code as the only original artifact of
what actually was done.
How
There is a concern that preservation of software is technically
difficult because of the deterioration of media. In fact,
the opposite is true. If conceived of in the proper way preservation
of software is a rather simple taskgetting hold of the
software is the difficult part.
The important notion is that software is not a physical artifact.
Software consists of bits, and not of atoms. While it may
be nice to preserve actual media like half-inch tapes, floppy
disks, etc., this is by no means central. The preservation
of media is a task in its own, and is entirely independent
from the preservation of software. As is well known, media
in fact need regular copying in order to keep the information
on them validbe it software or anything else. This requires
operation of the appropriate copying hardware. Obviously,
wherever actual historic hardware is in operation, the operators
of that hardware will see to the copying of their media in
the normal way of making backups, in the interest of continued
operation. [end of page 5] The preservation of information
on historic media is therefore to be viewed as a small and
necessary part of operating the corresponding drives.
Preservation of software is identical to preservation of
information. The physical hardware used and the media used
are quite irrelevant. Conceive of a piece of software simply
as some computer file, to be stored on any computer of choice
for convenience. The "software archive" now is a
directory tree on some computer. This computer at the same
time is connected within the preserving organizations computer
network, like other file servers, and all (or most) of the
organizations' computers. The normal, everyday file backup
procedures of that organization then may be extended to the
"software archive" as wellwhoever is doing
the backup of the other data of the organization, e.g., payroll
data, databases (archives of physical artifacts) etc., without
much additional work can do that backup as well. The historic
software therefore lives on standard, normal, modern hard
disk drives of some standard file server computer, and is
regularly backed up to tape, using the normal operating procedures
of the organization's IT department.
In the course of time file servers will be modernized. In
the same way as other information was easily moved from one
generation of equipment to the next generation, it can safely
be assumed that the same will be true in the future. When
a new generation of file servers is installed, all infor-mation
needs to be transferredincluding the historic software
archive. When backup media is changed in the future, no problem
will arise. In the same way that the lives of current data
files of any organization are essen-tially infinite, this
will also be true for the software archive. If at some future
point in time a totally different computer architecture grows
to reign, it is perfectly safe to assume that the proper data
migration tools will be providedsimply because everyone
on the planet will need them. Therefore, once we have the
software archive within a standard file system, the task of
preservation is solved.
In this way it is also trivial to make the software publicly
accessible. Where allowed, the file server could be the Internet
FTP server of the organization. The directory structure serves
as the catalog. Of course, additional cataloging would be
required to facilitate search operations.
The only remaining point is to transfer the software from
whatever media it may be in, to some standard file format
on current file server systems. There are a number of topics
to be discussed here:
- physical procedure
- data formats for binary objects as well as source code
formats
- directory structures and aggregation [end of page 6]
1. Physical procedure. The software to be preserved will
come on some more or less standard or outmoded media. For
all of these, however, current commercial solutions exist
to transfer the data (bitstrings representing the software)
onto current, modern media.
Examples:
Half-inch tape. There exist half-inch magnetic tape
drives which can be attached to PC-type computers. A software
archiving station would consist of such a tape drive, attached
to some standard PC, networked to the software archive file
server. In the case of older tapes, which may have deteri-orated,
some expertise and tools for data reconstruction may be
required. Commercial services also exist which will do the
entire data transfer.
Floppy disks. Rather old floppy disk drives may
still be attached to modern PC-type computers; there exist
universal reading software products which will read non-Microsoft
disk formats. More convenient may be reading the disks on
their native computers, assuming some such can be found,
and transferring the data via common exchange media, or
some communication (RS-232) or networking method.
Disk packs. These would require an operational computer
and data communications or exchange media. Most machines
with disk packs also would allow copying to half-inch tape,
which then would be processed as outlined above.
Other magnetic tape formats can be handled in much the
same way.
2. Data formats. This item requires a little more consideration
since it is related to the anticipated usage of the data,
i.e. binary or source software.
The binary would be useful only for running on historic hardware,
or else on a simulator. Therefore, just mapping tape blocks
or disk blocks to a modern file format would do. The extraction
of the data to whatever format the future user might want
it to be in would be the inverse of one of the methods above.
For source code software two different potential usages exist:
human reading for investigations into software methods, etc.,
or compilation (possibly after modification) on historic hardware
or a simulator. For both applications program source code
format of the file server system would be appropriate. This
might require some small data extraction program within section
(1) above. For the second application, alternatively tape
or disk blocks of the target (historic) system would be useful
as well. Since both formats carry all the information in principle,
they can be transformed into each other by a suitable reformatting
program. [end of page 7] It is a mere matter of convenience
if both formats or only one of the two is offered on the file
server.
3. Directory structures and aggregation. It should be fairly
simple to decide on a case-by-case basis, probably starting
with some manu-facturer at the top level ("IBM",
"CDC", "MS", etc.), descending into machine
families where appropriate, from there descending into the
maybe several operating and programming systems and applications.
None of these tasks is new. The entire set of procedures
has been carried out repeatedly, on smaller and on larger
scales, by private collectors and small institutions. The
expertise exists and can be tapped when and where required.
What To Do
URGENT: Because of lack of concern, motivated by tightness
of space, corporations will destroy and have destroyed in
the past important source code programs. While object code
may still exist in some places, this is not the original as
explained above. The prevention of further destruction should
be a primary and urgent goal. This primary goal should be
the acquisition of source code format software from the relevant
owners. The first step might be the creation of a list of
"national heritage software," which one may then
seek to acquire.
Passive collection, as opposed to the above active strategy,
will most probably result in a set of software which has only
little relevance. Important pieces will be missing, and pieces
of lesser importance will fill up space.
In order to make the best use of it, the following items
optimally should be found in conjunction with a given piece
of software, and all cataloged with reference to the same
short-hand title:
- source code in human readable format
- source code in target machine disk/tape blocks
- object code in target machine disk/tape blocks
user manual(s) for this software
- installation and maintenance manual(s), where appropriate
- design documents for this software (probably the most
difficult to find, but also the most valuable to have)
- other documentation, like user guides, books, tutorials,
reports on that software
- example run, input and output (frequently contained in
the user manual)
For larger programming/operating systems each of these may
be an entire set, in the case of the software items a directory
tree. [end of page 8]
John G. Zabolitzky, "Preserving Software:
Why and How," Iterations: An Interdisciplinary Journal
of Software History 1 (September 13, 2002): 1-8.
Back to Top
| Table of Contents | CBI
Home