Diffstat (limited to 'share/doc/papers/newvm/1.t')
 share/doc/papers/newvm/1.t | 378 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 378 insertions(+), 0 deletions(-)
diff --git a/share/doc/papers/newvm/1.t b/share/doc/papers/newvm/1.t
new file mode 100644
index 000000000000..02ac8be22d1a
--- /dev/null
+++ b/share/doc/papers/newvm/1.t
@@ -0,0 +1,378 @@
+.\" Copyright (c) 1986 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\"    must display the following acknowledgement:
+.\"	This product includes software developed by the University of
+.\"	California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\"	@(#)1.t	5.1 (Berkeley) 4/16/91
+.\" $FreeBSD$
+.\"
+.NH
+Motivations for a New Virtual Memory System
+.PP
+The virtual memory system distributed with Berkeley UNIX has served
+its design goals admirably well over the ten years of its existence.
+However, the relentless advance of technology has begun to render it
+obsolete.
+This section of the paper describes the current design,
+points out the current technological trends,
+and attempts to define the new design considerations that should
+be taken into account in a new virtual memory design.
+.NH 2
+Implementation of 4.3BSD virtual memory
+.PP
+All Berkeley Software Distributions through 4.3BSD
+have used the same virtual memory design.
+All processes, whether active or sleeping, have some amount of
+virtual address space associated with them.
+This virtual address space
+is the combination of the address space with which they initially
+started plus any stack or heap expansions that they have made.
+All requests for address space are allocated from available swap space
+at the time that they are first made;
+if there is insufficient swap space left to honor the allocation,
+the system call requesting the address space fails synchronously.
+Thus, the limit to available virtual memory is established by the
+amount of swap space allocated to the system.
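+.PP
+This synchronous failure is directly visible to programs:
+a request to grow the address space is refused at the time of the
+request rather than when the new pages are first touched.
+A minimal sketch of the behavior, assuming only the standard 4.3BSD
+\fIsbrk\fP interface, follows.
+.DS
+.ft CW
+#include <errno.h>
+
+extern char *sbrk();
+
+/*
+ * Attempt to grow the data segment by one megabyte.  Because
+ * backing swap space is reserved at allocation time, a machine
+ * that is out of swap refuses the request immediately.
+ */
+main()
+{
+	if (sbrk(1024 * 1024) == (char *)-1 && errno == ENOMEM)
+		perror("sbrk");	/* insufficient swap space */
+	exit(0);
+}
+.ft R
+.DE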
+.PP
+Memory pages are used in a sort of shell game to contain the
+contents of recently accessed locations.
+As a process first references a location,
+a new page is allocated and filled either with initialized data or
+zeros (for new stack and break pages).
+As the supply of free pages begins to run out, dirty pages are
+pushed to the previously allocated swap space so that they can be reused
+to contain newly faulted pages.
+If a previously accessed page that has been pushed to swap is once
+again used, a free page is reallocated and filled from the swap area
+[Babaoglu79], [Someren84].
+.NH 2
+Design assumptions for 4.3BSD virtual memory
+.PP
+The design criteria for the current virtual memory implementation
+were made in 1979.
+At that time the cost of memory was about a thousand times greater per
+byte than that of magnetic disk.
+Most machines were used as centralized timesharing machines.
+These machines had far more disk storage than they had memory;
+given the cost tradeoff between memory and disk storage,
+they were tuned to make maximal use of the memory even at the cost of
+wasting some of the disk space or generating extra disk I/O.
+.PP
+The primary motivation for virtual memory was to allow the
+system to run individual programs whose address space exceeded
+the memory capacity of the machine.
+Thus the virtual memory capability allowed programs to be run that
+could not have been run on a swap-based system.
+Equally important in the large central timesharing environment
+was the ability to allow the sum of the memory requirements of
+all active processes to exceed the amount of physical memory on
+the machine.
+The expected mode of operation for which the system was tuned
+was to have the sum of active virtual memory be one and a half
+to two times the physical memory on the machine.
+.PP
+At the time that the virtual memory system was designed,
+most machines ran with little or no networking.
+All the file systems were contained on disks that were
+directly connected to the machine.
+Similarly, all the disk space devoted to swap space was also
+directly connected.
+Thus the speed and latency with which file systems could be accessed
+were roughly equivalent to the speed and latency with which swap
+space could be accessed.
+Given the high cost of memory, there was little incentive to have
+the kernel keep track of the contents of the swap area once a process
+exited, since the contents could almost as easily and quickly be reread
+from the file system.
+.NH 2
+New influences
+.PP
+In the ten years since the current virtual memory system was designed,
+many technological advances have occurred.
+One effect of the technological revolution is that the
+microprocessor has become powerful enough to allow users to have their
+own personal workstations.
+Thus the computing environment is moving away from a purely centralized
+timesharing model to an environment in which users have a
+computer on their desk.
+This workstation is linked through a network to a centralized
+pool of machines that provide filing, computing, and spooling services.
+The workstations tend to have a large quantity of memory,
+but little or no disk space.
+Because users do not want to be bothered with backing up their disks,
+and because of the difficulty of having a centralized administration
+back up hundreds of small disks, these local disks are typically
+used only for temporary storage and as swap space.
+Long-term storage is managed by the central file server.
+.PP
+Another major technical advance has been in all levels of storage capacity.
+In the last ten years we have experienced a factor of four decrease in the
+cost per byte of disk storage.
+In this same period of time the cost per byte of memory has dropped
+by a factor of a hundred!
+Thus the cost per byte of memory is now within about a factor of ten
+of the cost per byte of disk.
+The effect of this change is that the way in which a machine is used
+is beginning to change dramatically.
+As the amount of physical memory on machines increases and the number of
+users per machine decreases, the expected
+mode of operation is changing from that of supporting more active virtual
+memory than physical memory to that of having a surplus of memory that can
+be used for other purposes.
+.PP
+Because many machines will have more physical memory than they do swap
+space (with diskless workstations as an extreme example!),
+it is no longer reasonable to limit the maximum virtual memory
+to the amount of swap space as is done in the current design.
+Consequently, the new design will allow the maximum virtual memory
+to be the sum of physical memory plus swap space.
+For machines with no swap space, the maximum virtual memory will
+be governed by the amount of physical memory.
+.PP
+Another effect of the current technology is that the latency and overhead
+associated with accessing the file system are considerably higher,
+since the access must be over the network
+rather than to a locally-attached disk.
+One use of the surplus memory would be to
+maintain a cache of recently used files;
+repeated uses of these files would require at most a verification from
+the file server that the data was up to date.
+Under the current design, file caching is done by the buffer pool,
+while the free memory is maintained in a separate pool.
+The new design should have only a single memory pool so that any
+free memory can be used to cache recently accessed files.
+.PP
+Another portion of the memory will be used to keep track of the contents
+of the blocks on any locally-attached swap space, analogously
+to the way that memory pages are handled.
+Thus inactive swap blocks can also be used to cache less-recently-used
+file data.
+Since the swap disk is locally attached, it can be much more quickly
+accessed than a remotely located file system.
+This design allows users to simply allocate their entire local disk
+to swap space, thus allowing the system to decide what files should
+be cached to maximize its usefulness.
+This design has two major benefits.
+It relieves the user of deciding what files
+should be kept in a small local file system.
+It also ensures that all modified files are migrated back to the
+file server in a timely fashion, thus eliminating the need to dump
+the local disk or push the files manually.
+.NH
+User Interface
+.PP
+This section outlines our new virtual memory interface as it is
+currently envisioned.
+The details of the system call interface are contained in Appendix A.
+.NH 2
+Regions
+.PP
+The virtual memory interface is designed to support both large,
+sparse address spaces and small, densely-used address spaces.
+In this context, ``small'' is an address space roughly the
+size of the physical memory on the machine,
+while ``large'' may extend up to the maximum addressability of the machine.
+A process may divide its address space into a number of regions.
+Initially a process begins with four regions:
+a shared read-only fill-on-demand region with its text,
+a private fill-on-demand region for its initialized data,
+a private zero-fill-on-demand region for its uninitialized data and heap,
+and a private zero-fill-on-demand region for its stack.
+In addition to these regions, a process may allocate new ones.
+The regions may not overlap, and the system may impose an alignment
+constraint, but the size of a region should not be limited
+beyond the constraints of the size of the virtual address space.
+.PP
+Each new region may be mapped either as private or shared.
+When it is privately mapped, changes to the contents of the region
+are not reflected to any other process that maps the same region.
+Regions may be mapped read-only or read-write.
+As an example, a shared library would be implemented as two regions:
+a shared read-only region for the text, and a private read-write
+region for the global variables associated with the library.
+.PP
+A region may be allocated with one of several allocation strategies.
+It may map some memory hardware on the machine, such as a frame buffer.
+Since the hardware is responsible for storing the data,
+such regions must be exclusive-use if they are privately mapped.
+.PP
+A region can map all or part of a file.
+As the pages are first accessed, the region is filled in with the
+appropriate part of the file.
+If the region is mapped read-write and shared, changes to the
+contents of the region are reflected back into the contents of the file.
+If the region is read-write but private,
+changes to the region are copied to a private page that is not
+visible to other processes mapping the file,
+and these modified pages are not reflected back to the file.
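+.PP
+As an illustration, the sketch below maps part of a file read-write
+and shared in the style just described.
+The call name, flags, and signature are assumptions made for the
+example only;
+the actual proposed interface is specified in Appendix A.
+.DS
+.ft CW
+#include <sys/types.h>
+
+/* Illustrative constants and declaration, not the definitive interface. */
+#define PROT_READ	0x1
+#define PROT_WRITE	0x2
+#define MAP_SHARED	0x01
+
+extern caddr_t mmap();
+
+/*
+ * Map the first 64K bytes of an open file into a shared
+ * read-write region.  Stores into the region are reflected
+ * back into the file and are seen by every process sharing
+ * the mapping.
+ */
+caddr_t
+map_shared_file(fd)
+	int fd;
+{
+	return (mmap((caddr_t)0, 64 * 1024, PROT_READ|PROT_WRITE,
+	    MAP_SHARED, fd, (off_t)0));
+}
+.ft R
+.DE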
+.PP
+The final type of region is ``anonymous memory''.
+Uninitialized data uses such a region, privately mapped;
+it is zero-fill-on-demand and its contents are abandoned
+when the last reference is dropped.
+Unlike a region that is mapped from a file,
+the contents of an anonymous region will never be read from or
+written to a disk unless memory is short and part of the region
+must be paged to a swap area.
+If one of these regions is mapped shared,
+then all processes see the changes in the region.
+This difference has important performance implications;
+the overhead of reading, flushing, and possibly allocating a file
+is much higher than simply zeroing memory.
+.PP
+If several processes wish to share a region,
+then they must have some way of rendezvousing.
+For a mapped file this is easy;
+the name of the file is used as the rendezvous point.
+However, processes may not need the semantics of mapped files
+nor be willing to pay the overhead associated with them.
+For anonymous memory they must use some other rendezvous point.
+Our current interface allows processes to associate a
+descriptor with a region, which they may then pass to other
+processes that wish to attach to the region.
+Such a descriptor may be bound into the UNIX file system
+name space so that other processes can find it just as
+they would with a mapped file.
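+.PP
+A sketch of such a rendezvous follows.
+Every name in it
+(\fImkregion\fP, \fImbind\fP, and the mapping call)
+is hypothetical and serves only to illustrate the idea;
+the real interface is given in Appendix A.
+.DS
+.ft CW
+#include <sys/types.h>
+
+#define PROT_READ	0x1
+#define PROT_WRITE	0x2
+#define MAP_SHARED	0x01
+
+extern caddr_t mmap();		/* illustrative declarations only */
+extern int mkregion(), mbind();
+
+/*
+ * Create a shared anonymous region, publish its descriptor in
+ * the file system name space so that unrelated processes can
+ * find and attach to it by name, and map it into our address
+ * space.
+ */
+caddr_t
+make_rendezvous()
+{
+	int sd;
+
+	sd = mkregion(64 * 1024);	/* descriptor names the region */
+	mbind(sd, "/tmp/scratch");	/* optional: publish by name */
+	return (mmap((caddr_t)0, 64 * 1024, PROT_READ|PROT_WRITE,
+	    MAP_SHARED, sd, (off_t)0));
+}
+.ft R
+.DE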
+.NH 2
+Shared memory as high-speed interprocess communication
+.PP
+The primary use envisioned for shared memory is to
+provide a high-speed interprocess communication (IPC) mechanism
+between cooperating processes.
+Existing IPC mechanisms (\fIe.g.\fP pipes, sockets, or streams)
+require a system call to hand off a set
+of data destined for another process, and another system call
+by the recipient process to receive the data.
+Even if the data can be transferred by remapping the data pages
+to avoid a memory-to-memory copy, the overhead of doing the system
+calls limits the throughput of all but the largest transfers.
+Shared memory, by contrast, allows processes to share data at any
+level of granularity without system intervention.
+.PP
+However, to maintain all but the simplest of data structures,
+the processes must serialize their modifications to shared
+data structures if they are to avoid corrupting them.
+This serialization is typically done with semaphores.
+Unfortunately, most implementations of semaphores are
+done with system calls.
+Thus processes are once again limited by the need to do two
+system calls per transaction: one to lock the semaphore, and a
+second to release it.
+The net effect is that the shared memory model provides little if
+any improvement in interprocess bandwidth.
+.PP
+To achieve a significant improvement in interprocess bandwidth
+requires a large decrease in the number of system calls needed to
+achieve the interaction.
+In profiling applications that use serialization locks,
+such as the UNIX kernel,
+one typically finds that most locks are not contested.
+Thus if one can find a way to avoid doing a system call in the case
+in which a lock is not contested,
+one would expect to be able to dramatically reduce the number
+of system calls needed to achieve serialization.
+.PP
+In our design, cooperating processes manage their semaphores
+in their own address space.
+In the typical case, a process executes an atomic test-and-set instruction
+to acquire a lock, finds it free, and thus is able to get it.
+Only in the (rare) case where the lock is already set does the process
+need to do a system call to wait for the lock to clear.
+When a process is finished with a lock,
+it can clear the lock itself.
+Only if the ``WANT'' flag for the lock has been set is
+it necessary for the process to do a system call to cause the other
+process(es) to be awakened.
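+.PP
+The common paths of this design fit in a few lines of code.
+In the sketch below,
+\fItas\fP stands for the machine's atomic test-and-set instruction,
+and \fImsleep\fP and \fImwakeup\fP stand for the proposed sleep and
+wakeup system calls;
+the names and details are assumptions made for illustration
+(see Appendix A for the proposed interface).
+.DS
+.ft CW
+#define LOCKED	0x1
+#define WANT	0x2
+
+extern int tas();	/* atomically set LOCKED; return previous bits */
+
+lock(sem)
+	int *sem;
+{
+	while (tas(sem) & LOCKED) {	/* rare, contested case */
+		*sem |= WANT;
+		msleep(sem);	/* kernel retests the lock before sleeping */
+	}
+}
+
+unlock(sem)
+	int *sem;
+{
+	int old;
+
+	old = *sem;
+	*sem = 0;
+	if (old & WANT)		/* a waiter may exist */
+		mwakeup(sem);
+}
+.ft R
+.DE
+.PP
+The apparent race between testing the lock and sleeping on it is
+taken up below.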
+.PP
+Another issue that must be considered is portability.
+Some computers require access to special hardware to implement
+atomic interprocessor test-and-set.
+For such machines the setting and clearing of locks would
+all have to be done with system calls;
+applications could still use the same interface without change,
+though they would tend to run slowly.
+.PP
+The other issue of compatibility is with System V's semaphore
+implementation.
+Since the System V interface has been in existence for several years,
+and applications have been built that depend on this interface,
+it is important that this interface also be available.
+Although that interface is based on system calls for both setting and
+clearing locks,
+the same semantics can be obtained using our interface without
+system calls in most cases.
+.PP
+This implementation can be achieved as follows.
+System V allows entire sets of semaphores to be set concurrently.
+If any of the locks are unavailable, the process is put to sleep
+until they all become available.
+Under our paradigm, a single additional semaphore is defined
+that serializes access to the set of semaphores being simulated.
+Once this guardian semaphore is obtained in the usual way,
+the set of semaphores can be
+inspected to see if the desired ones are available.
+If they are available, they are set, the guardian semaphore
+is released, and the process proceeds.
+If one or more of the requested set is not available,
+the guardian semaphore is released and the process selects an
+unavailable semaphore on which to wait.
+On being reawakened, the whole selection process must be repeated.
+.PP
+In all the above examples, there appears to be a race condition.
+Between the time that the process finds that a semaphore is locked
+and the time that it manages to call the system to sleep on the
+semaphore, another process may unlock the semaphore and issue a
+wakeup call.
+Luckily, the race can be avoided.
+The critical insight is that the process and the kernel agree
+on the physical byte of memory that is being used for the semaphore.
+The system call to put a process to sleep takes a pointer
+to the desired semaphore as its argument, so that once inside
+the kernel, the kernel can repeat the test-and-set.
+If the lock has cleared
+(and possibly the wakeup been issued) between the time that the process
+did the test-and-set and eventually got into the sleep request system call,
+then the kernel immediately resumes the process rather than putting
+it to sleep.
+Thus the only problem to solve is how the kernel interlocks between testing
+a semaphore and going to sleep;
+this problem has already been solved on existing systems.
+.NH
+References
+.sp
+.IP [Babaoglu79] 20
+Babaoglu, O., and Joy, W.,
+``Data Structures Added in the Berkeley Virtual Memory Extensions
+to the UNIX Operating System'',
+Computer Systems Research Group, Dept. of EECS, University of California,
+Berkeley, CA 94720, USA, November 1979.
+.IP [Someren84] 20
+Someren, J. van,
+``Paging in Berkeley UNIX'',
+Laboratorium voor schakeltechniek en techniek v.d.
+informatieverwerkende machines,
+Codenummer 051560-44(1984)01, February 1984.