Diffstat (limited to 'share/doc/smm')
88 files changed, 19931 insertions, 0 deletions
diff --git a/share/doc/smm/01.setup/0.t b/share/doc/smm/01.setup/0.t new file mode 100644 index 000000000000..c7fd752f990b --- /dev/null +++ b/share/doc/smm/01.setup/0.t @@ -0,0 +1,126 @@ +.\" Copyright (c) 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.ds Ux \s-1UNIX\s0 +.ds Bs \s-1BSD\s0 +.\" Current version: +.ds 4B 4.4\*(Bs +.ds Ps 4.3\*(Bs +.\" tape and disk naming +.ds Mt mt +.ds Dk sd +.ds Dn disk +.ds Pa c +.\" block size used on the tape +.ds Bb 10240 +.ds Bz 20 +.\" document date +.ds Dy July 27, 1993 +.de Sm +\s-1\\$1\s0\\$2 +.. +.de Pn \" pathname +.ie n \fI\\$1\fP\\$2 +.el \f(CW\\$1\fP\\$2 +.. +.de Li \" literal +\f(CW\\$1\fP\\$2 +.. +.de I \" italicize first arg +\fI\\$1\fP\^\\$2 +.. +.de Xr \" manual reference +\fI\\$1\fP\^\\$2 +.. +.de Fn \" function +\fI\\$1\fP\^()\\$2 +.. +.bd S B 3 +.EH 'SMM:1-%''Installing and Operating \*(4B UNIX' +.OH 'Installing and Operating \*(4B UNIX''SMM:1-%' +.de Sh +.NH \\$1 +\\$2 +.nr PD .1v +.XS \\n% +.ta 0.6i +\\*(SN \\$2 +.XE +.nr PD .3v +.. +.TL +Installing and Operating \*(4B UNIX +.br +\*(Dy +.AU +Marshall Kirk McKusick +.AU +Keith Bostic +.AU +Michael J. Karels +.AU +Samuel J. Leffler +.AI +Computer Systems Research Group +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +(415) 642-7780 +.AU +Mike Hibler +.AI +Center for Software Science +Department of Computer Science +University of Utah +Salt Lake City, Utah 84112 +(801) 581-5017 +.AB +.PP +This document contains instructions for the +installation and operation of the +\*(4B release of UNIX\** +as distributed by The University of California at Berkeley. +.FS +UNIX is a registered trademark of USL in the USA and some other countries. +.FE +.PP +It discusses procedures for installing UNIX on a new machine, +and for upgrading an existing \*(Ps UNIX system to the new release. +An explanation of how to lay out filesystems on available disks +and of the space requirements for various parts of the system is given. +A brief overview of the major changes to +the system between \*(Ps and \*(4B is also presented. +An explanation of how to set up terminal lines and user accounts, +and of how to do system-specific tailoring is provided.
+A description of how to install and configure the \*(4B networking +facilities is included. +Finally, the document details system operation procedures: +shutdown and startup, filesystem backup procedures, +resource control, performance monitoring, and procedures for recompiling +and reinstalling system software. +.AE +.bp +3 diff --git a/share/doc/smm/01.setup/1.t b/share/doc/smm/01.setup/1.t new file mode 100644 index 000000000000..6b31cde500cf --- /dev/null +++ b/share/doc/smm/01.setup/1.t @@ -0,0 +1,166 @@ +.\" Copyright (c) 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds lq `` +.ds rq '' +.ds LH "Installing/Operating \*(4B +.ds RH Introduction +.ds CF \*(Dy +.LP +.bp +.Sh 1 "Introduction" +.PP +This document explains how to install the \*(4B Berkeley +version of UNIX on your system. +The filesystem format is compatible with \*(Ps +and it will only be necessary for you to do a full bootstrap +procedure if you are installing the release on a new machine. +The object file formats are completely different from the System +V release, so the most straightforward procedure for upgrading +a System V system is to do a full bootstrap. +.PP +The full bootstrap procedure +is outlined in section 2; the process starts with copying a filesystem +image onto a new disk. +This filesystem is then booted and used to extract the remainder of the +system binaries and sources from the archives on the tape(s). +.PP +The technique for upgrading a \*(Ps system is described +in section 3 of this document. +The upgrade procedure involves extracting system binaries +onto new root and +.Pn /usr +filesystems and merging local +configuration files into the new system. +User filesystems may be upgraded in place. +Most \*(Ps binaries may be used with \*(4B in the course +of the conversion. +It is desirable to recompile local sources after the conversion, +as the new compiler (GCC) provides superior code optimization. +Consult section 3.5 for a description of some of the differences +between \*(Ps and \*(4B. 
+.Sh 2 "Distribution format" +.PP +The distribution comes in two formats: +.DS +(3)\0\0 6250bpi 2400' 9-track magnetic tapes, or +(1)\0\0 8mm Exabyte tape +.DE +.PP +If you have the facilities, we \fBstrongly\fP recommend copying the +magnetic tape(s) in the distribution kit to guard against disaster. +The tapes contain \*(Bb-byte records. +There are interspersed tape marks; +end-of-tape is signaled by a double end-of-file. +The first file on the tape is architecture dependent. +Additional files on the tape(s) +contain tape archive images of the system binaries and sources (see +.Xr tar (1)\**). +.FS +References of the form \fIX\fP(Y) mean the entry named +\fIX\fP in section Y of the ``UNIX Programmer's Manual''. +.FE +See the tape label for a description of the contents +and format of each individual tape. +.Sh 2 "UNIX device naming" +.PP +Device names have a different syntax depending on whether you are talking +to the standalone system or a running UNIX kernel. +The standalone system syntax is currently architecture dependent and is +described in the various architecture specific sections as applicable. +When not running standalone, devices are available via files in the +.Pn /dev/ +directory. +The file name typically encodes the device type, its logical unit and +a partition within that unit. +For example, +.Pn /dev/sd2b +refers to the second partition (``b'') of +SCSI (``sd'') drive number ``2'', while +.Pn /dev/rmt0 +refers to the raw (``r'') interface of 9-track tape (``mt'') unit ``0''. +.PP +The mapping of physical addressing information (e.g. controller, target) +to a logical unit number is dependent on the system configuration. +In all simple cases, where only a single controller is present, a drive +with physical unit number 0 (e.g., as determined by its unit +specification, either unit plug or other selection mechanism) +will be called unit 0 in its UNIX file name. 
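The naming scheme just described can be sketched as a small decoder. Under this scheme, a name is an optional ``r'' for the raw interface, then a device type, a unit number and an optional partition letter. This is an illustration under stated assumptions, not part of the distribution. The names parse_dev, DevName and KNOWN_TYPES are hypothetical, and the set of type prefixes is limited to the ``sd'', ``rd'', ``mt'' and ``ct'' devices mentioned in this document.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical: device-type prefixes taken from this document only,
# not a complete list of 4.4BSD drivers.
KNOWN_TYPES = {"sd", "rd", "mt", "ct"}

# letters (device type), digits (unit), optional partition letter a-h
_NAME = re.compile(r"^([a-z]+?)(\d+)([a-h]?)$")


class DevName(NamedTuple):
    raw: bool                 # True for the "r" (character) interface
    dtype: str                # e.g. "sd" for SCSI disk, "mt" for 9-track tape
    unit: int                 # logical unit number
    partition: Optional[str]  # "a".."h", or None when absent (tapes)


def _split(name: str) -> Optional[Tuple[str, int, Optional[str]]]:
    m = _NAME.match(name)
    if m is None or m.group(1) not in KNOWN_TYPES:
        return None
    return m.group(1), int(m.group(2)), m.group(3) or None


def parse_dev(path: str) -> DevName:
    """Decode names such as /dev/sd2b or /dev/rmt0."""
    if not path.startswith("/dev/"):
        raise ValueError(f"not under /dev: {path}")
    name = path[len("/dev/"):]
    # Try the block reading first so that "rd0a" is the HP-IB disk "rd",
    # not a raw device of type "d".
    parts = _split(name)
    if parts is not None:
        return DevName(False, *parts)
    if name.startswith("r"):
        parts = _split(name[1:])
        if parts is not None:
            return DevName(True, *parts)
    raise ValueError(f"unrecognized device name: {path}")
```

With these assumptions, parse_dev("/dev/sd2b") decodes to a block device of type sd, unit 2, partition b. Likewise, parse_dev("/dev/rmt0") decodes to the raw interface of mt unit 0 with no partition. The block reading is tried first because a type such as ``rd'' itself begins with ``r''.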
+This is not, however, strictly +necessary, since the system has a level of indirection in this naming. +If there are multiple controllers, the disk unit numbers will normally +be counted sequentially across controllers. This can be taken +advantage of to make the system less dependent on the interconnect +topology, and to make reconfiguration after hardware failure easier. +.PP +Each UNIX physical disk is divided into at most 8 logical disk partitions, +each of which may occupy any consecutive cylinder range on the physical +device. The cylinders occupied by the 8 partitions for each drive type +are specified initially in the disk description file +.Pn /etc/disktab +(cf. +.Xr disktab (5)). +The partition information and description of the +drive geometry are written in one of the first sectors of each disk with the +.Xr disklabel (8) +program. Each partition may be used either as a +raw data area, such as a paging area, or to store a UNIX filesystem. +It is conventional for the first partition on a disk to be used +to store a root filesystem, from which UNIX may be bootstrapped. +The second partition is traditionally used as a paging area, and the +rest of the disk is divided into spaces for additional ``mounted +filesystems'' by use of one or more additional partitions. +.Sh 2 "UNIX devices: block and raw" +.PP +UNIX makes a distinction between ``block'' and ``raw'' (character) +devices. Each disk has a block device interface where +the system makes the device byte addressable and you can write +a single byte in the middle of the disk. The system will read +out the data from the disk sector, insert the byte you gave it +and put the modified data back. The disks with the names +.Pn /dev/xx0[a-h] , +etc., are block devices. +There are also raw devices available. +These have names like +.Pn /dev/rxx0[a-h] , +the ``r'' here standing for ``raw''.
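The read-modify-write step that lets the block interface accept a single byte can be modelled in a few lines of Python. This is a toy under assumed parameters. BlockDevice and its methods are hypothetical, a 512-byte sector is assumed, and the buffer cache that real block I/O passes through is left out. The underlying storage accepts only full-sector transfers, as a raw device normally does.

```python
SECTOR = 512  # assumed sector size for this sketch


class BlockDevice:
    """Toy block interface layered over storage that moves whole sectors."""

    def __init__(self, nsectors: int) -> None:
        self.media = bytearray(nsectors * SECTOR)  # stands in for the disk

    def read_sector(self, n: int) -> bytes:
        return bytes(self.media[n * SECTOR:(n + 1) * SECTOR])

    def write_sector(self, n: int, data: bytes) -> None:
        # Like a raw device: only full-sector transfers are accepted.
        if len(data) != SECTOR:
            raise ValueError("only full-sector transfers are allowed")
        self.media[n * SECTOR:(n + 1) * SECTOR] = data

    def write_byte(self, offset: int, value: int) -> None:
        # Read the sector holding the byte, patch it, and write it back.
        n, off = divmod(offset, SECTOR)
        buf = bytearray(self.read_sector(n))
        buf[off] = value
        self.write_sector(n, bytes(buf))
```

A single-byte write through write_byte succeeds. Passing a one-byte buffer straight to write_sector is rejected, which mirrors why a raw device cannot be used for that job.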
+Raw devices bypass the buffer cache and use DMA directly to/from +the program's I/O buffers; +they are normally restricted to full-sector transfers. +In the bootstrap procedures we +will often suggest using the raw devices, because these tend +to work faster. +Raw devices are used when making new filesystems, +when checking unmounted filesystems, +or for copying quiescent filesystems. +The block devices are used to mount filesystems. +.PP +Be aware that the choice of device sometimes matters: the raw device +is preferred for efficiency, but the block device is required where a raw +device would not work, e.g. to write a single byte in the middle of a sector. +Do not change the instructions by substituting one type of device +for the other. diff --git a/share/doc/smm/01.setup/2.t b/share/doc/smm/01.setup/2.t new file mode 100644 index 000000000000..8a7d579d392f --- /dev/null +++ b/share/doc/smm/01.setup/2.t @@ -0,0 +1,1642 @@ +.\" Copyright (c) 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds lq `` +.ds rq '' +.ds LH "Installing/Operating \*(4B +.ds RH Bootstrapping +.ds CF \*(Dy +.Sh 1 "Bootstrap procedure" +.PP +This section explains the bootstrap procedure that can be used +to get the kernel supplied with this distribution running on your machine. +If you are not currently running \*(Ps you will +have to do a full bootstrap. +Section 3 describes how to upgrade a \*(Ps system. +An understanding of the operations used in a full bootstrap +is helpful in doing an upgrade as well. +In either case, it is highly desirable to read and understand +the remainder of this document before proceeding. +.PP +The distribution supports a somewhat wider set of machines than +those for which we have built binaries. +The architectures that are supported only in source form include: +.IP \(bu +Intel 386/486-based machines (ISA/AT or EISA bus only) +.IP \(bu +Sony News MIPS-based workstations +.IP \(bu +Omron Luna 68030-based workstations +.LP +If you wish to run one of these architectures, +you will have to build a cross compilation environment. +Note that the distribution does +.B not +include the machine support for the Tahoe and VAX architectures +found in previous BSD distributions. +Our primary development environment is the HP9000/300 series machines. +The other architectures are developed and supported by +people outside the university.
+Consequently, we are not able to directly test or maintain these +other architectures, so we cannot comment on their robustness, +reliability, or completeness. +.Sh 2 "Bootstrapping from the tape" +.LP +The set of files on the distribution tape is as follows: +.IP 1) +A +.Xr dd (1) +(HP300), +.Xr tar (1) +(DECstation), or +.Xr dump (8) +(SPARC) image of the root filesystem +.IP 2) +A +.Xr tar +image of the +.Pn /var +filesystem +.IP 3) +A +.Xr tar +image of the +.Pn /usr +filesystem +.IP 4) +A +.Xr tar +image of +.Pn /usr/src/sys +.IP 5) +A +.Xr tar +image of +.Pn /usr/src +except sys and contrib +.IP 6) +A +.Xr tar +image of +.Pn /usr/src/contrib +.IP 7) +(8mm Exabyte tape distributions only) +A +.Xr tar +image of +.Pn /usr/src/X11R5 +.LP +The tape bootstrap procedure used to create a +working system involves the following major steps: +.IP 1) +Transfer a bootable root filesystem from the tape to a disk +and get it booted and running. +.IP 2) +Build and restore the +.Pn /var +and +.Pn /usr +filesystems from tape with +.Xr tar (1). +.IP 3) +Extract the system and utility source files as desired. +.PP +The following sections describe the above steps in detail. +The details of the first step vary between architectures. +The specific steps for the HP300, SPARC, and DECstation are +given in the next three sections respectively. +You should follow the instructions for your particular architecture. +In all sections, +commands you are expected to type are shown in italics, while +information printed by the system is shown in bold. +.Sh 2 "Booting the HP300" +.Sh 3 "Supported hardware" +.LP +The hardware supported by \*(4B for the HP300/400 is as follows: +.TS +center box; +lw(1i) lw(4i). +CPU's T{ +68020 based (318, 319, 320, 330 and 350), +68030 based (340, 345, 360, 370, 375, 400) and +68040 based (380, 425, 433). +T} +_ +DISK's T{ +HP-IB/CS80 (7912, 7914, 7933, 7936, 7945, 7957, 7958, 7959, 2200, 2203) +and SCSI-I (including magneto-optical).
+T} +_ +TAPE's T{ +Low-density CS80 cartridge (7914, 7946, 9144), +high-density CS80 cartridge (9145), +HP SCSI DAT and +SCSI Exabyte. +T} +_ +RS232 T{ +98644 built-in single-port, 98642 4-port and 98638 8-port interfaces. +T} +_ +NETWORK T{ +98643 internal and external LAN cards. +T} +_ +GRAPHICS T{ +Terminal emulation and raw frame buffer support for +98544 / 98545 / 98547 (Topcat color & monochrome), +98548 / 98549 / 98550 (Catseye color & monochrome), +98700 / 98710 (Gatorbox), +98720 / 98721 (Renaissance), +98730 / 98731 (DaVinci) and +A1096A (Hyperion monochrome). +T} +_ +INPUT T{ +General interface supporting all HIL devices. +(e.g. keyboard, 2 and 3 button mice, ID module, ...) +T} +_ +MISC T{ +Battery-backed real time clock, +builtin and 98625A/B HP-IB interfaces, +builtin and 98658A SCSI interfaces, +serial printers and plotters on HP-IB, +and SCSI autochanger device. +T} +.TE +.LP +Major items that are not supported +include the 310 and 332 CPU's, 400 series machines +configured for Domain/OS, EISA and VME bus adaptors, audio, the centronics +port, 1/2" tape drives (7980), CD-ROM, and the PVRX/TVRX 3D graphics displays. +.Sh 3 "Standalone device file naming" +.LP +The standalone system device name syntax on the HP300 is of the form: +.DS +xx(a,c,u,p) +.DE +where +\fIxx\fP is the device type, +\fIa\fP specifies the adaptor to use, +\fIc\fP the controller, +\fIu\fP the unit, and +\fIp\fP a partition. +The \fIdevice type\fP differentiates the various disks and tapes and is one of: +``rd'' for HP-IB CS80 disks, +``ct'' for HP-IB CS80 cartridge tapes, or +``sd'' for SCSI-I disks +(SCSI-I tapes are currently not supported). +The \fIadaptor\fP field is a logical HP-IB or SCSI bus adaptor card number. +This will typically be +0 for SCSI disks, +0 for devices on the ``slow'' HP-IB interface (usually tapes) and +1 for devices on the ``fast'' HP-IB interface (usually disks). 
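The standalone xx(a,c,u,p) syntax can be captured by a small parser and formatter. The sketch below is hypothetical: StandaloneDev and parse_standalone are not part of the standalone system. It accepts only the full four-field form. For partition_letter, it assumes that disk partitions 0 through 7 correspond to the letters a through h used in /dev names.

```python
import re
from typing import NamedTuple

# Full four-field form: two-letter type, then (adaptor,controller,unit,partition)
_SPEC = re.compile(r"^([a-z]{2})\((\d+),(\d+),(\d+),(\d+)\)$")


class StandaloneDev(NamedTuple):
    dtype: str       # "rd", "ct" or "sd"
    adaptor: int     # logical HP-IB or SCSI bus adaptor number
    controller: int  # target number on that bus
    unit: int        # unused on the HP300; should be 0
    partition: int   # disk partition 0-7, or file number on a tape

    def __str__(self) -> str:
        return (f"{self.dtype}({self.adaptor},{self.controller},"
                f"{self.unit},{self.partition})")

    def partition_letter(self) -> str:
        # Assumed correspondence: disk partitions 0-7 are letters a-h.
        if not 0 <= self.partition <= 7:
            raise ValueError("disk partitions run from 0 to 7")
        return "abcdefgh"[self.partition]


def parse_standalone(spec: str) -> StandaloneDev:
    """Parse the full form, e.g. "sd(1,3,0,2)"."""
    m = _SPEC.match(spec.replace(" ", ""))
    if m is None:
        raise ValueError(f"expected xx(a,c,u,p), got {spec!r}")
    return StandaloneDev(m.group(1), *(int(g) for g in m.groups()[1:]))
```

For example, parse_standalone("sd(1,3,0,2)") describes partition 2 (``c'') of the SCSI disk at target 3 on SCSI bus 1.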
+To get a complete mapping of physical (select-code) to logical card numbers +just type a ^C at the standalone prompt. +The \fIcontroller\fP field is the disk or tape's target number on the +HP-IB or SCSI bus. +For SCSI the range is 0 to 6 (7 is the adaptor address) and +for HP-IB the range is 0 to 7. +The \fIunit\fP field is unused and should be 0. +The \fIpartition\fP field is interpreted differently for tapes +and disks: for disks it is a disk partition (in the range 0-7), +and for tapes it is a file number offset on the tape. +Thus, partition 2 of a SCSI disk drive at target 3 on SCSI bus 1 +would be ``sd(1,3,0,2)''. +If you have only one of any type bus adaptor, you may omit the adaptor +and controller numbers; +e.g. ``sd(0,2)'' could be used instead of ``sd(0,0,0,2)''. +The following examples always use the full syntax for clarity. +.Sh 3 "The procedure" +.LP +The basic steps involved in bringing up the HP300 are as follows: +.IP 1) +Obtain a second disk and format it, if necessary. +.IP 2) +Copy a root filesystem from the +tape onto the beginning of the disk. +.IP 3) +Boot the UNIX system on the new disk. +.IP 4) +(Optional) Build a root filesystem optimized for your disk. +.IP 5) +Label the disks with the +.Xr disklabel (8) +program. +.Sh 4 "Step 1: selecting and formatting a disk" +.PP +For your first system you will have to obtain a formatted disk +of a type given in the ``supported hardware'' list above. +If you want to load an entire binary system +(i.e., everything except +.Pn /usr/src ), +on the single disk you will need a minimum of 290MB, +ruling out anything smaller than a 7959B/S disk. +The disklabel included in the bootstrap root image is laid out +to accommodate this scenario. +Note that an HP SCSI magneto-optical disk will work fine for this case. +\*(4B will boot and run (albeit slowly) using one. +If you want to load source on a single disk system, +you will need at least 640MB (at least a 2213A SCSI or 2203A HP-IB disk). 
+A disk as small as the 7945A (54MB) can be used for the bootstrap +procedure but will hold only the root and primary swap partitions. +If you plan to use multiple disks, +refer to section 2.5 for suggestions on partitioning. +.PP +After selecting a disk, you may need to format it. +Since most HP disk drives come pre-formatted +(except optical media) +you probably will not, but if necessary, +you can format a disk under HP-UX using the +.Xr mediainit (1m) +program. +Once you have \*(4B up and running on one machine you can use the +.Xr scsiformat (8) +program to format additional SCSI disks. +Any additional HP-IB disks will have to be formatted using HP-UX. +.Sh 4 "Step 2: copying the root filesystem from tape to disk" +.PP +Once you have a formatted second disk you can use the +.Xr dd (1) +command under HP-UX to copy the root filesystem image from +the tape to the beginning of the second disk. +For HP's, the root filesystem image is the first file on the tape. +It includes a disklabel and bootblock along with the root filesystem. +An example command to copy the image from tape to the beginning of a disk is: +.DS +.ft CW +dd if=/dev/rmt/0m of=/dev/rdsk/1s0 bs=\*(Bzb +.DE +The actual special file syntax may vary depending on unit numbers and +the version of HP-UX that is running. +Consult the HP-UX +.Xr mt (7) +and +.Xr disk (7) +man pages for details. +.PP +Note that if you have a SCSI disk, you don't necessarily have to use +HP-UX (or an HP) to create the boot disk. +Any machine and operating system that will allow you to copy the +raw disk image out to block 0 of the disk will do. +.PP +If you have only a single machine with a single disk, +you may still be able to install and boot \*(4B if you have an +HP-IB cartridge tape drive. +If so, you can use a more difficult approach of booting a +standalone copy program from the tape, and using that to copy the +root filesystem image from the tape to the disk. 
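The bs operand in the dd commands above expands to 20b, because the Bz string is defined as 20. That means 20 blocks of 512 bytes, or 10240 bytes, which matches the tape record size. The helper below is a hypothetical sketch of that conversion. It handles only the ``b'' (512) and ``k'' (1024) suffixes from the POSIX dd specification, not any other multipliers a particular dd may accept.

```python
# Size suffixes handled by this sketch (both are defined by POSIX dd):
# "b" multiplies by 512 and "k" by 1024.
_SUFFIX = {"b": 512, "k": 1024}


def dd_bytes(operand: str) -> int:
    """Convert a dd size operand such as "20b" into a byte count."""
    suffix = operand[-1:]
    mult = _SUFFIX.get(suffix, 1)
    digits = operand[:-1] if suffix in _SUFFIX else operand
    if not digits.isdigit():
        raise ValueError(f"bad size operand: {operand!r}")
    return int(digits) * mult
```

Thus dd_bytes("20b"), dd_bytes("10k") and dd_bytes("10240") all describe the same 10240-byte record.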
+To do this, you need to extract the first file of the distribution tape +(the root image), copy it over to a machine with a cartridge drive +and then copy the image onto tape. +For example: +.DS +.ft CW +dd if=/dev/rst0 of=bootimage bs=\*(Bzb +rcp bootimage foo:/tmp/bootimage +<login to foo> +dd if=/tmp/bootimage of=/dev/rct/0m bs=\*(Bzb +.DE +Once this tape is created you can boot and run the standalone tape +copy program from it. +The copy program is loaded just as any other program would be loaded +by the bootrom in ``attended'' mode: +reset the CPU, +hold down the space bar until the word ``Keyboard'' appears in the +installed interface list, and +enter the menu selection for SYS_TCOPY. +Once loaded and running: +.DS +.TS +lw(2i) l. +\fBFrom:\fP \fI^C\fP (control-C to see logical adaptor assignments) +\fBhpib0 at sc7\fP +\fBscsi0 at sc14\fP +\fBFrom:\fP \fIct(0,7,0,0)\fP (HP-IB tape, target 7, first tape file) +\fBTo:\fP \fIsd(0,0,0,2)\fP (SCSI disk, target 0, third partition) +\fBCopy completed: 1728 records copied\fP +.TE +.DE +.LP +This copy will likely take 30 minutes or more. +.Sh 4 "Step 3: booting the root filesystem" +.PP +You now have a bootable root filesystem on the disk. +If you were previously running with two disks, +it would be best if you shut down the machine and turn off power on +the HP-UX drive. +It will be less confusing and it will eliminate any chance of accidentally +destroying the HP-UX disk. +If you used a cartridge tape for booting you should also unload the tape +at this point. +Whether you booted from tape or copied from disk you should now reboot +the machine and do another attended boot (see previous section), +this time with SYS_TBOOT. 
+Once loaded and running, the boot program will display the CPU type and +prompt for a kernel file to boot: +.DS +.B +HP433 CPU +Boot +.R +\fB:\fP \fI/kernel\fP +.DE +.LP +After providing the kernel name, the machine will boot \*(4B with +output that looks roughly like this: +.DS +.B +597480+34120+139288 start 0xfe8019ec +Copyright (c) 1982, 1986, 1989, 1991, 1993 + The Regents of the University of California. +Copyright (c) 1992 Hewlett-Packard Company +Copyright (c) 1992 Motorola Inc. +All rights reserved. + +4.4BSD UNIX #1: Tue Jul 20 11:40:36 PDT 1993 + mckusick@vangogh.CS.Berkeley.EDU:/usr/obj/sys/compile/GENERIC.hp300 +HP9000/433 (33MHz MC68040 CPU+MMU+FPU, 4k on-chip physical I/D caches) +real mem = xxx +avail mem = ### +using ### buffers containing ### bytes of memory +(... information about available devices ...) +root device? +.R +.DE +.PP +The first three numbers are printed out by the bootstrap program and +are the sizes of different parts of the system (text, initialized and +uninitialized data). The system also allocates several system data +structures after it starts running. The sizes of these structures are +based on the amount of available memory and the maximum count of active +users expected, as declared in a system configuration description. This +will be discussed later. +.PP +UNIX itself then runs for the first time and begins by printing out a banner +identifying the release and +version of the system that is in use and the date that it was compiled. +.PP +Next the +.I mem +messages give the +amount of real (physical) memory and the +memory available to user programs +in bytes. +For example, if your machine has 16MB of memory, then +\fBxxx\fP will be 16777216. +.PP +The messages that come out next show what devices were found on +the current processor. These messages are described in +.Xr autoconf (4).
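The size line printed by the bootstrap can be decoded mechanically. In this hypothetical sketch, parse_boot_line and BootSizes are illustrative names. The three numbers are taken as the text, initialized data and uninitialized data sizes described above. The hexadecimal value after ``start'' is kept as an integer, and treating it as the kernel's entry address is an assumption.

```python
from typing import NamedTuple


class BootSizes(NamedTuple):
    text: int   # text size
    data: int   # initialized data size
    bss: int    # uninitialized data size
    entry: int  # value after "start" (assumed to be the entry address)

    @property
    def total(self) -> int:
        return self.text + self.data + self.bss


def parse_boot_line(line: str) -> BootSizes:
    """Split a line like "597480+34120+139288 start 0xfe8019ec"."""
    sizes, keyword, addr = line.split()
    if keyword != "start":
        raise ValueError(f"unexpected boot line: {line!r}")
    text, data, bss = (int(n) for n in sizes.split("+"))
    return BootSizes(text, data, bss, int(addr, 16))
```

For the sample boot output, the three sizes add up to 770888 bytes. By the same kind of arithmetic, 16MB of memory is 16 × 1024 × 1024 = 16777216 bytes, which is the value reported on the real mem line.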
+The distributed system may not have +found all the communications devices you have +or all the mass storage peripherals you have, especially +if you have more than +two of anything. You will correct this when you create +a description of your machine from which to configure a site-dependent +version of UNIX. +The messages printed at boot here contain much of the information +that will be used in creating the configuration. +In a correctly configured system most of the information +present in the configuration description +is printed out at boot time as the system verifies that each device +is present. +.PP +The \*(lqroot device?\*(rq prompt was printed by the system +to ask you for the name of the root filesystem to use. +This happens because the distribution system is a \fIgeneric\fP +system, i.e., it can be bootstrapped on a cpu with its root device +and paging area on any available disk drive. +You will most likely respond to the root device question with ``sd0'' +if you are booting from a SCSI disk, +or with ``rd0'' if you are booting from an HP-IB disk. +This response shows that the disk it is running +on is drive 0 of type ``sd'' or ``rd'' respectively. +If you have other disks attached to the system, +it is possible that the drive you are using will not be configured +as logical drive 0. +Check the autoconfiguration messages printed out by the kernel to +make sure. +These messages will show the type of every logical drive +and their associated controller and slave addresses. +You will later build a system tailored to your configuration +that will not prompt you for a root device when it is bootstrapped. +.DS +\fBroot device?\fP \fI\*(Dk0\fP +\fBWARNING: preposterous time in filesystem \-\- CHECK AND RESET THE DATE!\fP +\fBerase ^?, kill ^U, intr ^C\fP +\fB#\fP +.DE +.PP +The \*(lqerase ...\*(rq message is part of the +.Pn /.profile +that was executed by the root shell when it started. 
This message +tells you about the settings of the character erase, +line erase, and interrupt characters. +.PP +UNIX is now running, +and the \fIUNIX Programmer's Manual\fP applies. The ``#'' is the prompt +from the Bourne shell, and lets you know that you are the super-user, +whose login name is \*(lqroot\*(rq. +.PP +At this point, the root filesystem is mounted read-only. +Before continuing the installation, the filesystem needs to be ``updated'' +to allow writing and device special files for the following steps need +to be created. +This is done as follows: +.DS +.TS +lw(2i) l. +\fB#\fP \fImount_mfs -s 1000 -T type /dev/null /tmp\fP (create a writable filesystem) +(\fItype\fP is the disk type as determined from /etc/disktab) +\fB#\fP \fIcd /tmp\fP (connect to that directory) +\fB#\fP \fImount \-uw /tmp/\*(Dk#a /\fP (read-write mount root filesystem) +.TE +.DE +.Sh 4 "Step 4: (optional) restoring the root filesystem" +.PP +The root filesystem that you are currently running on is complete, +however it probably is not optimally laid out for the disk on +which you are running. +If you will be cloning copies of the system onto multiple disks for +other machines, you are advised to connect one of these disks to +this machine, and build and restore a properly laid out root filesystem +onto it. +If this is the only machine on which you will be running \*(4B +or peak performance is not an issue, you can skip this step and +proceed directly to step 5. +.PP +Connect a second disk to your machine. +If you bootstrapped using the two disk method, you can +overwrite your initial HP-UX disk, as it will no longer +be needed (assuming you have no plans to run HP-UX again). +.PP +To really create the root filesystem on drive 1 +you should first label the disk as described in step 5 below. 
+Then run the following commands: +.DS +\fB#\fP\|\fInewfs /dev/r\*(Dk1a\fP +\fB#\fP\|\fImount /dev/\*(Dk1a /mnt\fP +\fB#\fP\|\fIcd /mnt\fP +\fB#\fP\|\fIdump 0f \- /dev/r\*(Dk0a | restore xf \-\fP +(Note: restore will ask if you want to ``set owner/mode for '.''' +to which you should reply ``yes''.) +.DE +.PP +When this completes, +shut down the system and boot from the disk that +you just created, following the procedure in step (3) above. +.Sh 4 "Step 5: placing labels on the disks" +.PP +For each disk on the HP300, \*(4B places information about the geometry +of the drive and the partition layout at byte offset 1024. +This information is written with +.Xr disklabel (8). +.PP +The root image just loaded includes a ``generic'' label intended to allow +easy installation of the root and +.Pn /usr +filesystems, and it may not be suitable for the actual +disk on which it was installed. +In particular, +it may make your disk appear smaller or larger than its real size. +In the former case, you lose some capacity. +In the latter, some of the partitions may map non-existent sectors +leading to errors if those partitions are used. +It is also possible that the defined geometry will interact poorly with +the filesystem code, resulting in reduced performance. +However, as long as you are willing to give up a little space, +not use certain partitions or suffer minor performance degradation, +you might want to skip this step, +especially if you do not know how to use +.Xr ed (1). +.PP +If you choose to edit this label, +you can fill in correct geometry information from +.Pn /etc/disktab . +You may also want to rework the ``e'' and ``f'' partitions used for loading +.Pn /usr +and +.Pn /var . +You should not attempt to, and +.Xr disklabel +will not let you, modify the ``a'', ``b'' and ``d'' partitions.
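Before relying on a generic label, you can check whether any partition extends past the end of the physical disk, since such a partition maps non-existent sectors. The function below is a hypothetical sketch of that check. It takes partitions as (offset, size) pairs in sectors, as they might be read off a disklabel or an /etc/disktab entry, and a disk size in sectors that you supply yourself.

```python
from typing import Dict, List, Tuple

# A partition is (offset, size), both counted in sectors.
Partition = Tuple[int, int]


def out_of_range(label: Dict[str, Partition], disk_sectors: int) -> List[str]:
    """Return, in letter order, the partitions that run past the end of the disk."""
    return [letter for letter, (offset, size) in sorted(label.items())
            if offset + size > disk_sectors]
```

For example, take a made-up label {"a": (0, 30720), "b": (30720, 65536), "c": (0, 600000)} and a 500000-sector disk. Only partition c is reported.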
+To edit a label: +.DS +\fB#\fP \fIEDITOR=ed\fP +\fB#\fP \fIexport EDITOR\fP +\fB#\fP \fIdisklabel -r -e /dev/r\fP\fBXX#\fP\fId\fP +.DE +where \fBXX\fP is the type and \fB#\fP is the logical drive number; e.g. +.Pn /dev/rsd0d +or +.Pn /dev/rrd0d . +Note the explicit use of the ``d'' partition. +Like ``c'', this partition includes the bootblock, +and using it allows you to change the size of ``c''. +.PP +If you wish to label any additional disks, run the following command for each: +.DS +\fB#\|\fP\fIdisklabel -rw \fBXX# type\fP \fI"optional_pack_name"\fP +.DE +where \fBXX#\fP is the same as in the previous command +and \fBtype\fP is the HP300 disk device name as listed in +.Pn /etc/disktab . +The optional pack name may be any descriptive name for the +contents of the disk, up to 16 characters long. This procedure +will place the label on the disk using the information found in +.Pn /etc/disktab +for the disk type named. +If you have changed the disk partition sizes, +you may wish to add entries for the modified configuration in +.Pn /etc/disktab +before labeling the affected disks. +.PP +You have now completed the HP300-specific part of the installation. +Now proceed to the generic part of the installation +described starting in section 2.5 below. +Note that where the disk name ``sd'' is used throughout section 2.5, +you should substitute the name ``rd'' if you are running on an HP-IB disk. +Also, if you are loading on a single disk with the default disklabel, +.Pn /var +should be restored to the ``f'' partition and +.Pn /usr +to the ``e'' partition. +.Sh 2 "Booting the SPARC" +.Sh 3 "Supported hardware" +.LP +The hardware supported by \*(4B for the SPARC is as follows: +.TS +center box; +lw(1i) lw(4i). +CPUs T{ +SPARCstation 1 series (1, 1+, SLC, IPC) and +SPARCstation 2 series (2, IPX). +T} +_ +DISKs T{ +SCSI. +T} +_ +TAPEs T{ +none. +T} +_ +NETWORK T{ +SPARCstation Lance (le). +T} +_ +GRAPHICS T{ +bwtwo and cgthree.
+T} +_ +INPUT T{ +Keyboard and mouse. +T} +_ +MISC T{ +Battery-backed real-time clock, +built-in serial devices, +SBus SCSI controller, +and audio device. +T} +.TE +.LP +Major items that are not supported include +anything VME-based, +the GX (cgsix) display, +the floppy disk, and SCSI tapes. +.Sh 3 "Limitations" +.LP +There are several important limitations on the \*(4B distribution +for the SPARC: +.IP 1) +You +.B must +have SunOS 4.1.x or Solaris to bring up \*(4B. +There is no SPARCstation bootstrap code in this distribution. The +Sun-supplied boot loader will be used to boot \*(4B; you must copy +this from your SunOS distribution. This imposes several +restrictions on the system, as detailed below. +.IP 2) +The \*(4B SPARC kernel does not remap SCSI IDs. A SCSI disk at +target 0 will become ``sd0'', whereas in SunOS the same disk will +normally be called ``sd3''. If your existing SunOS system is +diskful, it will be least painful to have SunOS running on the disk +on target 0 lun 0 and put \*(4B on the disk on target 3 lun 0. Both +systems will then think they are running on ``sd0'', and you can +boot either system as needed simply by changing the EEPROM's boot +device. +.IP 3) +There is no SCSI tape driver. +You must have another system for tape reading and backups. +.IP 4) +Although the \*(4B SPARC kernel will handle existing SunOS shared +libraries, \*(4B itself does not use or create them, and therefore +requires much more disk space than SunOS does. +.IP 5) +It is currently difficult (though not completely impossible) to +run \*(4B diskless. These instructions assume you will have a local +boot, swap, and root filesystem. +.IP 6) +When using a serial port rather than a graphics display as the console, +only port +.Pn ttya +can be used. +Attempts to use port +.Pn ttyb +will fail when the kernel tries +to print the boot-up messages to the console. +.Sh 3 "The procedure" +.PP +You must have a spare disk on which to place \*(4B.
+The steps involved in bootstrapping this tape are as follows: +.IP 1) +Bring up SunOS (preferably SunOS 4.1.x or Solaris 1.x, although +Solaris 2 may work \(em this is untested). +.IP 2) +Attach auxiliary SCSI disk(s). Format and label using the +SunOS formatting and labeling programs as needed. +Note that the root filesystem currently requires at least 10 MB; 16 MB +or more is recommended. The b partition will be used for swap; +this should be at least 32 MB. +.IP 3) +Use the SunOS +.Xr newfs +to build the root filesystem. You may also +want to build other filesystems at the same time. (By default, the +\*(4B +.Xr newfs +builds a filesystem that SunOS will not handle; if you +plan to switch OSes back and forth you may want to sacrifice the +performance gain from the new filesystem format for compatibility.) +You can build an old-format filesystem on \*(4B by giving the \-O +option to +.Xr newfs (8). +.Xr Fsck (8) +can convert old format filesystems to new format +filesystems, but not vice versa, +so you may want to initially build old format filesystems so that they +can be mounted under SunOS, +and then later convert them to new format filesystems when you are +satisfied that \*(4B is running properly. +In any case, +.B +you must build an old-style root filesystem +.R +so that the SunOS boot program will work. +.IP 4) +Mount the new root, then copy the SunOS +.Pn /boot +into place and use the SunOS ``installboot'' program +to enable disk-based booting. +Note that the filesystem must be mounted when you do the ``installboot'': +.DS +.ft CW +# mount /dev/sd3a /mnt +# cp /boot /mnt/boot +# cd /usr/kvm/mdec +# installboot /mnt/boot bootsd /dev/rsd3a +.DE +The SunOS +.Pn /boot +will load \*(4B kernels; there is no SPARCstation +bootstrap code on the distribution. Note that the SunOS +.Pn /boot +does not handle the new \*(4B filesystem format. +.IP 5) +Restore the contents of the \*(4B root filesystem. 
+.DS +.ft CW +# cd /mnt +# rrestore xf tapehost:/dev/nrst0 +.DE +.IP 6) +Boot the supplied kernel: +.DS +.ft CW +# halt +ok boot sd(0,3)kernel -s [for old PROMs] OR +ok boot disk3 -s [for new PROMs] +\&... [\*(4B boot messages] +.DE +.LP +To install the remaining filesystems, use the procedure described +starting in section 2.5. +In these instructions, +.Pn /usr +should be loaded into the ``e'' partition and +.Pn /var +in the ``f'' partition. +.LP +After completing the filesystem installation, you may want +to set up \*(4B to reboot automatically: +.DS +.ft CW +# halt +ok setenv boot-from sd(0,3)kernel [for old PROMs] OR +ok setenv boot-device disk3 [for new PROMs] +.DE +If you built backwards-compatible filesystems, either with the SunOS +newfs or with the \*(4B ``\-O'' option, you can mount these under +SunOS. The SunOS fsck will, however, always think that these filesystems +are corrupted, as there are several new (previously unused) +superblock fields that are updated in \*(4B. Running ``fsck \-b32'' +and letting it ``fix'' the superblock will take care of this. +.sp 0.5 +If you wish to run SunOS binaries that use SunOS shared libraries, you +simply need to copy all the dynamic linker files from an existing +SunOS system: +.DS +.ft CW +# rcp sunos-host:/etc/ld.so.cache /etc/ +# rcp sunos-host:'/usr/lib/*.so*' /usr/lib/ +.DE +The SunOS compiler and linker should be able to produce SunOS binaries +under \*(4B, but this has not been tested. If you plan to try it, you +will need the appropriate .sa files as well. +.Sh 2 "Booting the DECstation" +.Sh 3 "Supported hardware" +.LP +The hardware supported by \*(4B for the DECstation is as follows: +.TS +center box; +lw(1i) lw(4i). +CPUs T{ +R2000-based (3100) and +R3000-based (5000/200, 5000/20, 5000/25, 5000/1xx). +T} +_ +DISKs T{ +SCSI-I (tested RZ23, RZ55, RZ57, Maxtor 8760S). +T} +_ +TAPEs T{ +SCSI-I (tested DEC TK50, Archive DAT, Emulex MT02). +T} +_ +RS232 T{ +Internal DEC dc7085 and AMD 8530-based interfaces.
+T} +_ +NETWORK T{ +TURBOchannel PMAD-AA and internal LANCE-based interfaces. +T} +_ +GRAPHICS T{ +Terminal emulation and raw frame buffer support for +3100 (color & monochrome), +TURBOchannel PMAG-AA, PMAG-BA, PMAG-DV. +T} +_ +INPUT T{ +Standard DEC keyboard (LK201) and mouse. +T} +_ +MISC T{ +Battery-backed real-time clock, +internal and TURBOchannel PMAZ-AA SCSI interfaces. +T} +.TE +.LP +Major items that are not supported include the 5000/240 +(code exists but is neither compiled in nor tested), +R4000-based machines, and FDDI and audio interfaces. +Diskless machines are not supported, but booting kernels and bootstrapping +over the network are supported on the 5000 series. +.Sh 3 "The procedure" +.PP +The first file on the distribution tape is a tar file that contains +four files. +The first step requires a running UNIX (or ULTRIX) system that can +be used to extract the tar archive from the first file on the tape. +The command: +.DS +.ft CW +tar xf /dev/rmt0 +.DE +will extract the following four files: +.DS +A) root.image: \fIdd\fP image of the root filesystem +B) kernel.tape: \fIdd\fP image for creating boot tapes +C) kernel.net: file for booting over the network +D) root.dump: \fIdump\fP image of the root filesystem +.DE +There are three basic ways to bootstrap a system, corresponding to the +first three files. +You may want to read the section on bootstrapping the HP300 +since many of the steps are similar. +A spare, formatted SCSI disk is also useful. +.Sh 4 "Procedure A: copy root filesystem to disk" +.PP +This procedure is similar to the one for the HP300. +If you have an extra disk, the easiest approach is to use \fIdd\fP\|(1) +under ULTRIX to copy the root filesystem image to the beginning +of the spare disk. +The root filesystem image includes a disklabel and bootblock along with the +root filesystem.
+An example command to copy the image to the beginning of a disk is: +.DS +.ft CW +dd if=root.image of=/dev/rz1c bs=\*(Bzb +.DE +The actual special file syntax will vary depending on unit numbers and +the version of ULTRIX that is running. +This system is now ready to boot. You can boot the kernel with one of the +following PROM commands. If you are booting on a 3100, the disk must be SCSI +id zero because of a bug. +.DS +.ft CW +DEC 3100: boot \-f rz(0,0,0)kernel +DEC 5000: boot 5/rz0/kernel +.DE +You can then proceed to section 2.5 +to create reasonable disk partitions for your machine +and then install the rest of the system. +.Sh 4 "Procedure B: bootstrap from tape" +.PP +If you have only a single machine with a single disk, +you need to use the more difficult approach of booting a +kernel and mini-root from tape or the network, and using it to restore +the root filesystem. +.PP +First, you will need to create a boot tape. This can be done using +\fIdd\fP as in the following example. +.DS +.ft CW +dd if=kernel.tape of=/dev/nrmt0 bs=1b +dd if=root.dump of=/dev/nrmt0 bs=\*(Bzb +.DE +The actual special file syntax for the tape drive will vary depending on +unit numbers, tape device and the version of ULTRIX that is running. +.PP +The first file on the boot tape contains a boot header, kernel, and +mini-root filesystem that the PROM can copy into memory. +Installing from tape has only been tested +on a 3100 and a 5000/200 using a TK50 tape drive. Here are two example +PROM commands to boot from tape. +.DS +.ft CW +DEC 3100: boot \-f tz(0,5,0) m # 5 is the SCSI id of the TK50 +DEC 5000: boot 5/tz6 m # 6 is the SCSI id of the TK50 +.DE +The `m' argument tells the kernel to look for a root filesystem in memory. +Next you should proceed to section 2.4.3 to build a disk-based root filesystem. 
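Both dd commands in procedure B copy a file to tape in fixed-size records. The bs=1b operand selects 512-byte records for the boot header and kernel. The bs=20b operand selects 20 × 512 = 10240-byte records for the dump.

The C sketch below is a rough model of what the bs= operand controls. It copies one file descriptor to another with one read() and one write() per record, and counts full and partial records in the spirit of dd's N+M records report. It is a simplified illustration, not dd's implementation: it assumes no conversions, and it writes a short read as a short record. The names copy_records and print_record_counts are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Counts in the style of dd's "N+M records" report. */
struct record_counts {
    long full;      /* records of exactly recsize bytes */
    long partial;   /* shorter records, e.g. the final one */
};

/*
 * Copy in to out in records of recsize bytes: one read() and one
 * write() per record, roughly what dd does with bs=recsize and no
 * conv= options.  buf must hold recsize bytes.  A short read is
 * written as a short record rather than padded.
 * Returns 0 at end of input, -1 on a read or write error.
 */
int copy_records(int in, int out, char *buf, size_t recsize,
                 struct record_counts *rc)
{
    rc->full = rc->partial = 0;
    for (;;) {
        ssize_t n = read(in, buf, recsize);

        if (n == 0)
            return 0;                   /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            return -1;
        }
        if ((size_t)n == recsize)
            rc->full++;
        else
            rc->partial++;
        /* On a tape device each write() becomes one physical record. */
        if (write(out, buf, (size_t)n) != n)
            return -1;
    }
}

/* Print a dd-style summary such as "2+1 records out". */
void print_record_counts(FILE *fp, const struct record_counts *rc)
{
    fprintf(fp, "%ld+%ld records out\n", rc->full, rc->partial);
}
```

For the dump file, recsize would be 10240 with a buffer of that size. Copying a 25000-byte regular file this way yields two full records and one partial record of 4520 bytes. print_record_counts reports this as 2+1 records out.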
+.Sh 4 "Procedure C: bootstrap over the network" +.PP +You will need a host machine that is running the \fIbootp\fP server +with the +.Pn kernel.net +file installed in the default directory defined by the +configuration file for +.Xr bootp . +Here are two example PROM commands to boot across the net: +.DS +.ft CW +DEC 3100: boot \-f tftp()kernel.net m +DEC 5000: boot 6/tftp/kernel.net m +.DE +Either command should load the kernel and mini-root into memory and +run the same as the tape install (procedure B). +The remaining steps are the same, except that +you will need to start the network +(if you are unsure how to fill in the <name> fields below, +see sections 4.4 and 5). +Execute the following to start networking: +.DS +.ft CW +# mount \-uw / +# echo 127.0.0.1 localhost >> /etc/hosts +# echo <your.host.inet.number> myname.my.domain myname >> /etc/hosts +# echo <friend.host.inet.number> myfriend.my.domain myfriend >> /etc/hosts +# ifconfig le0 inet myname +.DE +Next you should proceed to section 2.4.3 to build a disk-based root filesystem. +.Sh 3 "Label disk and create the root filesystem" +.LP +There are four steps to create a disk-based root filesystem. +.IP 1) +Label the disk and create a filesystem on its ``a'' partition. +.DS +.ft CW +# disklabel -W /dev/rrz?c # This enables writing the label +# disklabel -w -r -B /dev/rrz?c $DISKTYPE +# newfs /dev/rrz?a +\&... +# fsck /dev/rrz?a +\&... +.DE +Supported disk types are listed in +.Pn /etc/disktab . +.IP 2) +Restore the root filesystem. +.DS +.ft CW +# mount \-uw / +# mount /dev/rz?a /a +# cd /a +.DE +.ti +0.4i +If you are restoring locally (procedure B), run: +.DS +.ft CW +# mt \-f /dev/nrmt0 rew +# restore \-xsf 2 /dev/rmt0 +.DE +.ti +0.4i +If you are restoring across the net (procedure C), run: +.DS +.ft CW +# rrestore xf myfriend:/path/to/root.dump +.DE +.ti +0.4i +When the restore finishes, clean up with: +.DS +.ft CW +# cd / +# sync +# umount /a +# fsck /dev/rz?a +.DE +.IP 3) +Reset the system and initialize the PROM monitor to boot automatically.
+.DS +.ft CW +DEC 3100: setenv bootpath boot \-f rz(0,?,0)kernel +DEC 5000: setenv bootpath 5/rz?/kernel -a +.DE +.IP 4) +After booting UNIX, you will need to create +.Pn /dev/mouse +to run the X Window System, as in the following example: +.DS +.ft CW +rm /dev/mouse +ln /dev/xx /dev/mouse +.DE +The ``xx'' should be one of the following: +.DS +pm0 raw interface to PMAX graphics devices +cfb0 raw interface to TURBOchannel PMAG-BA color frame buffer +xcfb0 raw interface to maxine graphics devices +mfb0 raw interface to mono graphics devices +.DE +You can then proceed to section 2.5 to install the rest of the system. +Note that where the disk name ``sd'' is used throughout section 2.5, +you should substitute the name ``rz''. +.Sh 2 "Disk configuration" +.PP +At this point, all architectures have a root filesystem up and running. +From here on, the procedure is the same: lay out filesystems to make use +of the available space and to balance disk load for better system +performance. +.Sh 3 "Disk naming and divisions" +.PP +Each physical disk drive can be divided into up to 8 partitions; +UNIX typically uses only 3 or 4 partitions. +For instance, the first partition, \*(Dk0a, +is used for a root filesystem, a backup thereof, +or a small filesystem such as +.Pn /var/tmp ; +the second partition, \*(Dk0b, +is used for paging and swapping; and +a third partition, typically \*(Dk0e, +holds a user filesystem. +.PP +The space available on a disk varies per device. +Each disk typically has a paging area of 30 to 100 megabytes +and a root filesystem of about 17 megabytes. +.\" XXX check +The distributed system binaries occupy about 150 (180 with X11R5) megabytes, +.\" XXX check +while the major sources occupy another 250 (340 with X11R5) megabytes. +The +.Pn /var +filesystem as delivered on the tape is only 2MB; +however, it should have at least 50MB allocated to it just for +normal system activity.
+It is usually allocated the last partition on the disk +so that it can provide as much space as possible to the +.Pn /var/users +filesystem. +See section 2.5.4 for further details on disk layouts. +.PP +Be aware that disk sizes are +measured in disk sectors (usually 512 bytes), while UNIX filesystem +blocks are variable-sized. +If +.Sm BLOCKSIZE=1k +is set in the user's environment, all user programs report +disk space in kilobytes; otherwise, +disk sizes are reported in units of 512-byte sectors\**. +.FS +You can thank System V intransigence and POSIX duplicity for +requiring that 512-byte blocks be the units that programs report. +.FE +The +.Pn /etc/disktab +file used in labelling disks and making filesystems +specifies disk partition sizes in sectors. +.Sh 3 "Layout considerations" +.PP +There are several considerations in deciding how +to adjust the arrangement of things on your disks. +The most important is making sure that there is adequate space +for what is required; secondarily, throughput should be maximized. +Paging space is an important parameter. +The system, as distributed, sizes the configured +paging areas each time the system is booted. Further, +multiple paging areas of different sizes may be interleaved. +.PP +Many common system programs (the C compiler, the editor, the assembler, etc.) +create intermediate files in the +.Pn /tmp +directory, so the filesystem where it is stored should also be made +large enough to accommodate most high-water marks. +Typically, +.Pn /tmp +is constructed from a memory-based filesystem (see +.Xr mount_mfs (8)). +Programs that want their temporary files to persist +across system reboots (such as editors) should use +.Pn /var/tmp . +If you plan to use a disk-based +.Pn /tmp +filesystem to avoid loss across system reboots, it makes +sense to mount this in a ``root'' (i.e., first partition) +filesystem on another disk.
+All the programs that create files in +.Pn /tmp +take care to delete them, but are not immune to rare events +and can leave dregs. +The directory should be examined every so often and the old +files deleted. +.PP +The efficiency with which UNIX is able to use the CPU +is often strongly affected by the configuration of disk controllers; +it is critical for good performance to balance disk load. +There are at least five components of the disk load that you can +divide between the available disks: +.IP 1) +The root filesystem. +.IP 2) +The +.Pn /var +and +.Pn /var/tmp +filesystems. +.IP 3) +The +.Pn /usr +filesystem. +.IP 4) +The user filesystems. +.IP 5) +The paging activity. +.LP +The following possibilities are ones we have used at times +when we had 2, 3, and 4 disks: +.TS +center doublebox; +l | c s s +l | lw(5) | lw(5) | lw(5). + disks +what 2 3 4 +_ +root 0 0 0 +var 1 2 3 +usr 1 1 1 +paging 0+1 0+2 0+2+3 +users 0 0+2 0+2 +archive x x 3 +.TE +.PP +The most important goal is to +even out the disk load as much as possible, in particular by placing +filesystems between which heavy copying occurs on separate disk arms. +Note that a long-term average balanced load is not important; it is +much more important to have an instantaneously balanced +load when the system is busy. +.PP +Intelligent experimentation with a few filesystem arrangements can +pay off in much improved performance. It is particularly easy to +move the root, +.Pn /var , +and +.Pn /var/tmp +filesystems, and the paging areas. Place the +user files and the +.Pn /usr +directory as space needs dictate and experiment +with the other, more easily moved filesystems. +.Sh 3 "Filesystem parameters" +.PP +Each filesystem is parameterized according to its block size, +fragment size, and the disk geometry characteristics of the +medium on which it resides.
Inaccurate specification of the disk +characteristics or haphazard choice of the filesystem parameters +can result in substantial throughput degradation or significant +waste of disk space. As distributed, +filesystems are configured according to the following table. +.DS +.TS +center; +l l l. +Filesystem Block size Fragment size +_ +root 8 kbytes 1 kbytes +usr 8 kbytes 1 kbytes +users 4 kbytes 512 bytes +.TE +.DE +.PP +The root filesystem block size is +made large to optimize bandwidth to the associated disk. +The large block size is important as many of the most +heavily used programs are demand paged out of the +.Pn /bin +directory. +The fragment size of 1 kbyte is a ``nominal'' value to use +with a filesystem. With a 1 kbyte fragment size +disk space utilization is about the same +as with the earlier versions of the filesystem. +.PP +The filesystems for users have a 4 kbyte block +size with 512 byte fragment size. These parameters +have been selected based on observations of the +performance of our user filesystems. The 4 kbyte +block size provides adequate bandwidth while the +512 byte fragment size provides acceptable space compaction +and disk fragmentation. +.PP +Other parameters may be chosen in constructing filesystems, +but the factors involved in choosing a block +size and fragment size are many and interact in complex +ways. Larger block sizes result in better +throughput to large files in the filesystem as +larger I/O requests will then be done by the +system. However, +consideration must be given to the average file sizes +found in the filesystem and the performance of the +internal system buffer cache. The system +currently provides space in the inode for +12 direct block pointers, 1 single indirect block +pointer, 1 double indirect block pointer, +and 1 triple indirect block pointer. +If a file uses only direct blocks, access time to +it will be optimized by maximizing the block size. 
+If a file spills over into an indirect block, +increasing the block size of the filesystem may +decrease the amount of space used +by eliminating the need to allocate an indirect block. +However, if the block size is increased and an indirect +block is still required, then more disk space will be +used by the file because indirect blocks are allocated +according to the block size of the filesystem. +.PP +In selecting a fragment size for a filesystem, at least +two factors should be considered. The major performance +tradeoffs observed are between an 8 kbyte block filesystem +and a 4 kbyte block filesystem. Because of implementation +constraints, the block size versus fragment size ratio cannot +be greater than 8. This means that an 8 kbyte block filesystem +will always have a fragment size of at least 1 kbyte. If +a filesystem is created with a 4 kbyte block size and a +1 kbyte fragment size, then upgraded to an 8 kbyte block size +and 1 kbyte fragment size, identical space compaction will be +observed. However, if a filesystem has a 4 kbyte block size +and 512 byte fragment size, converting it to an 8K/1K +filesystem will result in 4-8% more space being +used. This implies that 4 kbyte block filesystems that +might be upgraded to 8 kbyte blocks for higher performance should +use fragment sizes of at least 1 kbyte to minimize the amount +of work required in conversion. +.PP +A second, more important, consideration when selecting the +fragment size for a filesystem is the level of fragmentation +on the disk. With an 8:1 block-to-fragment ratio, storage fragmentation +occurs much sooner, particularly with a busy filesystem running +near full capacity. By comparison, the level of fragmentation in a +filesystem with a 4:1 block-to-fragment ratio is one tenth as severe. This +means that on filesystems where many files are created and +deleted, the 512 byte fragment size is more likely to result in apparent +space exhaustion because of fragmentation.
+That is, when the filesystem +is nearly full, file expansion that requires locating a +contiguous area of disk space is more likely to fail on a filesystem +with 512 byte fragments than on one with 1 kbyte fragments. To minimize +fragmentation problems of this sort, a parameter in the super +block specifies a minimum acceptable free space threshold. When +normal users (i.e., anyone but the super-user) attempt to allocate +disk space once free space has fallen to this threshold, they receive +an error as if the filesystem were really full. This +parameter is nominally set to 5%; it may be changed by supplying +a parameter to +.Xr newfs (8), +or by updating the super block of an existing filesystem using +.Xr tunefs (8). +.PP +Finally, a third, less common consideration is the set of attributes of +the disk itself. The fragment size should not be smaller than the +physical sector size of the disk. As an example, the HP magneto-optical +disks have 1024-byte physical sectors. Using a 512 byte fragment size +on such disks will work but is extremely inefficient. +.PP +Note that the above discussion considers block sizes of up to only 8k. +As of the 4.4 release, the maximum block size has been increased to 64k. +This allows an entirely new set of block/fragment combinations for which +there is little experience to date. +In general though, unless a filesystem is to be used +for a special-purpose application (for example, storing +image processing data), we recommend using the +values supplied above. +Remember that the current +implementation limits the block size to at most 64 kbytes +and the ratio of block size to fragment size must be 1, 2, 4, or 8. +.PP +The disk geometry information used by the filesystem +affects the block layout policies employed. The file +.Pn /etc/disktab , +as supplied, contains the data for almost +all drives supported by the system.
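The block and fragment rules discussed above can be made concrete with a small C sketch. So can the inode layout described earlier in this section: 12 direct pointers, plus one single, one double, and one triple indirect pointer.

- valid_block_frag checks a proposed pair against the stated limits: a block size of at most 64 kbytes, a block-to-fragment ratio of 1, 2, 4, or 8, and a fragment no smaller than the physical sector.
- direct_bytes computes how much data the direct pointers can address.
- max_addressable_bytes computes how much data the full pointer tree can address.

Two things are assumptions not stated in the text: that both sizes must be powers of two, and that an on-disk block pointer is 4 bytes. All function names are hypothetical.

```c
#include <stdint.h>

#define NDADDR     12      /* direct block pointers per inode (from the text) */
#define MAX_BSIZE  65536   /* 4.4BSD maximum block size (from the text) */
#define PTR_SIZE   4       /* assumed size of one on-disk block pointer, bytes */

static int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Return 1 if bsize/fsize is an acceptable pair for a disk with
 * secsize-byte physical sectors: block size at most 64 kbytes,
 * block-to-fragment ratio of 1, 2, 4, or 8, and fragments no smaller
 * than a sector.  Both sizes are assumed to be powers of two.
 */
int valid_block_frag(uint32_t bsize, uint32_t fsize, uint32_t secsize)
{
    uint32_t ratio;

    if (!is_power_of_two(bsize) || !is_power_of_two(fsize))
        return 0;
    if (bsize > MAX_BSIZE || fsize > bsize || fsize < secsize)
        return 0;
    ratio = bsize / fsize;
    return ratio == 1 || ratio == 2 || ratio == 4 || ratio == 8;
}

/* Largest file that fits in the direct blocks alone. */
uint64_t direct_bytes(uint32_t bsize)
{
    return (uint64_t)NDADDR * bsize;
}

/*
 * Bytes addressable through the 12 direct pointers plus one single,
 * one double, and one triple indirect block, ignoring any other limit
 * on file size.
 */
uint64_t max_addressable_bytes(uint32_t bsize)
{
    uint64_t nindir = bsize / PTR_SIZE;   /* pointers per indirect block */
    uint64_t nblocks = NDADDR + nindir + nindir * nindir
                     + nindir * nindir * nindir;

    return nblocks * bsize;
}
```

On the 8 kbyte root and /usr filesystems in the table above, direct_bytes(8192) is 98,304, so any larger file needs at least a single indirect block. On the 4 kbyte users filesystems, the threshold is 49,152 bytes. valid_block_frag(8192, 512, 512) returns 0 because the ratio would be 16.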
+Before constructing +a filesystem with +.Xr newfs (8), +you should label the disk (if it has not yet been labeled, +and the driver supports labels). +If labels cannot be used, you must instead +specify the type of disk on which the filesystem resides; +.Xr newfs +then reads +.Pn /etc/disktab +instead of the pack label. +This file also contains the default +filesystem partition +sizes, and default block and fragment sizes. To +override any of the default values, you can modify the file, +edit the disk label, +or use an option to +.Xr newfs . +.Sh 3 "Implementing a layout" +.PP +To put a chosen disk layout into effect, you should use the +.Xr newfs (8) +command to create each new filesystem. +Each filesystem must also be added to the file +.Pn /etc/fstab +so that it will be checked and mounted when the system is bootstrapped. +.PP +First we will consider a system with a single disk. +There is little real choice in how to do the layout; +the root filesystem goes in the ``a'' partition, +.Pn /usr +goes in the ``e'' partition, and +.Pn /var +fills out the remainder of the disk in the ``f'' partition. +This is the organization used if you loaded the disk-image root filesystem. +With the addition of a memory-based +.Pn /tmp +filesystem, the fstab entries would be as follows: +.TS +center; +lfC lfC l l n n. +/dev/\*(Dk0a / ufs rw 1 1 +/dev/\*(Dk0b none swap sw 0 0 +/dev/\*(Dk0b /tmp mfs rw,-s=14000,-b=8192,-f=1024,-T=sd660 0 0 +/dev/\*(Dk0e /usr ufs ro 1 2 +/dev/\*(Dk0f /var ufs rw 1 2 +.TE +.PP +If we had a second disk, we would split the load between the drives. +On the second disk, we place the +.Pn /usr +and +.Pn /var +filesystems in their usual \*(Dk1e and \*(Dk1f +partitions respectively. +The \*(Dk1b partition would be used as a second paging area, +and the \*(Dk1a partition left as a spare root filesystem +(alternatively, \*(Dk1a could be used for +.Pn /var/tmp ). +The first disk still holds the +root filesystem in \*(Dk0a, and the primary swap area in \*(Dk0b.
+The \*(Dk0e partition is used to hold home directories in +.Pn /var/users . +The \*(Dk0f partition can be used for +.Pn /usr/src +or, alternatively, the \*(Dk0e partition can be extended to cover +the rest of the disk with +.Xr disklabel (8). +As before, the +.Pn /tmp +directory is a memory-based filesystem. +Note that to interleave the paging between the two disks, +you must build a system configuration that specifies: +.DS +config kernel root on \*(Dk0 swap on \*(Dk0 and \*(Dk1 +.DE +The +.Pn /etc/fstab +file would then contain: +.TS +center; +lfC lfC l l n n. +/dev/\*(Dk0a / ufs rw 1 1 +/dev/\*(Dk0b none swap sw 0 0 +/dev/\*(Dk1b none swap sw 0 0 +/dev/\*(Dk0b /tmp mfs rw,-s=14000,-b=8192,-f=1024,-T=sd660 0 0 +/dev/\*(Dk1e /usr ufs ro 1 2 +/dev/\*(Dk0f /usr/src ufs rw 1 2 +/dev/\*(Dk1f /var ufs rw 1 2 +/dev/\*(Dk0e /var/users ufs rw 1 2 +.TE +.PP +To make the +.Pn /var +filesystem, we would do: +.DS +\fB#\fP \fIdisklabel -wr \*(Dk1 "disk type" "disk name"\fP +\fB#\fP \fInewfs \*(Dk1f\fP +(information about filesystem prints out) +\fB#\fP \fImkdir /var\fP +\fB#\fP \fImount /dev/\*(Dk1f /var\fP +.DE +.Sh 2 "Installing the rest of the system" +.PP +At this point you should have your disks partitioned. +The next step is to extract the rest of the data from the tape. +At a minimum, you need to set up the +.Pn /var +and +.Pn /usr +filesystems. +You may also want to extract some or all of the program sources. +Since some architectures do not support tape drives, or do not support the +right ones, you may need to extract the files indirectly using +.Xr rsh (1).
+For example, with a directly connected tape drive, you might do: +.DS +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP +\fB#\fP \fItar xbpf \*(Bz /dev/nr\*(Mt0\fP +.DE +The equivalent indirect procedure (where the tape drive is on machine ``foo'') +is: +.DS +\fB#\fP \fIrsh foo mt -f /dev/nr\*(Mt0 fsf\fP +\fB#\fP \fIrsh foo dd if=/dev/nr\*(Mt0 bs=\*(Bzb | tar xbpf \*(Bz -\fP +.DE +For this to work, the target machine must be connected to the local network. +To configure the network: +.DS +\fB#\fP \fIecho 127.0.0.1 localhost >> /etc/hosts\fP +\fB#\fP \fIecho \fPyour.host.inet.number myname.my.domain myname\fI >> /etc/hosts\fP +\fB#\fP \fIecho \fPfriend.host.inet.number myfriend.my.domain myfriend\fI >> /etc/hosts\fP +\fB#\fP \fIifconfig le0 inet \fPmyname +.DE +where the ``host.inet.number'' fields are the IP addresses for your host and +the host with the tape drive, +and the ``myname.my.domain'' and ``myfriend.my.domain'' fields are the +fully qualified names of your machine and the tape-hosting machine. +See sections 4.4 and 5 for more information on setting up the network. +.PP +Assuming a directly connected tape drive, here is how to extract and +install +.Pn /var +and +.Pn /usr : +.br +.ne 5 +.TS +lw(2i) l. +\fB#\fP \fImount \-uw /dev/\*(Dk#a /\fP (read-write mount root filesystem) +\fB#\fP \fIdate yymmddhhmm\fP (set date, see \fIdate\fP\|(1)) +\&....
+\fB#\fP \fIpasswd -l root\fP (set password for super-user) +\fBNew password:\fP (password will not echo) +\fBRetype new password:\fP +\fB#\fP \fIpasswd -l toor\fP (set password for alternate super-user) +\fBNew password:\fP (password will not echo) +\fBRetype new password:\fP +\fB#\fP \fIhostname mysitename\fP (set your hostname) +\fB#\fP \fInewfs r\*(Dk#p\fP (create empty /var filesystem) +(\fI\*(Dk\fP is the disk type, \fI#\fP is the unit number, +\fIp\fP is the partition; this takes a few minutes) +\fB#\fP \fImount /dev/\*(Dk#p /var\fP (mount the /var filesystem) +\fB#\fP \fIcd /var\fP (make /var the current directory) +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xbpf \*(Bz /dev/nr\*(Mt0\fP (extract all of var) +(this takes a few minutes) +\fB#\fP \fInewfs r\*(Dk#p\fP (create empty /usr filesystem) +(as before, \fI\*(Dk\fP is the disk type, \fI#\fP is the unit number, +\fIp\fP is the partition) +\fB#\fP \fImount /dev/\*(Dk#p /mnt\fP (mount the new /usr in a temporary location) +\fB#\fP \fIcd /mnt\fP (make /mnt the current directory) +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xbpf \*(Bz /dev/nr\*(Mt0\fP (extract all of usr except usr/src) +(this takes about 15-20 minutes) +\fB#\fP \fIcd /\fP (make / the current directory) +\fB#\fP \fIumount /mnt\fP (unmount from temporary mount point) +\fB#\fP \fIrm -r /usr/*\fP (remove excess bootstrap binaries) +\fB#\fP \fImount /dev/\*(Dk#p /usr\fP (remount /usr) +.TE +If no disk label has been installed on the disk, the +.Xr newfs +command will require an additional argument to specify the disk type, +using one of the names in +.Pn /etc/disktab . +If the tape was rewound or positioned incorrectly before the +.Xr tar +command that extracts +.Pn /var , +it may be repositioned with the following commands. +.DS +\fB#\fP \fImt -f /dev/nr\*(Mt0 rew\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf 1\fP +.DE +The data on the second and third tape files has now been extracted.
+If you are using 6250 bpi tapes, the first reel of the +distribution is no longer needed; you should now mount the second +reel instead. If you are using 8mm tape, the installation procedure +simply continues from this point on the same tape. +The next step is to extract the sources. +As previously noted, +.Pn /usr/src +.\" XXX Check +requires about 250-340MB of space. +Ideally, the sources should be in a separate filesystem; +if you plan to put them into your +.Pn /usr +filesystem, it will need at least 500MB of space. +Assuming that you will be using a separate filesystem on \*(Dk0f for +.Pn /usr/src , +start by creating and mounting it: +.DS +\fB#\fP \fInewfs \*(Dk0f\fP +(information about filesystem prints out) +\fB#\fP \fImkdir /usr/src\fP +\fB#\fP \fImount /dev/\*(Dk0f /usr/src\fP +.DE +.LP +First you will extract the kernel source: +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +(this should only be done on Exabyte distributions) +\fB#\fP \fItar xpbf \*(Bz /dev/nr\*(Mt0\fP (extract the kernel sources) +(this takes about 15-30 minutes) +.TE +.DE +.LP +The next tar file contains the sources for the utilities. +It is extracted as follows: +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xpbf \*(Bz /dev/nr\*(Mt0\fP (extract the utility sources) +(this takes about 30-60 minutes) +.TE +.DE +.PP +If you are using 6250 bpi tapes, the second reel of the +distribution is no longer needed; you should now mount the third +reel instead. If you are using 8mm tape, the installation procedure +simply continues from this point on the same tape. +.PP +The next tar file contains the sources for the contributed software. +It is extracted as follows: +.DS +.TS +lw(2i) l.
+\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +(this should only be done on Exabyte distributions) +\fB#\fP \fItar xpbf \*(Bz /dev/nr\*(Mt0\fP (extract the contributed software source) +(this takes about 30-60 minutes) +.TE +.DE +.PP +If you received a distribution on 8mm Exabyte tape, +there is one additional tape file on the distribution tape +that has not yet been installed; it contains the +sources for X11R5 in +.Xr tar (1) +format. As distributed, X11R5 should be placed in +.Pn /usr/src/X11R5 . +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xpbf \*(Bz /dev/nr\*(Mt0\fP (extract the X11R5 source) +(this takes about 30-60 minutes) +.TE +.DE +Many of the X11 utilities search the path +.Pn /usr/X11 , +so be sure that it is a symbolic link pointing at +the location of your X11 binaries (here, X11R5). +.PP +Having now completed the extraction of the sources, +you may want to verify that your +.Pn /usr/src +filesystem is consistent. +To do so, unmount it and run +.Xr fsck (8); +assuming that you used \*(Dk0f, you would proceed as follows: +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /\fP (change directory, back to the root) +\fB#\fP \fIumount /usr/src\fP (unmount /usr/src) +\fB#\fP \fIfsck /dev/r\*(Dk0f\fP +.TE +.DE +The output from +.Xr fsck +should look something like: +.DS +.B +** /dev/r\*(Dk0f +** Last Mounted on /usr/src +** Phase 1 - Check Blocks and Sizes +** Phase 2 - Check Pathnames +** Phase 3 - Check Connectivity +** Phase 4 - Check Reference Counts +** Phase 5 - Check Cyl groups +23000 files, 261000 used, 39000 free (2200 frags, 4600 blocks) +.R +.DE +.PP +If there are inconsistencies in the filesystem, you may be prompted +to apply corrective action; see the +.Xr fsck (8) +manual page or \fIFsck \(en The UNIX File System Check Program\fP (SMM:3) for more details.
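The last line of the fsck output above can be checked for internal consistency. fsck reports the used and free space in fragments. The free total is the loose free fragments plus the free full blocks multiplied by the number of fragments per block. The sketch below assumes 8 fragments per block, which is the common 8192-byte block with 1024-byte fragments. The sample line is consistent with that assumption: 2200 + 4600 times 8 = 39000. Set FRAGS_PER_BLOCK to match your newfs parameters. The helper name is hypothetical.

```shell
# Hypothetical helper: check the arithmetic of an fsck summary line like
#   23000 files, 261000 used, 39000 free (2200 frags, 4600 blocks)
# and print the approximate percentage of fragments in use.
# ASSUMPTION: 8 fragments per block unless FRAGS_PER_BLOCK says otherwise.
FRAGS_PER_BLOCK=${FRAGS_PER_BLOCK:-8}

check_fsck_summary() {
    # $1: the summary line printed at the end of an fsck run
    echo "$1" | tr -d ',()' | awk -v fpb="$FRAGS_PER_BLOCK" '{
        used = $3; free = $5; frags = $7; blocks = $9
        if (free != frags + blocks * fpb) {
            printf "mismatch: %d != %d + %d * %d\n", free, frags, blocks, fpb
            exit 1
        }
        printf "%.1f%% used\n", 100 * used / (used + free)
    }'
}
```

Applied to the sample line above, the function prints "87.0% used". If the totals do not add up, it reports the mismatch and returns a non-zero status.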
+.PP +To use the +.Pn /usr/src +filesystem, you should now remount it with: +.DS +\fB#\fP \fImount /dev/\*(Dk0f /usr/src\fP +.DE +or, if you have made an entry for it in +.Pn /etc/fstab , +you can remount it with: +.DS +\fB#\fP \fImount /usr/src\fP +.DE +.Sh 2 "Additional conversion information" +.PP +After setting up the new \*(4B filesystems, you may restore the user +files that were saved on tape before beginning the conversion. +Note that the \*(4B +.Xr restore +program does its work on a mounted filesystem using normal system operations. +This means that filesystem dumps may be restored even +if the characteristics of the filesystem have changed. +To restore a dump tape for, say, the +.Pn /a +filesystem, something like the following would be used: +.DS +\fB#\fP \fImkdir /a\fP +\fB#\fP \fInewfs \*(Dk#p\fP +\fB#\fP \fImount /dev/\*(Dk#p /a\fP +\fB#\fP \fIcd /a\fP +\fB#\fP \fIrestore x\fP +.DE +.PP +If +.Xr tar +images were written instead of doing a dump, you should +be sure to use its `\-p' option when reading the files back. No matter +how you restore a filesystem, be sure to unmount it and check its +integrity with +.Xr fsck (8) +when the job is complete. diff --git a/share/doc/smm/01.setup/3.t b/share/doc/smm/01.setup/3.t new file mode 100644 index 000000000000..e9b2e0ccd82e --- /dev/null +++ b/share/doc/smm/01.setup/3.t @@ -0,0 +1,1987 @@ +.\" Copyright (c) 1980, 1986, 1988, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution.
+.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds lq `` +.ds rq '' +.ds RH "Upgrading a \*(Ps System +.ds CF \*(Dy +.Sh 1 "Upgrading a \*(Ps system" +.PP +This section describes the procedure for upgrading a \*(Ps +system to \*(4B. This procedure may vary according to the version of +the system running before conversion. +If you are converting from a +System V system, some of this section will still apply (in particular, +the filesystem conversion). However, many of the system configuration +files are different, and the executable file formats are completely +incompatible. +.PP +In particular be wary when using this information to upgrade +a \*(Ps HP300 system. +There are at least four different versions of ``\*(Ps'' out there: +.IP 1) +HPBSD 1.x from Utah. +.br +This was the original version of \*(Ps for HP300s from which the +other variants (and \*(4B) are derived. +It is largely a \*(Ps system with Sun's NFS 3.0 filesystem code and +some \*(Ps-Tahoe features (e.g. networking code). 
+Since the filesystem code is 4.2/4.3 vintage and the filesystem +hierarchy is largely \*(Ps, most of this section should apply. +.IP 2) +MORE/bsd from Mt. Xinu. +.br +This is a \*(Ps-Tahoe vintage system with Sun's NFS 4.0 filesystem code +upgraded with Tahoe UFS features. +The instructions for \*(Ps-Tahoe should largely apply. +.IP 3) +\*(Ps-Reno from CSRG. +.br +At least one site bootstrapped HP300 support from the Reno distribution. +The Reno filesystem code was somewhere between \*(Ps and \*(4B: the VFS switch +had been added but many of the UFS features (e.g. ``inline'' symlinks) +were missing. +The filesystem hierarchy reorganization first appeared in this release. +Be extremely careful following these instructions if you are +upgrading from the Reno distribution. +.IP 4) +HPBSD 2.0 from Utah. +.br +As if things were not bad enough already, +this release has the \*(4B filesystem and networking code +as well as some utilities, but still has a \*(Ps hierarchy. +No filesystem conversions are necessary for this upgrade, +but files will still need to be moved around. +.Sh 2 "Installation overview" +.PP +If you are running \*(Ps, upgrading your system +involves replacing your kernel and system utilities. +In general, there are three possible ways to install a new \*(Bs distribution: +(1) boot directly from the distribution tape, use it to load new binaries +onto empty disks, and then merge or restore any existing configuration files +and filesystems; +(2) use an existing \*(Ps or later system to extract the root and +.Pn /usr +filesystems from the distribution tape, +boot from the new system, then merge or restore existing +configuration files and filesystems; or +(3) extract the sources from the distribution tape onto an existing system, +and use that system to cross-compile and install \*(4B. +For this release, the second alternative is strongly advised, +with the third alternative reserved as a last resort. 
+In general, older binaries will continue to run under \*(4B, +but there are many exceptions that are on the critical path +for getting the system running. +Ideally, the new system binaries (root and +.Pn /usr +filesystems) should be installed on spare disk partitions, +then site-specific files should be merged into them. +Once the new system is up and fully merged, the previous root and +.Pn /usr +filesystems can be reused. +Other existing filesystems can be retained and used, +except that (as usual) the new +.Xr fsck +should be run before they are mounted. +.PP +It is \fBSTRONGLY\fP advised that you make full dumps of each filesystem +before beginning, especially any that you intend to modify in place +during the merge. +It is also desirable to run filesystem checks +of all filesystems to be converted to \*(4B before shutting down. +This is an excellent time to review your disk configuration +for possible tuning of the layout. +Most systems will need to provide a new filesystem for system use +mounted on +.Pn /var +(see below). +However, the +.Pn /tmp +filesystem can be an MFS virtual-memory-resident filesystem, +potentially freeing an existing disk partition. +(Additional swap space may be desirable as a consequence.) +See +.Xr mount_mfs (8). +.PP +The recommended installation procedure includes the following steps. +The order of these steps will probably vary according to local needs. +.IP \(bu +Extract root and +.Pn /usr +filesystems from the distribution tapes. +.IP \(bu +Extract kernel and/or user-level sources from the distribution tape +if space permits. +This can serve as the backup documentation as needed. +.IP \(bu +Configure and boot a kernel for the local system. +This can be delayed if the generic kernel from the distribution +supports enough hardware to proceed. +.IP \(bu +Build a skeletal +.Pn /var +filesystem (see +.Xr mtree (8)). 
+.IP \(bu +Merge site-dependent configuration files from +.Pn /etc +and +.Pn /usr/lib +into the new +.Pn /etc +directory. +Note that many file formats and contents have changed; see section 3.4 +of this document. +.IP \(bu +Copy or merge files from +.Pn /usr/adm , +.Pn /usr/spool , +.Pn /usr/preserve , +.Pn /usr/lib , +and other locations into +.Pn /var . +.IP \(bu +Merge local macros, dictionaries, etc. into +.Pn /usr/share . +.IP \(bu +Merge and update local software to reflect the system changes. +.IP \(bu +Take the rest of the morning off; you've earned it! +.PP +Section 3.2 lists the files to be saved as part of the conversion process. +Section 3.3 describes the bootstrap process. +Section 3.4 discusses merging the saved files back into the new system. +Section 3.5 gives an overview of the major +bug fixes and changes between \*(Ps and \*(4B. +Section 3.6 provides general hints on possible problems to be +aware of when converting from \*(Ps to \*(4B. +.Sh 2 "Files to save" +.PP +The following list enumerates the standard set of files you will want to +save and suggests directories in which site-specific files should be present. +This list will likely be augmented with non-standard files you +have added to your system. +If you do not have enough space to create parallel +filesystems, you should create a +.Xr tar +image of the following files before the new filesystems are created. +The rest of this subsection describes where these files +have moved and how they have changed. +.TS +lfC c l.
+/.cshrc \(dg root csh startup script (moves to \f(CW/root/.cshrc\fP) +/.login \(dg root csh login script (moves to \f(CW/root/.login\fP) +/.profile \(dg root sh startup script (moves to \f(CW/root/.profile\fP) +/.rhosts \(dg for trusted machines and users (moves to \f(CW/root/.rhosts\fP) +/etc/disktab \(dd in case you changed disk partition sizes +/etc/fstab * disk configuration data +/etc/ftpusers \(dg for local additions +/etc/gettytab \(dd getty database +/etc/group * group data base +/etc/hosts \(dg for local host information +/etc/hosts.equiv \(dg for local host equivalence information +/etc/hosts.lpd \(dg printer access file +/etc/inetd.conf * Internet services configuration data +/etc/named* \(dg named configuration files +/etc/netstart \(dg network initialization +/etc/networks \(dg for local network information +/etc/passwd * user data base +/etc/printcap * line printer database +/etc/protocols \(dd in case you added any local protocols +/etc/rc * for any local additions +/etc/rc.local * site specific system startup commands +/etc/remote \(dg auto-dialer configuration +/etc/services \(dd for local additions +/etc/shells \(dd list of valid shells +/etc/syslog.conf * system logger configuration +/etc/securettys * merged into ttys +/etc/ttys * terminal line configuration data +/etc/ttytype * merged into ttys +/etc/termcap \(dd for any local entries that may have been added +/lib \(dd for any locally developed language processors +/usr/dict/* \(dd for local additions to words and papers +/usr/include/* \(dd for local additions +/usr/lib/aliases * mail forwarding data base (moves to \f(CW/etc/aliases\fP) +/usr/lib/crontab * cron daemon data base (moves to \f(CW/etc/crontab\fP) +/usr/lib/crontab.local * local cron daemon data base (moves to \f(CW/etc/crontab.local\fP) +/usr/lib/lib*.a \(dg for local libraries +/usr/lib/mail.rc \(dg system-wide mail(1) initialization (moves to \f(CW/etc/mail.rc\fP) +/usr/lib/sendmail.cf * sendmail configuration (moves to 
\f(CW/etc/sendmail.cf\fP) +/usr/lib/tmac/* \(dd for locally developed troff/nroff macros (moves to \f(CW/usr/share/tmac/*\fP) +/usr/lib/uucp/* \(dg for local uucp configuration files +/usr/man/manl * for manual pages for locally developed programs (moves to \f(CW/usr/local/man\fP) +/usr/spool/* \(dg for current mail, news, uucp files, etc. (moves to \f(CW/var/spool\fP) +/usr/src/local \(dg for source for locally developed programs +/sys/conf/HOST \(dg configuration file for your machine (moves to \f(CW/sys/<arch>/conf\fP) +/sys/conf/files.HOST \(dg list of special files in your kernel (moves to \f(CW/sys/<arch>/conf\fP) +/*/quotas * filesystem quota files (moves to \f(CW/*/quotas.user\fP) +.TE +.DS +\(dg\|Files that can be used from \*(Ps without change. +\(dd\|Files that need local changes merged into \*(4B files. +*\|Files that require special work to merge and are discussed in section 3.4. +.DE +.Sh 2 "Installing \*(4B" +.PP +The next step is to build a working \*(4B system. +This can be done by following the steps in section 2 of +this document for extracting the root and +.Pn /usr +filesystems from the distribution tape onto unused disk partitions. +For the SPARC, the root filesystem dump on the tape could also be +extracted directly. +For the HP300 and DECstation, the raw disk image can be copied +into an unused partition and this partition can then be dumped +to create an image that can be restored. +The exact procedure chosen will depend on the disk configuration +and the number of suitable disk partitions that may be used. +It is also desirable to run filesystem checks +of all filesystems to be converted to \*(4B before shutting down. +In any case, this is an excellent time to review your disk configuration +for possible tuning of the layout. +Section 2.5 and +.Xr config (8) +are required reading. +.LP +The filesystem in \*(4B has been reorganized in an effort to +meet several goals: +.IP 1) +The root filesystem should be small. 
+.IP 2) +There should be a per-architecture centrally-shareable read-only +.Pn /usr +filesystem. +.IP 3) +Variable per-machine directories should be concentrated below +a single mount point named +.Pn /var . +.IP 4) +Site-wide machine independent shareable text files should be separated +from architecture specific binary files and should be concentrated below +a single mount point named +.Pn /usr/share . +.LP +These goals are realized with the following general layouts. +The reorganized root filesystem has the following directories: +.TS +lfC l. +/etc (config files) +/bin (user binaries needed when single-user) +/sbin (root binaries needed when single-user) +/local (locally added binaries used only by this machine) +/tmp (mount point for memory based filesystem) +/dev (local devices) +/home (mount point for AMD) +/var (mount point for per-machine variable directories) +/usr (mount point for multiuser binaries and files) +.TE +.LP +The reorganized +.Pn /usr +filesystem has the following directories: +.TS +lfC l. +/usr/bin (user binaries) +/usr/contrib (software contributed to \*(4B) +/usr/games (binaries for games, score files in \f(CW/var\fP) +/usr/include (standard include files) +/usr/lib (lib*.a from old \f(CW/usr/lib\fP) +/usr/libdata (databases from old \f(CW/usr/lib\fP) +/usr/libexec (executables from old \f(CW/usr/lib\fP) +/usr/local (locally added binaries used site-wide) +/usr/old (deprecated binaries) +/usr/sbin (root binaries) +/usr/share (mount point for site-wide shared text) +/usr/src (mount point for sources) +.TE +.LP +The reorganized +.Pn /usr/share +filesystem has the following directories: +.TS +lfC l. 
+/usr/share/calendar (various useful calendar files) +/usr/share/dict (dictionaries) +/usr/share/doc (\*(4B manual sources) +/usr/share/games (games text files) +/usr/share/groff_font (groff font information) +/usr/share/man (typeset manual pages) +/usr/share/misc (dumping ground for random text files) +/usr/share/mk (templates for \*(4B makefiles) +/usr/share/skel (template user home directory files) +/usr/share/tmac (various groff macro packages) +/usr/share/zoneinfo (information on time zones) +.TE +.LP +The reorganized +.Pn /var +filesystem has the following directories: +.TS +lfC l. +/var/account (accounting files, formerly \f(CW/usr/adm\fP) +/var/at (\fIat\fP\|(1) spooling area) +/var/backups (backups of system files) +/var/crash (crash dumps) +/var/db (system-wide databases, e.g. tags) +/var/games (score files) +/var/log (log files) +/var/mail (users' mail) +/var/obj (hierarchy to build \f(CW/usr/src\fP) +/var/preserve (preserve area for vi) +/var/quotas (directory to store quota files) +/var/run (directory to store *.pid files) +/var/rwho (rwho databases) +/var/spool/ftp (home directory for anonymous ftp) +/var/spool/mqueue (sendmail spooling directory) +/var/spool/news (news spooling area) +/var/spool/output (printer spooling area) +/var/spool/uucp (uucp spooling area) +/var/tmp (disk-based temporary directory) +/var/users (root of per-machine user home directories) +.TE +.PP +The \*(4B bootstrap routines pass the identity of the boot device +through to the kernel. +The kernel then uses that device as its root filesystem. +Thus, for example, if you boot from +.Pn /dev/\*(Dk1a , +the kernel will use +.Pn \*(Dk1a +as its root filesystem. If +.Pn /dev/\*(Dk1b +is configured as a swap partition, +it will be used as the initial swap area; +otherwise, the normal primary swap area (\c +.Pn /dev/\*(Dk0b ) +will be used. +The \*(4B bootstrap is backward compatible with \*(Ps, +so you can replace your old bootstrap with the new one +and use it to boot your first \*(4B kernel.
+However, the \*(Ps bootstrap cannot access \*(4B filesystems, +so if you plan to convert your filesystems to \*(4B, +you must install a new bootstrap \fIbefore\fP doing the conversion. +Note that SPARC users cannot build a \*(4B compatible version +of the bootstrap, so must \fInot\fP convert their root filesystem +to the new \*(4B format. +.PP +Once you have extracted the \*(4B system and booted from it, +you will have to build a kernel customized for your configuration. +If you have any local device drivers, +they will have to be incorporated into the new kernel. +See section 4.1.3 and ``Building 4.3BSD UNIX Systems with Config'' (SMM:2). +.PP +If converting from \*(Ps, your old filesystems should be converted. +If you've modified the partition +sizes from the original \*(Ps ones, and are not already using the +\*(4B disk labels, you will have to modify the default disk partition +tables in the kernel. Make the necessary table changes and boot +your custom kernel \fBBEFORE\fP trying to access any of your old +filesystems! After doing this, if necessary, the remaining filesystems +may be converted in place by running the \*(4B version of +.Xr fsck (8) +on each filesystem and allowing it to make the necessary corrections. +The new version of +.Xr fsck +is more strict about the size of directories than +the version supplied with \*(Ps. +Thus the first time that it is run on a \*(Ps filesystem, +it will produce messages of the form: +.DS +\fBDIRECTORY ...: LENGTH\fP xx \fBNOT MULTIPLE OF 512 (ADJUSTED)\fP +.DE +Length ``xx'' will be the size of the directory; +it will be expanded to the next multiple of 512 bytes. 
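The directory-length adjustment reported above is a simple round-up to the next multiple of 512 bytes. The sketch below models only that arithmetic. It does not read or modify any filesystem, and both the helper name and the DIRBLK variable are hypothetical.

```shell
# Hypothetical helper: model the "NOT MULTIPLE OF 512 (ADJUSTED)" fix,
# rounding a directory length up to the next multiple of 512 bytes.
# It only computes the new length; it never touches a filesystem.
DIRBLK=512

adjusted_dirlen() {
    # $1: directory length in bytes, as recorded in the inode
    echo $(( ($1 + $DIRBLK - 1) / $DIRBLK * $DIRBLK ))
}
```

For example, a 1000-byte directory is expanded to 1024 bytes, while a directory that is already 512 bytes long is left unchanged.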
+The new +.Xr fsck +will also set default \fIinterleave\fP and +\fInpsect\fP (number of physical sectors per track) values on older +filesystems, in which these fields were unused spares; this correction +will produce messages of the form: +.DS +\fBIMPOSSIBLE INTERLEAVE=0 IN SUPERBLOCK (SET TO DEFAULT)\fP\** +\fBIMPOSSIBLE NPSECT=0 IN SUPERBLOCK (SET TO DEFAULT)\fP +.DE +.FS +The defaults are to set \fIinterleave\fP to 1 and +\fInpsect\fP to \fInsect\fP. +This is correct on most drives; +it affects only performance (usually by a negligible amount). +.FE +Filesystems that have had their interleave and npsect values +set will be diagnosed by the old +.Xr fsck +as having a bad superblock; the old +.Xr fsck +will run only if given an alternate superblock +(\fIfsck \-b32\fP), +in which case it will re-zero these fields. +The \*(4B kernel will internally set these fields to their defaults +if fsck has not done so; again, the \fI\-b32\fP option may be +necessary for running the old +.Xr fsck . +.PP +In addition, \*(4B removes several limits on filesystem sizes +that were present in \*(Ps. +Filesystems that still have these limits +continue to work in \*(4B, but should be converted +as soon as it is convenient +by running +.Xr fsck +with the \fI\-c 2\fP option. +The sequence \fIfsck \-p \-c 2\fP will update all of them at once: +it will fix the interleave and npsect fields, +fix any incorrect directory lengths, +expand maximum uids and gids to 32 bits, +place symbolic links shorter than 60 bytes into their inodes, +and fill in directory type fields. +The new filesystem formats are incompatible with older systems. +If you wish to continue using these filesystems with the older +systems, you should make only the compatible changes using +\fIfsck \-c 1\fP.
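The conversion by fsck -c 2 stores a symbolic link inside its inode only when the link's target is shorter than 60 bytes. The sketch below predicts whether a given target string qualifies. The helper name is hypothetical. It measures the target string, not the file the link points to. Because ${#1} counts characters, the result equals the byte count only for single-byte (for example ASCII) targets.

```shell
# Hypothetical helper: succeed if a symbolic link whose target is the
# string $1 is short enough (under 60 bytes) to be placed in its inode
# by "fsck -c 2".  ${#1} counts characters, so this matches the byte
# count only for single-byte (e.g. ASCII) targets.
SHORTLINK_MAX=60

fits_in_inode() {
    [ ${#1} -lt $SHORTLINK_MAX ]
}
```

For instance, a link pointing at /usr/share/zoneinfo (19 characters) qualifies, but a link whose target is exactly 60 characters long does not.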
+.Sh 2 "Merging your files from \*(Ps into \*(4B" +.PP +When your system is booting reliably and you have the \*(4B root and +.Pn /usr +filesystems fully installed, you will be ready +to continue with the next step in the conversion process, +merging your old files into the new system. +.PP +If you saved the files on a +.Xr tar +tape, extract them into a scratch directory, say +.Pn /usr/convert : +.DS +\fB#\fP \fImkdir /usr/convert\fP +\fB#\fP \fIcd /usr/convert\fP +\fB#\fP \fItar xp\fP +.DE +.PP +The data files marked in the previous table with a dagger (\(dg) +may be used without change from the previous system. +Those data files marked with a double dagger (\(dd) have syntax +changes or substantial enhancements. +You should start with the \*(4B version and carefully +integrate any local changes into the new file. +Usually these local changes can be incorporated +without conflict into the new file; +some exceptions are noted below. +The files marked with an asterisk (*) require +particular attention and are discussed below. +.PP +As described in section 3.3, +the most immediately obvious change in \*(4B is the reorganization +of the system filesystems. +Users of certain recent vendor releases have seen this general organization, +although \*(4B takes the reorganization a bit further. +The directories most affected are +.Pn /etc , +which now contains only system configuration files; +.Pn /var , +a new filesystem containing per-system spool and log files; and +.Pn /usr/share , +which contains most of the text files shareable across architectures, +such as documentation and macros. +System administration programs formerly in +.Pn /etc +are now found in +.Pn /sbin +and +.Pn /usr/sbin . +Various programs and data files formerly in +.Pn /usr/lib +are now found in +.Pn /usr/libexec +and +.Pn /usr/libdata , +respectively. +Administrative files formerly in +.Pn /usr/adm +are in +.Pn /var/account +and, similarly, log files are now in +.Pn /var/log .
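During the merge, it can help to look up where a saved file belongs. The sketch below encodes only the one-to-one relocations named in this section and its tables. The old spool and adm areas are left out because their contents split across several directories; the /var copy commands later in this section handle them. The helper name is hypothetical and is not a 4.4BSD utility. Any path the helper does not recognize is echoed back unchanged, which does not prove that the path kept its old location.

```shell
# Hypothetical helper: map a pre-4.4BSD pathname to its 4.4BSD location,
# using only the relocations listed in this section.  Unknown paths are
# echoed back unchanged; that does NOT mean they did not move.
new_location() {
    case $1 in
    /.cshrc|/.login|/.profile|/.rhosts)
        echo "/root$1" ;;
    /usr/lib/aliases|/usr/lib/crontab|/usr/lib/crontab.local)
        echo "/etc/${1#/usr/lib/}" ;;
    /usr/lib/mail.rc|/usr/lib/sendmail.cf)
        echo "/etc/${1#/usr/lib/}" ;;
    /usr/lib/tmac/*)
        echo "/usr/share/tmac/${1#/usr/lib/tmac/}" ;;
    /usr/ucb/*)
        echo "/usr/bin/${1#/usr/ucb/}" ;;
    /usr/games/dm.config)
        echo /etc/dm.conf ;;
    /etc/zoneinfo/localtime)
        echo /etc/localtime ;;
    /etc/zoneinfo/*)
        echo "/usr/share/zoneinfo/${1#/etc/zoneinfo/}" ;;
    /etc/*.pid)
        echo "/var/run/${1#/etc/}" ;;
    *)
        echo "$1" ;;
    esac
}
```

For instance, new_location /usr/lib/aliases prints /etc/aliases, and new_location /usr/ucb/vi prints /usr/bin/vi.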
+The directory +.Pn /usr/ucb +has been merged into +.Pn /usr/bin , +and the sources for programs in +.Pn /usr/bin +are in +.Pn /usr/src/usr.bin . +Other source directories parallel the destination directories; +.Pn /usr/src/etc +has been greatly expanded, and +.Pn /usr/src/share +is new. +The sources for the manual pages are, in general, kept with the source +code for the applications they document. +Manual pages not closely corresponding to an application program +are found in +.Pn /usr/src/share/man . +The locations of all man pages are listed in +.Pn /usr/src/share/man/man0/man[1-8] . +The manual page +.Xr hier (7) +has been updated and made more detailed; +it is included in the printed documentation. +You should review it to familiarize yourself with the new layout. +.PP +A new utility, +.Xr mtree (8), +is provided to build and check filesystem hierarchies +with the proper contents, owners, and permissions. +Specification files are provided in +.Pn /etc/mtree +(and +.Pn /usr/src/etc/mtree ) +for the root, +.Pn /usr , +and +.Pn /var +filesystems. +Once a filesystem has been made for +.Pn /var , +.Xr mtree +can be used to create a directory hierarchy there, +or you can simply use tar to extract the prototype from +the second file of the distribution tape. +.Sh 3 "Changes in the \f(CW/etc\fP directory" +.PP +The +.Pn /etc +directory now contains nearly all the host-specific configuration +files. +Note that some file formats have changed, +and those configuration files containing pathnames are nearly all affected +by the reorganization. +See the examples provided in +.Pn /etc +(installed from +.Pn /usr/src/etc ) +as a guide. +The following table lists some of the local configuration files +whose locations and/or contents have changed. +.TS +l l l +lfC lfC l.
+\*(Ps and Earlier \*(4B Comments +_ _ _ +/etc/fstab /etc/fstab new format; see below +/etc/inetd.conf /etc/inetd.conf pathnames of executables changed +/etc/printcap /etc/printcap pathnames changed +/etc/syslog.conf /etc/syslog.conf pathnames of log files changed +/etc/ttys /etc/ttys pathnames of executables changed +/etc/passwd /etc/master.passwd new format; see below +/usr/lib/sendmail.cf /etc/sendmail.cf changed pathnames +/usr/lib/aliases /etc/aliases may contain changed pathnames +/etc/*.pid /var/run/*.pid + +.T& +l l l +lfC lfC l. +New in \*(Ps-Tahoe \*(4B Comments +_ _ _ +/usr/games/dm.config /etc/dm.conf configuration for games (see \fIdm\fP\|(8)) +/etc/zoneinfo/localtime /etc/localtime timezone configuration +/etc/zoneinfo /usr/share/zoneinfo timezone configuration +.TE +.ne 1.5i +.TS +l l l +lfC lfC l. + New in \*(4B Comments +_ _ _ + /etc/aliases.db database version of the aliases file + /etc/amd-home location database of home directories + /etc/amd-vol location database of exported filesystems + /etc/changelist \f(CW/etc/security\fP files to back up + /etc/csh.cshrc system-wide csh(1) initialization file + /etc/csh.login system-wide csh(1) login file + /etc/csh.logout system-wide csh(1) logout file + /etc/disklabels directory for saving disklabels + /etc/exports NFS list of export permissions + /etc/ftpwelcome message displayed for ftp users; see ftpd(8) + /etc/man.conf lists directories searched by \fIman\fP\|(1) + /etc/mtree directory for local mtree files; see mtree(8) + /etc/netgroup NFS group list used in \f(CW/etc/exports\fP + /etc/pwd.db non-secure hashed user data base file + /etc/spwd.db secure hashed user data base file + /etc/security daily system security checker +.TE +.PP +System security changes require adding several new ``well-known'' groups to +.Pn /etc/group . +The groups that are needed by the system as distributed are: +.TS +l n l. 
+name number purpose +_ +wheel 0 users allowed superuser privilege +daemon 1 processes that need less than wheel privilege +kmem 2 read access to kernel memory +sys 3 access to kernel sources +tty 4 access to terminals +operator 5 read access to raw disks +bin 7 group for system binaries +news 8 group for news +wsrc 9 write access to sources +games 13 access to games +staff 20 system staff +guest 31 system guests +nobody 39 the least privileged group +utmp 45 access to utmp files +dialer 117 access to remote ports and dialers +.TE +Only users in the ``wheel'' group are permitted to +.Xr su +to ``root''. +Most programs that manage directories in +.Pn /var/spool +now run set-group-id to ``daemon'' so that users cannot +directly access the files in the spool directories. +The special files that access kernel memory, +.Pn /dev/kmem +and +.Pn /dev/mem , +are made readable only by group ``kmem''. +Standard system programs that require this access are +made set-group-id to that group. +The group ``sys'' is intended to control access to kernel sources, +and other sources belong to group ``wsrc.'' +Rather than make user terminals writable by all users, +they are now placed in group ``tty'' and made only group writable. +Programs that should legitimately have access to write on user terminals +such as +.Xr talkd +and +.Xr write +now run set-group-id to ``tty''. +The ``operator'' group controls access to disks. +By default, disks are readable by group ``operator'', +so that programs such as +.Xr dump +can access the filesystem information without being set-user-id to ``root''. +The +.Xr shutdown (8) +program is executable only by group operator +and is setuid to root so that members of group operator may shut down +the system without root access. +.PP +The ownership and modes of some directories have changed. 
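Before you rely on the set-group-id arrangements described above, it may help to confirm that the group file has each well-known group with the number shown in the table. The sketch below only reads the file. It reports groups that are missing or that have a different gid. The helper name is hypothetical, and the file argument defaults to /etc/group.

```shell
# Hypothetical helper: report well-known 4.4BSD groups (from the table
# above) that are missing from a group file or have a different gid.
# It only reads the file.  $1: group file (defaults to /etc/group).
check_groups() {
    file=${1:-/etc/group}
    status=0
    for entry in wheel:0 daemon:1 kmem:2 sys:3 tty:4 operator:5 \
                 bin:7 news:8 wsrc:9 games:13 staff:20 guest:31 \
                 nobody:39 utmp:45 dialer:117; do
        name=${entry%%:*}
        want=${entry#*:}
        # group(5) lines are name:password:gid:members
        got=$(awk -F: -v n="$name" '$1 == n { print $3; exit }' "$file")
        if [ -z "$got" ]; then
            echo "missing group $name ($want)"
            status=1
        elif [ "$got" != "$want" ]; then
            echo "group $name has gid $got, expected $want"
            status=1
        fi
    done
    return $status
}
```

Running check_groups /etc/group prints nothing and returns success when every group is present with the expected number. Otherwise, each output line names a group to add or correct.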
+The +.Xr at +programs now run set-user-id ``root'' instead of ``daemon.'' +Also, the uucp directory no longer needs to be publicly writable, +as +.Xr tip +reverts to privileged status to remove its lock files. +After copying your old spool directories into +.Pn /var , +you should do: +.DS +\fB#\fP \fIchown \-R root /var/at\fP +\fB#\fP \fIchown \-R uucp:daemon /var/spool/uucp\fP +\fB#\fP \fIchmod \-R o\-w /var/spool/uucp\fP +.DE +.PP +The format of the cron table, +.Pn /etc/crontab , +has been changed to specify the user-id that should be used to run a process. +The user-id ``nobody'' is frequently useful for non-privileged programs. +Local changes are now put in a separate file, +.Pn /etc/crontab.local . +.PP +Some of the commands previously in +.Pn /etc/rc.local +have been moved to +.Pn /etc/rc ; +several new functions are now handled by +.Pn /etc/rc , +.Pn /etc/netstart , +and +.Pn /etc/rc.local . +You should look closely at the prototype versions of these files +and read the manual pages for the commands contained in them +before trying to merge your local copies. +Note in particular that +.Xr ifconfig +has had many changes, +and that host names are now fully specified as domain-style names +(e.g., vangogh.CS.Berkeley.EDU) for the benefit of the name server. +.PP +Some of the commands previously in +.Pn /etc/daily +have been moved to +.Pn /etc/security , +and several new functions have been added to +.Pn /etc/security +to do nightly security checks on the system. +The script +.Pn /etc/daily +runs +.Pn /etc/security +each night and mails the output to the super-user. +Some of the checks done by +.Pn /etc/security +are: +.DS +\(bu Syntax errors in the password and group files. +\(bu Duplicate user and group names and ids. +\(bu Dangerous search paths and umask values for the superuser. +\(bu Dangerous values in various initialization files. +\(bu Dangerous .rhosts files. +\(bu Dangerous directory and file ownership or permissions. +\(bu Globally exported filesystems.
+\(bu Dangerous owners or permissions for special devices. +.DE +In addition, it reports any changes to setuid and setgid files, special +devices, or the files in +.Pn /etc/changelist +since the last run of +.Pn /etc/security . +Backup copies of the files are saved in +.Pn /var/backups . +Finally, the system binaries are checksummed and their permissions +validated against the +.Xr mtree (8) +specifications in +.Pn /etc/mtree . +.PP +The C library and system binaries on the distribution tape +are compiled with new versions of +.Xr gethostbyname +and +.Xr gethostbyaddr +that use the name server, +.Xr named (8). +If you have only a small network and are not connected +to a large network, you can use the distributed library routines without +any problems; they use a linear scan of the host table +.Pn /etc/hosts +if the name server is not running. +If you are on the Internet or have a large local network, +it is recommended that you set up +and use the name server. +For instructions on how to set up the necessary configuration files, +refer to ``Name Server Operations Guide for BIND'' (SMM:10). +Several programs rely on the host name returned by +.Xr gethostname +to determine the local domain name. +.PP +If you are using the name server, your +.Xr sendmail +configuration file will need some updates to accommodate it. +See the ``Sendmail Installation and Operation Guide'' (SMM:8) and +the sample +.Xr sendmail +configuration files in +.Pn /usr/src/usr.sbin/sendmail/cf . +The aliases file, +.Pn /etc/aliases , +has also been changed to add certain well-known addresses. +.Sh 3 "Shadow password files" +.PP +The password file format adds change and expiration fields, +and its location has changed to protect +the encrypted passwords stored there. +The actual password file is now stored in +.Pn /etc/master.passwd .
+The hashed dbm password files do not contain encrypted passwords, +but contain the file offset to the entry with the password in +.Pn /etc/master.passwd +(that is readable only by root). +Thus, the +.Fn getpwnam +and +.Fn getpwuid +functions will no longer return an encrypted password string to non-root +callers. +An old-style passwd file is created in +.Pn /etc/passwd +by the +.Xr vipw (8) +and +.Xr pwd_mkdb (8) +programs. +See also +.Xr passwd (5). +.PP +Several new users have also been added to the group of ``well-known'' users in +.Pn /etc/passwd . +The current list is: +.DS +.TS +l c. +name number +_ +root 0 +daemon 1 +operator 2 +bin 3 +games 7 +uucp 66 +nobody 32767 +.TE +.DE +The ``daemon'' user is used for daemon processes that +do not need root privileges. +The ``operator'' user-id is used as an account for dumpers +so that they can log in without having the root password. +By placing them in the ``operator'' group, +they can get read access to the disks. +The ``uucp'' login has existed long before \*(4B, +and is noted here just to provide a common user-id. +The password entry ``nobody'' has been added to specify +the user with least privilege. The ``games'' user is a pseudo-user +that controls access to game programs. +.PP +After installing your updated password file, you must run +.Xr pwd_mkdb (8) +to create the password database. +Note that +.Xr pwd_mkdb (8) +is run whenever +.Xr vipw (8) +is run. +.Sh 3 "The \f(CW/var\fP filesystem" +.PP +The spooling directories saved on tape may be restored in their +eventual resting places without too much concern. Be sure to +use the `\-p' option to +.Xr tar (1) +so that files are recreated with the same file modes. +The following commands provide a guide for copying spool and log files from +an existing system into a new +.Pn /var +filesystem. +At least the following directories should already exist on +.Pn /var : +.Pn output , +.Pn log , +.Pn backups +and +.Pn db . 
+.LP +.DS +.ft CW +SRC=/oldroot/usr + +cd $SRC; tar cf - msgs preserve | (cd /var && tar xpf -) +.DE +.DS +.ft CW +# copy $SRC/spool to /var +cd $SRC/spool +tar cf - at mail rwho | (cd /var && tar xpf -) +tar cf - ftp mqueue news secretmail uucp uucppublic | \e + (cd /var/spool && tar xpf -) +.DE +.DS +.ft CW +# everything else in spool is probably a printer area +mkdir .save +mv at ftp mail mqueue rwho secretmail uucp uucppublic .save +tar cf - * | (cd /var/spool/output && tar xpf -) +mv .save/* . +rmdir .save +.DE +.DS +.ft CW +cd /var/spool/mqueue +mv syslog.7 /var/log/maillog.7 +mv syslog.6 /var/log/maillog.6 +mv syslog.5 /var/log/maillog.5 +mv syslog.4 /var/log/maillog.4 +mv syslog.3 /var/log/maillog.3 +mv syslog.2 /var/log/maillog.2 +mv syslog.1 /var/log/maillog.1 +mv syslog.0 /var/log/maillog.0 +mv syslog /var/log/maillog +.DE +.DS +.ft CW +# move $SRC/adm to /var +cd $SRC/adm +tar cf - . | (cd /var/account && tar xpf -) +cd /var/account +rm -f msgbuf +mv messages messages.[0-9] ../log +mv wtmp wtmp.[0-9] ../log +mv lastlog ../log +.DE +.Sh 2 "Bug fixes and changes between \*(Ps and \*(4B" +.PP +The major new facilities available in the \*(4B release are +a new virtual memory system, +the addition of ISO/OSI networking support, +a new virtual filesystem interface supporting filesystem stacking, +a freely redistributable implementation of NFS, +a log-structured filesystem, +enhancement of the local filesystems to support +files and filesystems that are up to 2^63 bytes in size, +enhanced security and system management support, +and the conversion to and addition of the IEEE Std1003.1 (``POSIX'') +facilities and many of the IEEE Std1003.2 facilities. +Many new utilities and additions to the C +library are also included. +The kernel sources have been reorganized to collect all machine-dependent +files for each architecture under one directory, +and most of the machine-independent code is now free of code +conditional on specific machines.
+The user structure and process structure have been reorganized +to eliminate the statically-mapped user structure and to make most +of the process resources shareable by multiple processes. +The system and include files have been converted to be compatible +with ANSI C, including function prototypes for most of the exported +functions. +There are numerous other changes throughout the system. +.Sh 3 "Changes to the kernel" +.PP +This release includes several important structural kernel changes. +The kernel uses a new internal system call convention; +the use of global (``u-dot'') variables for parameters and error returns +has been eliminated, +and interrupted system calls no longer abort using non-local gotos (longjmps). +A new sleep interface separates signal handling from scheduling priority, +returning characteristic errors to abort or restart the current system call. +This sleep call also passes a string describing the process state, +which is used by the +.Xr ps (1) +program. +The old sleep interface can be used only for non-interruptible sleeps. +The new sleep interface (\fItsleep\fP) can be used at any priority, +but is only interruptible if the PCATCH flag is set. +When interrupted, \fItsleep\fP returns EINTR or ERESTART. +.PP +Many data structures that were previously statically allocated +are now allocated dynamically. +These structures include mount entries, file entries, +user open file descriptors, process entries, the vnode table, +the name cache, and the quota structures. +.PP +To protect against indiscriminate reading or writing of kernel +memory, all writing and most reading of kernel data structures +must be done using a new ``sysctl'' interface. +The information to be accessed is described through an extensible +``Management Information Base'' (MIB) style name, +expressed as a dotted set of components. +A new utility, +.Xr sysctl (8), +retrieves kernel state and allows processes with appropriate +privilege to set kernel state.
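+.PP
+To illustrate how a dotted MIB-style name can map onto the integer vector that addresses a kernel variable, here is a minimal user-level sketch. The tree, its node numbers and the function mib_lookup are hypothetical and chosen only for this example; they are not the real \*(4B sysctl tables or the library interface to them.
```c
#include <stddef.h>
#include <string.h>

/* One node of a HYPOTHETICAL MIB tree: a component name, its number at
 * this level, and an optional array of children terminated by a NULL name.
 * The numbers below are illustrative only. */
struct mib_node {
    const char *name;
    int number;
    const struct mib_node *children;
};

static const struct mib_node kern_nodes[] = {
    { "ostype", 1, NULL },
    { "securelevel", 9, NULL },
    { NULL, 0, NULL }
};

static const struct mib_node top_nodes[] = {
    { "kern", 1, kern_nodes },
    { "vm", 2, NULL },
    { NULL, 0, NULL }
};

/* Translate a dotted name such as "kern.securelevel" into mib[].
 * Returns the number of components stored, or -1 if a component is
 * unknown or the name is deeper than maxlen. */
int mib_lookup(const char *dotted, int *mib, size_t maxlen)
{
    const struct mib_node *level = top_nodes;
    size_t depth = 0;
    const char *p = dotted;

    while (*p != '\0') {
        size_t len = strcspn(p, ".");   /* length of this component */
        const struct mib_node *n;

        if (level == NULL || depth == maxlen)
            return -1;
        for (n = level; n->name != NULL; n++)
            if (strlen(n->name) == len && strncmp(n->name, p, len) == 0)
                break;
        if (n->name == NULL)
            return -1;                  /* no such component */
        mib[depth++] = n->number;
        level = n->children;            /* descend one level */
        p += len;
        if (*p == '.')
            p++;                        /* skip the separator */
    }
    return (int)depth;
}
```
+With this table, mib_lookup("kern.securelevel", mib, 4) stores {1, 9} and returns 2, while an unknown component such as "kern.nosuch" yields -1.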
+.Sh 4 "Security" +.PP +The kernel runs with four different levels of security. +Any superuser process can raise the security level, but only +.Xr init (8) +can lower it. +Security levels are defined as follows: +.IP \-1 +Permanently insecure mode \- always run the system in level 0 mode. +.IP " 0" +Insecure mode \- immutable and append-only flags may be turned off. +All devices may be read or written subject to their permissions. +.IP " 1" +Secure mode \- immutable and append-only flags may not be cleared; +disks for mounted filesystems, +.Pn /dev/mem , +and +.Pn /dev/kmem +are read-only. +.IP " 2" +Highly secure mode \- same as secure mode, plus disks are always +read-only whether mounted or not. +This level precludes tampering with filesystems by unmounting them, +but also inhibits running +.Xr newfs (8) +while the system is multi-user. +See +.Xr chflags (1) +and the \-\fBo\fP option to +.Xr ls (1) +for information on setting and displaying the immutable and append-only +flags. +.PP +Normally, the system runs in level 0 mode while single-user +and in level 1 mode while multiuser. +If level 2 mode is desired while running multiuser, +it can be set in the startup script +.Pn /etc/rc +using +.Xr sysctl (8). +If it is desired to run the system in level 0 mode while multiuser, +the administrator must build a kernel with the variable +.Li securelevel +in the kernel source file +.Pn /sys/kern/kern_sysctl.c +initialized to \-1. +.Sh 4 "Virtual memory changes" +.PP +The new virtual memory implementation is derived from the Mach +operating system developed at Carnegie-Mellon, +and was ported to the BSD kernel at the University of Utah.
+It is based on the 2.0 release of Mach +(with some bug fixes from the 2.5 and 3.0 releases) +and retains many of its essential features, such as +the separation of the machine-dependent and machine-independent layers +(the ``pmap'' interface), +efficient memory utilization using copy-on-write +and other lazy-evaluation techniques, +and support for large, sparse address spaces. +It does not include the ``external pager'' interface, instead using +a primitive internal pager interface. +The Mach virtual memory system call interface has been replaced with the +``mmap''-based interface described in the ``Berkeley Software +Architecture Manual'' (see UNIX Programmer's Manual, +Supplementary Documents, PSD:5). +The interface is similar to the interfaces shipped +by several commercial vendors, such as Sun, USL, and Convex Computer Corp. +The integration of the new virtual memory is functionally complete, +but still has serious performance problems under heavy memory load. +The internal kernel interfaces have not yet been completed, +and the memory pool and buffer cache have not been merged. +Some additional caveats: +.IP \(bu +Since the code is based on the 2.0 release of Mach, +bugs and misfeatures of the BSD version should not be considered +shortcomings of the current Mach virtual memory system. +.IP \(bu +Because of the disjoint virtual memory (page) and IO (buffer) caches, +it is possible to see inconsistencies if using both the mmap and +read/write interfaces on the same file simultaneously. +.IP \(bu +Swap space is allocated on demand rather than up front, and no +allocation checks are performed, so it is possible to over-commit +memory and eventually deadlock. +.IP \(bu +The semantics of the +.Xr vfork (2) +system call are slightly different. +The synchronization between parent and child is preserved, +but the memory sharing aspect is not. +In practice this has been enough for backward compatibility, +but newer code should just use +.Xr fork (2).
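+.PP
+The last caveat can be checked from user code. The minimal sketch below uses only the standard fork(2), waitpid(2) and _exit(2) calls; the helper name fork_private_copy is hypothetical. It shows the behavior new code should rely on: a child that modifies a variable changes only its own copy, so the parent still sees the original value. Code that depended on a vfork(2) child writing into the parent's memory should not expect that to work under \*(4B.
```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's copy of a counter after a forked child has
 * modified its own copy, or -1 if fork or wait fails. */
int fork_private_copy(void)
{
    int counter = 1;
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter = 42;                   /* changes only the child's copy */
        _exit(counter == 42 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return counter;                     /* the parent still sees 1 */
}
```
+A successful call returns 1: the child's assignment never reaches the parent's address space.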
+.Sh 4 "Networking additions and changes" +.PP +The ISO/OSI networking support consists of a kernel implementation of +transport class 4 (TP-4), +connectionless networking protocol (CLNP), +and 802.3-based link-level support (hardware-compatible with Ethernet\**). +.FS +Ethernet is a trademark of the Xerox Corporation. +.FE +We also include support for ISO Connection-Oriented Network Service, +X.25, and TP-0. +The session and presentation layers are provided outside +the kernel using the ISO Development Environment by Marshall Rose, +which is available via anonymous FTP +(but is not included on the distribution tape). +Included in this development environment are file +transfer and management (FTAM), virtual terminals (VT), +a directory services implementation (X.500), +and miscellaneous other utilities. +.PP +Kernel support for the ISO OSI protocols is enabled with the ISO option +in the kernel configuration file. +The +.Xr iso (4) +manual page describes the protocols and addressing; +see also +.Xr clnp (4), +.Xr tp (4) +and +.Xr cltp (4). +The OSI equivalent to ARP is ESIS (End System to Intermediate System Routing +Protocol); running this protocol is mandatory; however, one can manually add +translations for machines that do not participate by using the +.Xr route (8) +command. +Additional information is provided in the manual page describing +.Xr esis (4). +.PP +The command +.Xr route (8) +has a new syntax and several new capabilities: +it can install routes with a specified destination and mask, +and can change route characteristics such as hop count, packet size +and window size. +.PP +Several important enhancements have been added to the TCP/IP +protocols, including TCP header prediction and +serial line IP (SLIP) with header compression. +The routing implementation has been completely rewritten +to use a hierarchical routing tree with a mask per route +to support the arbitrary levels of routing found in the ISO protocols.
+The routing table also stores and caches route characteristics +to speed the adaptation of the throughput and congestion avoidance +algorithms. +.PP +The format of the +.I sockaddr +structure (the structure used to describe a generic network address with an +address family and family-specific data) +has changed from previous releases, +as have the address family-specific versions of this structure. +The +.I sa_family +field has been split into a length, +.Pn sa_len , +and a family, +.Pn sa_family . +System calls that pass a +.I sockaddr +structure into the kernel (e.g., +.Fn sendto +and +.Fn connect ) +have a separate parameter that specifies the +.I sockaddr +length, so it is not necessary to fill in the +.I sa_len +field for those system calls. +System calls that pass a +.I sockaddr +structure back from the kernel (e.g., +.Fn recvfrom +and +.Fn accept ) +receive a completely filled-in +.I sockaddr +structure, so the length field is valid. +Because this would not work for old binaries, +the new library uses a different system call number. +Thus, most networking programs compiled under \*(4B are incompatible +with older systems. +.PP +Although this change is mostly source and binary compatible +with old programs, there are three exceptions. +Programs with statically initialized +.I sockaddr +structures +(usually the Internet form, a +.I sockaddr_in ) +are not compatible. +Generally, such programs should be changed to fill in the structure +at run time, as C allows no way to initialize a structure without +assuming the order and number of fields. +Second, programs that describe a network packet format with structures +containing embedded +.I sockaddr +structures require changes; a definition of an +.I osockaddr +structure is provided for this purpose.
+Finally, programs that use the +.Sm SIOCGIFCONF +ioctl to get a complete list of interface addresses +need to check the +.I sa_len +field when iterating through the array of addresses returned, +as not all the structures returned have the same length +(this variance in length is nearly guaranteed by the presence of link-layer +address structures). +.Sh 4 "Additions and changes to filesystems" +.PP +The \*(4B distribution contains most of the interfaces +specified in the IEEE Std1003.1 system interface standard. +Filesystem additions include IEEE Std1003.1 FIFOs, +byte-range file locking, and saved user and group identifiers. +.PP +A new virtual filesystem interface has been added to the +kernel to support multiple filesystems. +In comparison with other interfaces, +the Berkeley interface has been structured for more efficient support +of filesystems that maintain state (such as the local filesystem). +The interface has been extended with support for stackable +filesystems developed at UCLA. +These extensions allow filesystems to be layered on top of each +other and allow new vnode operations to be added without requiring +changes to existing filesystem implementations. +For example, +the umap filesystem (see +.Xr mount_umap (8)) +is used to mount a sub-tree of an existing filesystem +that uses a different set of uids and gids than the local system. +Such a filesystem could be mounted from a remote site via NFS or it +could be a filesystem on removable media brought from some foreign +location that uses a different password file. +.PP +Other new filesystems that may be stacked include the loopback filesystem +(see +.Xr mount_lofs (8)) +and the kernel filesystem +(see +.Xr mount_kernfs (8)). +.PP +The buffer cache in the kernel is now organized as a file block cache +rather than a device block cache. +As a consequence, cached blocks from a file +and from the corresponding block device are no longer kept consistent. +The block device thus has little remaining value.
+Three changes have been made for these reasons: +.IP 1) +block devices may not be opened while they are mounted, +and may not be mounted while open, so that the two versions of cached +file blocks cannot be created, +.IP 2) +filesystem checks of the root now use the raw device +to access the root filesystem, and +.IP 3) +the root filesystem is initially mounted read-only +so that nothing can be written back to disk during or after changes made to +the filesystem through the raw device by +.Xr fsck . +.LP +The root filesystem may be made writable while in single-user mode +with the command: +.DS +.ft CW +mount \-uw / +.DE +The mount command has an option (\-u) to update the flags on a mounted filesystem, +including the ability to upgrade a filesystem from read-only to read-write +or downgrade it from read-write to read-only. +.PP +In addition to the local ``fast filesystem'', +we have added an implementation of the network filesystem (NFS) +that fully interoperates with the NFS shipped by Sun and its licensees. +Because our NFS was implemented +by Rick Macklem of the University of Guelph +using only the publicly available NFS specification, +it does not require a license from Sun to use in source or binary form. +By default, it runs over UDP to be compatible with Sun's implementation. +However, it can be configured on a per-mount basis to run over TCP. +Using TCP allows it to be used quickly and efficiently through +gateways and over long-haul networks. +Using an extended protocol, it supports leases to allow a limited +callback mechanism that greatly reduces the network traffic necessary +to maintain cache consistency between the server and its clients. +Its use will be familiar to users of other implementations of NFS. +See the manual pages +.Xr mount (8), +.Xr mountd (8), +.Xr fstab (5), +.Xr exports (5), +.Xr netgroup (5), +.Xr nfsd (8), +.Xr nfsiod (8) +and +.Xr nfssvc (2), +as well as the document ``The 4.4BSD NFS Implementation'' (SMM:6), +for further information.
+The format of +.Pn /etc/fstab +has changed from previous \*(Bs releases +to a blank-separated format to allow colons in pathnames. +.PP +A new local filesystem, the log-structured filesystem (LFS), +has been added to the system. +It provides near disk-speed output and fast crash recovery. +This work is based, in part, on the LFS filesystem created +for the Sprite operating system at Berkeley. +While the kernel implementation is almost complete, +only some of the utilities to support the +filesystem have been written, +so we do not recommend it for production use. +See +.Xr newlfs (8), +.Xr mount_lfs (8) +and +.Xr lfs_cleanerd (8) +for more information. +For an in-depth description of the implementation and performance +characteristics of log-structured filesystems in general, +and this one in particular, see Dr. Margo Seltzer's doctoral thesis, +available from the University of California Computer Science Department. +.PP +We have also added a memory-based filesystem that runs in +pageable memory, allowing large temporary filesystems without +requiring dedicated physical memory. +.PP +The local ``fast filesystem'' has been enhanced to do +clustering that allows large pieces of files to be +allocated contiguously, nearly doubling +filesystem throughput. +The filesystem interface has been extended to allow +files and filesystems to grow to 2^63 bytes in size. +The quota system has been rewritten to support both +user and group quotas (simultaneously if desired). +Quota expiration is based on time rather than +the previous metric of number of logins over quota. +This change makes quotas more useful on fileservers +on which users seldom log in. +.PP +The system security has been greatly enhanced by the +addition of new file flags that permit a file to be +marked as immutable or append-only. +Once set, these flags can only be cleared by the super-user +when the system is running in insecure mode (normally, single-user).
+In addition to the immutable and append-only flags, +the filesystem supports a new user-settable flag ``nodump''. +(File flags are set using the +.Xr chflags (1) +utility.) +When set on a file, +.Xr dump (8) +will omit the file from incremental backups +but include it in full backups. +See the ``\-h'' flag to +.Xr dump (8) +for details on how to change this default. +The ``nodump'' flag is usually set on core dumps, +system crash dumps, and object files generated by the compiler. +Note that the flag is not preserved when files are copied, +so an installed copy of an object file will be included in backups. +.PP +The filesystem format used in \*(4B has several additions. +Directory entries have an additional field, +.Pn d_type , +that identifies the type of the entry +(normally found in the +.Pn st_mode +field of the +.Pn stat +structure). +This field is particularly useful for identifying +directories without the need to use +.Xr stat (2). +.PP +Short (less than sixty bytes) symbolic links are now stored +in the inode itself rather than in a separate data block. +This saves disk space and makes accessing symbolic links faster. +Short symbolic links are not given a special type, +so a user-level application is unaware of their special treatment. +Unlike pre-\*(4B systems, symbolic links do +not have an owner, group, access mode, times, etc. +Instead, these attributes are taken from the directory that contains the link. +The only attributes returned from an +.Xr lstat (2) +that refer to the symbolic link itself are the file type (S_IFLNK), +size, blocks, and link count (always 1). +.PP +An implementation of an auto-mounter daemon, +.Xr amd , +was contributed by Jan-Simon Pendry of the +Imperial College of Science, Technology & Medicine. +See the document ``AMD \- The 4.4BSD Automounter'' (SMM:13) +for further information. +.PP +The directory +.Pn /dev/fd +contains special files +.Pn 0 +through +.Pn 63 +that, when opened, duplicate the corresponding file descriptor.
+The names +.Pn /dev/stdin , +.Pn /dev/stdout +and +.Pn /dev/stderr +refer to file descriptors 0, 1 and 2. +See +.Xr fd (4) +and +.Xr mount_fdesc (8) +for more information. +.Sh 4 "POSIX terminal driver changes" +.PP +The \*(4B system uses the IEEE P1003.1 (POSIX.1) terminal interface +rather than the previous \*(Bs terminal interface. +The terminal driver is similar to the System V terminal driver +with the addition of the necessary extensions to get the +functionality previously available in the \*(Ps terminal driver. +Both the old +.Xr ioctl +calls and old options to +.Xr stty (1) +are emulated. +This emulation is expected to be unavailable in many vendors' releases, +so conversion to the new interface is encouraged. +.PP +\*(4B also adds the IEEE Std1003.1 job control interface, +which is similar to the \*(Ps job control interface, +but adds a security model that was missing in the +\*(Ps job control implementation. +A new system call, +.Fn setsid , +creates a job-control session consisting of a single process +group with one member, the caller, which becomes a session leader. +Only a session leader may acquire a controlling terminal. +This is done explicitly via a +.Sm TIOCSCTTY +.Fn ioctl +call, not implicitly by an +.Fn open +call. +The call fails if the terminal is in use. +Programs that allocate controlling terminals (or pseudo-terminals) +require changes to work in this environment. +The version of +.Xr xterm +provided in the X11R5 release includes the necessary changes. +New library routines are available for allocating and initializing +pseudo-terminals and other terminals as controlling terminals; see +.Pn /usr/src/lib/libutil/pty.c +and +.Pn /usr/src/lib/libutil/login_tty.c . +.PP +The POSIX job control model formalizes the previous conventions +used in setting up a process group. +Unfortunately, this requires that changes be made in a defined order +and with some synchronization, neither of which was necessary in the past.
+Older job control shells (csh, ksh) will generally not operate correctly +with the new system. +.PP +Most of the other kernel interfaces have been changed to correspond +with the POSIX.1 interface, although that work is not complete. +See the relevant manual pages and the IEEE POSIX standard. +.Sh 4 "Native operating system compatibility" +.PP +Both the HP300 and SPARC ports feature the ability to run binaries +built for the native operating system (HP-UX or SunOS) by emulating +their system calls. +Building an HP300 kernel with the HPUXCOMPAT and COMPAT_OHPUX options +or a SPARC kernel with the COMPAT_SUNOS option will enable this feature +(on by default in the generic kernel provided in the root filesystem image). +Though this native operating system compatibility was provided by the +developers as needed for their purposes and is by no means complete, +it is complete enough to run several non-trivial applications including +those that require HP-UX or SunOS shared libraries. +For example, the vendor supplied X11 server and windowing environment +can be used on both the HP300 and SPARC. +.PP +It is important to remember that merely copying over a native binary +and executing it (or executing it directly across NFS) does not imply +that it will run. +All but the most trivial of applications are likely to require access +to auxiliary files that do not exist under \*(4B (e.g. +.Pn /etc/ld.so.cache ) +or have a slightly different format (e.g. +.Pn /etc/passwd ). +However, by using system call tracing and +through creative use of symlinks, +many problems can be tracked down and corrected. +.PP +The DECstation port also has code for ULTRIX emulation +(kernel option ULTRIXCOMPAT, not compiled into the generic kernel) +but it was used primarily for initially bootstrapping the port and +has not been used since. +Hence, some work may be required to make it generally useful. 
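+.PP
+One common way to structure system call emulation is a dispatch table: a call issued by a foreign binary is looked up by its foreign number and routed to a handler that translates arguments and results into native terms. The sketch below is a user-level illustration of that idea only; the table layout, the numbering and the names compat_syscall, emul_close and emul_getpid are hypothetical and do not reproduce the HPUXCOMPAT, COMPAT_SUNOS or ULTRIXCOMPAT code. In this sketch, calls without a handler fail with ENOSYS, one simple way to cope with an emulation that is, as noted above, incomplete.
```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Each handler receives the foreign program's raw arguments and returns
 * either a non-negative result or a negated errno value. */
typedef long (*compat_handler)(const long *args);

static long emul_getpid(const long *args)
{
    (void)args;                         /* takes no arguments */
    return (long)getpid();
}

static long emul_close(const long *args)
{
    /* Forward to the native call and translate the error convention. */
    return close((int)args[0]) == 0 ? 0 : -(long)errno;
}

/* HYPOTHETICAL table indexed by the foreign system call number.
 * The numbering is invented for this example; NULL marks a call the
 * emulation does not provide. */
static const compat_handler compat_table[] = {
    NULL,           /* 0: not emulated */
    NULL,           /* 1: not emulated */
    emul_close,     /* 2 */
    emul_getpid,    /* 3 */
};

long compat_syscall(size_t number, const long *args)
{
    if (number >= sizeof compat_table / sizeof compat_table[0] ||
        compat_table[number] == NULL)
        return -ENOSYS;                 /* unsupported: fail cleanly */
    return compat_table[number](args);
}
```
+Here compat_syscall(2, args) with args[0] equal to -1 returns -EBADF, and any number without a handler returns -ENOSYS.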
+.Sh 3 "Changes to the utilities" +.PP +We have been tracking the IEEE Std1003.2 shell and utility work +and have included prototypes of many of the proposed utilities +based on draft 12 of the POSIX.2 Shell and Utilities document. +Because most of the traditional utilities have been replaced +with implementations conformant to the POSIX standards, +you should realize that the utility software may not be as stable, +reliable or well documented as in traditional Berkeley releases. +In particular, almost the entire manual suite has been rewritten to +reflect the POSIX defined interfaces, and in some instances +it does not correctly reflect the current state of the software. +It is also worth noting that, in rewriting this software, we have generally +been rewarded with significant performance improvements. +Most of the libraries and header files have been converted +to be compliant with ANSI C. +The shipped compiler (gcc) is a superset of ANSI C, +but supports traditional C as a command-line option. +The system libraries and utilities all compile +with either ANSI or traditional C. +.Sh 4 "Make and Makefiles" +.PP +This release uses a completely new version of the +.Xr make +program derived from the +.Xr pmake +program developed by the Sprite project at Berkeley. +It supports existing makefiles, although certain incorrect makefiles +may fail. +The makefiles for the \*(4B sources make extensive use of the new +facilities, especially conditionals and file inclusion, and are thus +completely incompatible with older versions of +.Xr make +(but nearly all the makefiles are now trivial!). +The standard include files for +.Xr make +are in +.Pn /usr/share/mk . +There is a +.Pn bsd.README +file in +.Pn /usr/src/share/mk . +.PP +Another global change supported by the new +.Xr make +is designed to allow multiple architectures to share a copy of the sources. 
+If a subdirectory named +.Pn obj +is present in the current directory, +.Xr make +descends into that directory and creates all object and other files there. +We use this by building a directory hierarchy in +.Pn /var/obj +that parallels +.Pn /usr/src . +We then create the +.Pn obj +subdirectories in +.Pn /usr/src +as symbolic links to the corresponding directories in +.Pn /var/obj . +(This step is automated. +The command ``make obj'' in +.Pn /usr/src +builds both the local symlink and the shadow directory, +using +.Pn /usr/obj , +which may be a symbolic link, as the root of the shadow tree. +The use of +.Pn /usr/obj +is for historic reasons only, and the system make configuration files in +.Pn /usr/share/mk +can trivially be modified to use +.Pn /var/obj +instead.) +We have one +.Pn /var/obj +hierarchy on the local system, and another on each +system that shares the source filesystem. +All the sources in +.Pn /usr/src +except for +.Pn /usr/src/contrib +and portions of +.Pn /usr/src/old +have been converted to use the new make and +.Pn obj +subdirectories; +this change allows compilation for multiple +architectures from the same source tree +(which may be mounted read-only). +.Sh 4 "Kerberos" +.PP +Version 5 of the Kerberos authentication system designed at MIT +is included in this release. +See +.Xr kerberos (8) +for a general introduction. +Pluggable Authentication Modules (PAM) can use Kerberos +at the system administrator's discretion. +If PAM is so configured, +applications such as +.Xr login (1), +.Xr passwd (1), +.Xr ftp (1) +and +.Xr ssh (1) +can use it automatically. +Each system needs the file +.Pn /etc/krb5.conf +to set its realm and local servers, +and a private key stored in +.Pn /etc/krb5.keytab +(see +.Xr ktutil (8)). +The Kerberos server should be set up on a single, +physically secure +server machine. +Users and hosts may be added and modified with +.Xr kadmin (8).
+.PP +Note that the password-changing program +.Xr passwd (1) +can change the Kerberos password, +if configured by the administrator using PAM. +The +.Li \-l +option to +.Xr passwd (1) +changes the ``local'' password if one exists. +.Sh 4 "Timezone support" +.PP +The timezone conversion code in the C library uses data files installed in +.Pn /usr/share/zoneinfo +to convert from ``GMT'' to various timezones. The data file for the default +timezone for the system should be copied to +.Pn /etc/localtime . +Other timezones can be selected by setting the TZ environment variable. +.PP +The data files initially installed in +.Pn /usr/share/zoneinfo +include corrections for leap seconds since the beginning of 1970. +Thus, they assume that the +kernel will increment the time at a constant rate during a leap second; +that is, time just keeps on ticking. The conversion routines will then +name a leap second 23:59:60. For purists, this effectively means that +the kernel maintains TAI (International Atomic Time) rather than UTC +(Coordinated Universal Time, aka GMT). +.PP +For systems that run current NTP (Network Time Protocol) implementations +or that wish to conform to the letter of the POSIX.1 law, it is possible +to rebuild the timezone data files so that leap seconds are not counted. +(NTP causes the time to jump over a leap second, and POSIX effectively +requires the clock to be reset by hand when a leap second occurs. +In this mode, the kernel effectively runs UTC rather than TAI.) +.PP +The data files without leap second information +are constructed from the source directory, +.Pn /usr/src/share/zoneinfo . 
+Change the variable REDO in the Makefile +from ``right'' to ``posix'', and then run: +.DS +make obj (if necessary) +make +make install +.DE +.PP +You will then need to copy the correct default zone file to +.Pn /etc/localtime , +both because the old one still counts leap seconds and because the Makefile +installs a default +.Pn /etc/localtime +each time ``make install'' is done. +.PP +It is possible to install both sets of timezone data files. This results +in subdirectories +.Pn /usr/share/zoneinfo/right +and +.Pn /usr/share/zoneinfo/posix , +each of which contains a complete set of zone files. +See +.Pn /usr/src/share/zoneinfo/Makefile +for details. +.Sh 4 "Additions and changes to the libraries" +.PP +Notable additions to the libraries include functions to traverse a +filesystem hierarchy, database interfaces to btree and hashing functions, +a new, faster implementation of stdio, and radix and merge sort +functions. +.PP +The +.Xr fts (3) +functions will do either physical or logical traversal of +a file hierarchy as well as handle essentially infinite-depth +filesystems and filesystems with cycles. +All the utilities in \*(4B that traverse file hierarchies +have been converted to use +.Xr fts (3). +The conversion has always resulted in a significant performance +gain, often of four or five to one in system time. +.PP +The +.Xr dbopen (3) +functions are intended to be a family of database access methods. +Currently, they consist of +.Xr hash (3), +an extensible, dynamic hashing scheme, +.Xr btree (3), +a sorted, balanced tree structure (B+trees), and +.Xr recno (3), +a flat-file interface for fixed- or variable-length records +referenced by logical record number. +Each of the access methods stores associated key/data pairs and +uses the same record-oriented interface for access. +.PP +The +.Xr qsort (3) +function has been rewritten for additional performance.
+In addition, three new sorting functions, +.Xr heapsort (3), +.Xr mergesort (3), +and +.Xr radixsort (3), +have been added to the system. +The +.Xr mergesort +function is optimized for data with pre-existing order, +in which case it usually significantly outperforms +.Xr qsort . +The +.Xr radixsort (3) +functions are variants of most-significant-byte radix sorting. +They take time linear in the number of bytes to be +sorted, usually significantly outperforming +.Xr qsort +on data that can be sorted in this fashion. +An implementation of the POSIX 1003.2 standard +.Xr sort (1), +based on +.Xr radixsort , +is included in +.Pn /usr/src/contrib/sort . +.PP +Some additional comments about the \*(4B C library: +.IP \(bu +The floating-point support in the C library has been replaced +and is now accurate. +.IP \(bu +The C functions specified by ANSI C, POSIX 1003.1, and +POSIX 1003.2 are now part of the C library. +This includes support for file name matching, shell globbing, +and both basic and extended regular expressions. +.IP \(bu +ANSI C multibyte and wide character support has been integrated. +The rune functionality from the Bell Labs Plan 9 system is provided +as well. +.IP \(bu +The +.Xr termcap (3) +functions have been generalized and replaced with a +general-purpose interface named +.Xr getcap (3). +.IP \(bu +The +.Xr stdio (3) +routines have been replaced and are usually much faster. +In addition, the +.Xr funopen (3) +interface permits applications to supply their own read, write, seek, +and close functions for a stream. +.PP +The +.Xr curses (3) +library has been largely rewritten. +Important additional features include support for scrolling and +.Xr termios (3). +.PP +An application front-end editing library, named libedit, has been +added to the system. +.PP +A superset implementation of the SunOS kernel memory interface library, +libkvm, has been integrated into the system.
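The file name matching and regular expression support listed above can be sketched in C as follows. This is a minimal illustration that assumes a POSIX system providing fnmatch(3) and regcomp(3)/regexec(3); name_matches and ere_matches are hypothetical helper names chosen for this example, not library functions.

```c
#include <assert.h>
#include <fnmatch.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: shell-style file name matching, as used for
 * globbing.  FNM_PATHNAME stops '*' and '?' from matching a '/'. */
static bool
name_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PATHNAME) == 0;
}

/* Hypothetical helper: extended (egrep-style) regular expression match.
 * Returns false if the text does not match or the pattern is invalid. */
static bool
ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;               /* pattern failed to compile */
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);                   /* release the compiled pattern */
    return matched;
}
```

For example, name_matches("*.c", "main.c") is true, but name_matches("*.c", "src/main.c") is false because FNM_PATHNAME keeps the star from matching the slash.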
+.PP +.Sh 4 "Additions and changes to other utilities" +.PP +There are many new utilities, offering many new capabilities, +in \*(4B. +Skimming through the section 1 and section 8 manual pages is sure +to be useful. +The additions to the utility suite include greatly enhanced versions of +programs that display system status information, implementations of +various traditional tools described in the IEEE Std 1003.2 standard, +new tools not previously available on Berkeley UNIX systems, +and many others. +Also, with only a very few exceptions, all the utilities from +\*(Ps that included proprietary source code have been replaced, +and their \*(4B counterparts are freely redistributable. +In most cases, the replacement also brought significant performance +improvements and raised the limits that the utility imposes on +its data. +.PP +A summary of specific additions and changes follows: +.TS +lfC l. +amd	An auto-mounter implementation. +ar	Replacement of the historic archive format with a new one. +awk	Replaced by gawk; see /usr/src/old/awk for the historic version. +bdes	Utility implementing DES modes of operation described in FIPS PUB 81. +calendar	Addition of an interface for system calendars. +cap_mkdb	Utility for building hashed versions of termcap-style databases. +cc	Replacement of pcc with the gcc suite. +chflags	A utility for setting the per-file user and system flags. +chfn	An editor-based replacement for changing user information. +chpass	An editor-based replacement for changing user information. +chsh	An editor-based replacement for changing user information. +cksum	The POSIX 1003.2 checksum utility; compatible with sum. +column	A columnar text formatting utility. +cp	POSIX 1003.2 compatible, able to copy special files. +csh	Freely redistributable and 8-bit clean. +date	User-specified formats added. +dd	New EBCDIC conversion tables, major performance improvements. +dev_mkdb	Hashed interface to devices. +dm	Dungeon master.
+find Several new options and primaries, major performance improvements. +fstat Utility displaying information on files open on the system. +ftpd Connection logging added. +hexdump A binary dump utility, superseding od. +id The POSIX 1003.2 user identification utility. +inetd Tcpmux added. +jot A text formatting utility. +kdump A system-call tracing facility. +ktrace A system-call tracing facility. +kvm_mkdb Hashed interface to the kernel name list. +lam A text formatting utility. +lex A new, freely redistributable, significantly faster version. +locate A database of the system files, by name, constructed weekly. +logname The POSIX 1003.2 user identification utility. +mail.local New local mail delivery agent, replacing mail. +make Replaced with a new, more powerful make, supporting include files. +man Added support for man page location configuration. +mkdep A new utility for generating make dependency lists. +mkfifo The POSIX 1003.2 FIFO creation utility. +mtree A new utility for mapping file hierarchies to a file. +nfsstat An NFS statistics utility. +nvi A freely redistributable replacement for the ex/vi editors. +pax The POSIX 1003.2 replacement for cpio and tar. +printf The POSIX 1003.2 replacement for echo. +roff Replaced by groff; see /usr/src/old/roff for the historic versions. +rs New utility for text formatting. +shar An archive building utility. +sysctl MIB-style interface to system state. +tcopy Fast tape-to-tape copying and verification. +touch Time and file reference specifications. +tput The POSIX 1003.2 terminal display utility. +tr Addition of character classes. +uname The POSIX 1003.2 system identification utility. +vis A filter for converting and displaying non-printable characters. +xargs The POSIX 1003.2 argument list constructor utility. +yacc A new, freely redistributable, significantly faster version. 
+.TE +.PP +The new versions of +.Xr lex (1) +(``flex'') and +.Xr yacc (1) +(``zoo'') should be installed early on if attempting to +cross-compile \*(4B on another system. +Note that the new +.Xr lex +program is not completely backward compatible with historic versions of +.Xr lex , +although it is believed that all documented features are supported. +.PP +The +.Xr find +utility has two new primaries that are important to be aware of if you +intend to use NFS. +The ``fstype'' and ``prune'' primaries can be used together to prevent +find from crossing NFS mount points. +See +.Pn /etc/daily +for an example of their use. +.Sh 2 "Hints on converting from \*(Ps to \*(4B" +.PP +This section summarizes changes between +\*(Ps and \*(4B that are likely to +cause difficulty in doing the conversion. +It does not include changes in the network; +see section 5 for information on setting up the network. +.PP +Since the stat st_size field is now 64 bits instead of 32, +doing something like: +.DS +.ft CW +foo(st.st_size); +.DE +and then (improperly) defining foo with an ``int'' or ``long'' parameter: +.DS +.ft CW +foo(size) +	int size; +{ +	... +} +.DE +will fail miserably (although it may happen to work on a little-endian machine). +This problem showed up in +.Xr emacs (1) +as well as several other programs. +A related problem is improperly casting (or failing to cast) +the second argument to +.Xr lseek (2), +.Xr truncate (2), +or +.Xr ftruncate (2), +as in: +.DS +.ft CW +lseek(fd, (long)off, 0); +.DE +or +.DS +.ft CW +lseek(fd, 0, 0); +.DE +The best solution is to include +.Pn <unistd.h> , +whose prototypes catch these types of errors. +.PP +The ``namelen'' parameter for a +.Xr connect (2) +call on a unix domain socket should be computed with the ``SUN_LEN'' macro from +.Pn <sys/un.h> . +One formerly common idiom: +.DS +.ft CW +addrlen = strlen(unaddr.sun_path) + sizeof(unaddr.sun_family); +.DE +no longer works, because there is now an additional +.Pn sun_len +field.
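The corrected idioms for file sizes, offsets, and unix-domain address lengths can be sketched in C as follows. This is a minimal illustration assuming a modern POSIX system, where the length argument to connect(2) has type socklen_t and SEEK_SET replaces the literal 0 used as the whence argument. The names file_size, rewind_fd, and unix_addr are hypothetical helpers. The SUN_LEN fallback is used only where <sys/un.h> does not already provide the macro, and the sketch does not set sun_len, since that field exists only on BSD-derived systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Fallback for systems whose <sys/un.h> lacks SUN_LEN (assumption: the
 * usual definition, offset of sun_path plus the path length). */
#ifndef SUN_LEN
#define SUN_LEN(su) \
    (offsetof(struct sockaddr_un, sun_path) + strlen((su)->sun_path))
#endif

/* Hypothetical helper: keep sizes in an off_t, never an int or long,
 * so a 64-bit st_size is not truncated.  Returns -1 on error. */
static off_t
file_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    return st.st_size;
}

/* Hypothetical helper: rewind a descriptor.  With the <unistd.h>
 * prototype in scope the offset is converted to off_t; the cast only
 * makes that explicit. */
static off_t
rewind_fd(int fd)
{
    return lseek(fd, (off_t)0, SEEK_SET);
}

/* Hypothetical helper: fill in a unix-domain address and return the
 * length to pass to connect(2), or 0 if the path does not fit. */
static socklen_t
unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return 0;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return (socklen_t)SUN_LEN(addr);
}
```

For example, after writing five bytes to a descriptor obtained from mkstemp(3), file_size returns 5 and rewind_fd returns 0; unix_addr returns the offset of sun_path plus the length of the path.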
+.PP +The kernel's limit on the number of open files has been +increased from 20 to 64. +It is now possible to change this limit almost arbitrarily. +The standard I/O library +autoconfigures to the kernel limit. +Note that file (``_iob'') entries may be allocated by +.Xr malloc +from +.Xr fopen ; +this allocation has been known to cause problems with programs +that use their own memory allocators. +Memory allocation does not occur until after 20 files have been opened +by the standard I/O library. +.PP +.Xr Select +can be used with more than 32 descriptors +by using arrays of \fBint\fPs for the bit fields rather than single \fBint\fPs. +Programs that used +.Xr getdtablesize +as their first argument to +.Xr select +will no longer work correctly, because the descriptor table can now +be larger than the number of bits in the single \fBint\fP they pass. +Usually the program can be modified to correctly specify the number +of bits in an \fBint\fP. +Alternatively, the program can be modified to use an array of \fBint\fPs. +There is a set of macros available in +.Pn <sys/types.h> +to simplify this. +See +.Xr select (2). +.PP +Old core files will not be intelligible to the current debuggers +because of numerous changes to the user structure +and because the kernel stack has been enlarged. +The +.Xr a.out +header that was in the user structure is no longer present. +Locally written debuggers that try to check the magic number +will need to be changed. +.PP +Files may not be deleted from directories having the ``sticky'' (ISVTX) bit +set in their modes +except by the owner of the file or of the directory, or by the superuser. +This is primarily to protect users' files in publicly writable directories +such as +.Pn /tmp +and +.Pn /var/tmp . +All publicly writable directories should have their ``sticky'' bits set +with ``chmod +t.'' +.PP +The following two sections contain additional notes about +changes in \*(4B that affect the installation of local files; +be sure to read them as well.
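The descriptor-set approach described above for select can be sketched in C as follows. It assumes a modern POSIX system, where fd_set and the FD_ZERO, FD_SET, and FD_ISSET macros are declared in <sys/select.h> (4.4BSD supplied them through <sys/types.h>). wait_readable is a hypothetical helper. Instead of passing getdtablesize(), it passes one more than the highest descriptor in the set as the first argument to select.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to ms milliseconds for any of the nfds
 * descriptors in fds[] to become readable.  On return, *ready holds the
 * readable set.  Returns the number of ready descriptors, 0 on timeout,
 * or -1 on error. */
static int
wait_readable(const int *fds, int nfds, long ms, fd_set *ready)
{
    struct timeval tv;
    int maxfd = -1;

    FD_ZERO(ready);
    for (int i = 0; i < nfds; i++) {
        /* FD_SET is undefined for descriptors outside the fd_set. */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], ready);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    /* First argument: highest descriptor plus one, not getdtablesize(). */
    return select(maxfd + 1, ready, NULL, NULL, &tv);
}
```

For example, on a freshly created pipe, wait_readable with a zero timeout returns 0; after one byte is written to the write end, it returns 1 and FD_ISSET reports the read end as ready.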
diff --git a/share/doc/smm/01.setup/4.t b/share/doc/smm/01.setup/4.t new file mode 100644 index 000000000000..975f035f23c3 --- /dev/null +++ b/share/doc/smm/01.setup/4.t @@ -0,0 +1,676 @@ +.\" Copyright (c) 1980, 1986, 1988 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds LH "Installing/Operating \*(4B +.ds CF \*(Dy +.ds RH "System setup +.Sh 1 "System setup" +.PP +This section describes procedures used to set up a \*(4B UNIX system. 
+These procedures are used when a system is first installed +or when the system configuration changes. Procedures for normal +system operation are described in the next section. +.Sh 2 "Kernel configuration" +.PP +This section briefly describes the layout of the kernel code and +how files for devices are made. +For a full discussion of configuring +and building system images, consult the document ``Building +4.3BSD UNIX Systems with Config'' (SMM:2). +.Sh 3 "Kernel organization" +.PP +As distributed, the kernel source is in a +separate tar image. The source may be physically +located anywhere within any filesystem so long as +a symbolic link to the location is created for the file +.Pn /sys +(many files in +.Pn /usr/include +are normally symbolic links relative to +.Pn /sys ). +In further discussions of the system source all path names +will be given relative to +.Pn /sys . +.LP +The kernel is made up of several large generic parts: +.TS +l l l. +sys main kernel header files +kern kernel functions broken down as follows + init system startup, syscall dispatching, entry points + kern scheduling, descriptor handling and generic I/O + sys process management, signals + tty terminal handling and job control + vfs filesystem management + uipc interprocess communication (sockets) + subr miscellaneous support routines +vm virtual memory management +ufs local filesystems broken down as follows + ufs common local filesystem routines + ffs fast filesystem + lfs log-based filesystem + mfs memory based filesystem +nfs Sun-compatible network filesystem +miscfs miscellaneous filesystems broken down as follows + deadfs where rejected vnodes go to die + fdesc access to per-process file descriptors + fifofs IEEE Std1003.1 FIFOs + kernfs filesystem access to kernel data structures + lofs loopback filesystem + nullfs another loopback filesystem + specfs device special files + umapfs provide alternate uid/gid mappings +dev generic device drivers (SCSI, vnode, concatenated disk) +.TE 
+.LP +The networking code is organized by protocol: +.TS +l l. +net	routing and generic interface drivers +netinet	Internet protocols (TCP, UDP, IP, etc.) +netiso	ISO protocols (TP-4, CLNP, CLTP, etc.) +netns	Xerox network systems protocols (IDP, SPP, etc.) +netx25	CCITT X.25 protocols (X.25 Packet Level, HDLC/LAPB) +.TE +.LP +A separate subdirectory is provided for each machine architecture: +.TS +l l. +hp300	HP 9000/300 series of Motorola 68000-based machines +hp	code common to both HP 68k and (non-existent) PA-RISC ports +i386	Intel 386/486-based PC machines +luna68k	Omron 68000-based workstations +news3400	Sony News MIPS-based workstations +pmax	Digital 3100/5000 MIPS-based workstations +sparc	Sun Microsystems SPARCstation 1, 1+, and 2 +tahoe	(deprecated) CCI Power 6-series machines +vax	(deprecated) Digital VAX machines +.TE +.LP +Each machine directory is subdivided by function; +for example, the hp300 directory contains: +.TS +l l. +include	exported machine-dependent header files +hp300	machine-dependent support code and private header files +dev	device drivers +conf	configuration files +stand	machine-dependent standalone code +.TE +.LP +Other kernel-related directories: +.TS +l l. +compile	area to compile kernels +conf	machine-independent configuration files +stand	machine-independent standalone code +.TE +.Sh 3 "Devices and device drivers" +.PP +Devices supported by UNIX are implemented in the kernel +by drivers whose source is kept in +.Pn /sys/<architecture>/dev . +These drivers are loaded +into the system when included in a CPU-specific configuration file +kept in the conf directory. Devices are accessed through special +files in the filesystem, made by the +.Xr mknod (8) +program and normally kept in the +.Pn /dev +directory. +For all the devices supported by the distribution system, the +files in +.Pn /dev +are created by the shell script /dev/MAKEDEV. +.PP +Determine the set of devices that you have and create a new +.Pn /dev +directory by running the MAKEDEV script.
+.Sh 3 "Building new system images" +.PP +The kernel configuration of each UNIX system is described by +a single configuration file, stored in the +.Pn /sys/<architecture>/conf +directory. +To learn about the format of this file and the procedure used +to build system images, +start by reading ``Building 4.3BSD UNIX Systems with Config'' (SMM:2), +look at the manual pages in section 4 +of the UNIX manual for the devices you have, +and look at the sample configuration files in the +.Pn /sys/<architecture>/conf +directory. +.PP +The configured system image +.Pn kernel +should be copied to the root, and then booted to try it out. +It is best to name it +.Pn /newkernel +so as not to destroy the working system until you are sure it does work: +.DS +\fB#\fP \fIcp kernel /newkernel\fP +\fB#\fP \fIsync\fP +.DE +It is also a good idea to keep the previous system around under some other +name. In particular, we recommend that you save the generic distribution +version of the system permanently as +.Pn /genkernel +for use in emergencies. +To boot the new version of the system you should follow the +bootstrap procedures outlined in section 6.1. +After having booted and tested the new system, it should be installed as +.Pn /kernel +before going into multiuser operation. +A systematic scheme for numbering and saving old versions +of the system may be useful. +.Sh 2 "Configuring terminals" +.PP +If UNIX is to support simultaneous +access from directly-connected terminals other than the console, +the file +.Pn /etc/ttys +(see +.Xr ttys (5)) +must be edited. +.PP +To add a new terminal device, be sure the device is configured into the system +and that the special files for the device exist in +.Pn /dev . +Then, enable the appropriate lines of +.Pn /etc/ttys +by setting the ``status'' +field to \fBon\fP (or add new lines). 
+Note that lines in +.Pn /etc/ttys +are one-for-one with entries in the file of current users +(see +.Pn /var/run/utmp ), +and therefore it is best to make changes +while running in single-user mode +and to add all the entries for a new device at once. +.PP +Each line in the +.Pn /etc/ttys +file is broken into four tab-separated +fields (comments begin with a `#' character and extend to +the end of the line). For each terminal line the four fields +are: +the device (without a leading +.Pn /dev ), +the program +.Pn /sbin/init +should start up to service the line +(or \fBnone\fP if the line is to be left alone), +the terminal type (found in +.Pn /usr/share/misc/termcap ), +and optional status information describing whether the terminal is +enabled and whether it is ``secure'' (i.e., the super user should +be allowed to log in on the line). +If the console is marked as ``insecure'', +then the root password is required to bring the machine up single-user. +All fields are character strings, +and entries containing embedded white space must be enclosed in double +quotes. +Thus an entry for a new terminal +.Pn /dev/tty00 +could look like +.DS +tty00	"/usr/libexec/getty std.9600"	vt100	on secure	# mike's office +.DE +The std.9600 parameter provided to +.Pn /usr/libexec/getty +is used in searching the file +.Pn /etc/gettytab ; +it specifies a terminal's characteristics (such as baud rate). +To make custom terminal types, consult +.Xr gettytab (5) +before modifying +.Pn /etc/gettytab . +.PP +Dialup terminals should be wired so that carrier is asserted only when the +phone line is dialed up. +For non-dialup terminals, from which modem control is not available, +you must wire back the signals so that +the carrier appears to always be present. For further details, +find your terminal driver in section 4 of the manual. +.PP +For network terminals (i.e., pseudo terminals), no program should +be started up on the lines.
 Thus, the normal entry in +.Pn /etc/ttys +would look like +.DS +ttyp0	none	network +.DE +(Note that the fourth field is not needed here.) +.PP +When the system is running multi-user, all terminals that are listed in +.Pn /etc/ttys +as \fBon\fP have their line enabled. +If, during normal operations, you wish +to disable a terminal line, you can edit the file +.Pn /etc/ttys +to change the terminal's status to \fBoff\fP and +then send a hangup signal to the +.Xr init +process, by doing +.DS +\fB#\fP \fIkill \-1 1\fP +.DE +Terminals can similarly be enabled by changing the status field +from \fBoff\fP to \fBon\fP and sending a hangup signal to +.Xr init . +.PP +Note that if a special file is inaccessible when +.Xr init +tries to create a process for it, +.Xr init +will log a message to the +system error logging process (see +.Xr syslogd (8)) +and try to reopen the terminal every minute, reprinting the warning +message every 10 minutes. Messages of this sort are normally +printed on the console, though other actions may occur depending +on the configuration information found in +.Pn /etc/syslog.conf . +.PP +Finally, note that you should change the names of any dialup +terminals to ttyd? +where ? is in [0-9a-zA-Z], as some programs use this property of the +names to determine if a terminal is a dialup. +.PP +While it is possible to use truly arbitrary strings for terminal names, +the accounting system and, notably, the +.Xr ps (1) +command rely on the convention that tty names +(by default, and also after dialups are named as suggested above) +are distinct in the last 2 characters. +Change this and you may be sorry later: the heuristic that +.Xr ps (1) +bases on these conventions will then break down and +.Xr ps +will run MUCH slower. +.Sh 2 "Adding users" +.PP +The procedure for adding a new user is described in +.Xr adduser (8).
+You should add accounts for the initial user community, giving +each a directory and a password, and putting users who will wish +to share software in the same groups. +.PP +Several guest accounts have been provided on the distribution +system; these accounts are for people at Berkeley, +Bell Laboratories, and others +who have done major work on UNIX in the past. You can delete these accounts, +or leave them on the system if you expect that these people would have +occasion to login as guests on your system. +.Sh 2 "Site tailoring" +.PP +All programs that require the site's name, or some similar +characteristic, obtain the information through system calls +or from files located in +.Pn /etc . +Aside from parts of the +system related to the network, to tailor the system to your +site you must simply select a site name, then edit the file +.DS +/etc/netstart +.DE +The first lines in +.Pn /etc/netstart +use a variable to set the hostname, +.DS +hostname=\fImysitename\fP +/bin/hostname $hostname +.DE +to define the value returned by the +.Xr gethostname (2) +system call. If you are running the name server, your site +name should be your fully qualified domain name. Programs such as +.Xr getty (8), +.Xr mail (1), +.Xr wall (1), +and +.Xr uucp (1) +use this system call so that the binary images are site +independent. +.PP +You will also need to edit +.Pn /etc/netstart +to do the network interface initialization using +.Xr ifconfig (8). +If you are not sure how to do this, see sections 5.1, 5.2, and 5.3. +If you are not running a routing daemon and have +more than one Ethernet in your environment +you will need to set up a default route; +see section 5.4 for details. +Before bringing your system up multiuser, +you should ensure that the networking is properly configured. +The network is started by running +.Pn /etc/netstart . +Once started, you should test connectivity using +.Xr ping (8). 
+You should first test connectivity to yourself, +then another host on your Ethernet, +and finally a host on another Ethernet. +The +.Xr netstat (8) +program can be used to inspect and debug +your routes; see section 5.4. +.Sh 2 "Setting up the line printer system" +.PP +The line printer system consists of at least +the following files and commands: +.DS +.TS +l l. +/usr/bin/lpq	spooling queue examination program +/usr/bin/lprm	program to delete jobs from a queue +/usr/bin/lpr	program to enter a job in a printer queue +/etc/printcap	printer configuration and capability database +/usr/sbin/lpd	line printer daemon, scans spooling queues +/usr/sbin/lpc	line printer control program +/etc/hosts.lpd	list of hosts allowed to use the printers +.TE +.DE +.PP +The file +.Pn /etc/printcap +is a master database describing line +printers directly attached to a machine as well as printers +accessible across a network. The manual page +.Xr printcap (5) +describes the format of this database and also +shows the default values for such things as the directory +in which spooling is performed. The line printer system handles +multiple printers, multiple spooling queues, local and remote +printers, and also printers attached via serial lines that require +line initialization (such as setting the baud rate). Raster output devices +such as a Varian or Versatec, and laser printers such as an Imagen, +are also supported by the line printer system. +.PP +Remote spooling via the network is handled with two spooling +queues, one on the local machine and one on the remote machine. +When a remote printer job is started with +.Xr lpr , +the job is queued locally and a daemon process is created to oversee the +transfer of the job to the remote machine. If the destination +machine is unreachable, the job will remain queued until it is +possible to transfer the files to the spooling queue on the +remote machine. The +.Xr lpq +program shows the contents of spool +queues on both the local and remote machines.
+.PP +To configure your line printers, consult the printcap manual page +and the accompanying document, ``4.3BSD Line Printer Spooler Manual'' (SMM:7). +A call to the +.Xr lpd +program should be present in +.Pn /etc/rc . +.Sh 2 "Setting up the mail system" +.PP +The mail system consists of the following commands, files, and directories: +.DS +.TS +l l. +/usr/bin/mail	UCB mail program, described in \fImail\fP\|(1) +/usr/sbin/sendmail	mail routing program +/var/spool/mail	mail spooling directory +/var/spool/secretmail	secure mail directory +/usr/bin/xsend	secure mail sender +/usr/bin/xget	secure mail receiver +/etc/aliases	mail forwarding information +/usr/bin/newaliases	command to rebuild binary forwarding database +/usr/bin/biff	mail notification enabler +/usr/libexec/comsat	mail notification daemon +.TE +.DE +Mail is normally sent and received using the +.Xr mail (1) +command (found in +.Pn /usr/bin/mail ), +which provides a front end to edit the messages sent +and received, and passes the messages to +.Xr sendmail (8) +for routing. +The routing algorithm uses knowledge of the network name syntax, +aliasing and forwarding information, and network topology, as +defined in the configuration file +.Pn /usr/lib/sendmail.cf , +to process each piece of mail. +Local mail is delivered by giving it to the program +.Pn /usr/libexec/mail.local , +which appends it to the recipient's mailbox, +.Pn /var/spool/mail/<username> , +using a locking protocol to avoid problems with simultaneous updates. +After the mail is delivered, the mail notification daemon +.Pn /usr/libexec/comsat +is notified, which in turn notifies users who have issued a +``\fIbiff\fP y'' command that mail has arrived. +.PP +Mail stored in the directory +.Pn /var/spool/mail +is normally readable only by the recipient. +To send mail that is secure against perusal +(except by a code-breaker), you should use the secret mail facility, +which encrypts the mail.
+.PP +To set up the mail facility, you should read the instructions in the +file READ_ME in the directory +.Pn /usr/src/usr.sbin/sendmail +and then adjust the necessary configuration files. +You should also set up the file +.Pn /etc/aliases +for your installation, creating mail groups as appropriate. +For more information, see +``Sendmail Installation and Operation Guide'' (SMM:8) and +``Sendmail \- An Internetwork Mail Router'' (SMM:9). +.Sh 3 "Setting up a UUCP connection" +.LP +The version of +.Xr uucp +included in \*(4B has the following features: +.IP \(bu 3 +support for many auto-call units and dialers +in addition to the DEC DN11, +.IP \(bu 3 +breakup of the spooling area into multiple subdirectories, +.IP \(bu 3 +addition of an +.Pn L.cmds +file to control the set +of commands that may be executed by a remote site, +.IP \(bu 3 +enhanced ``expect-send'' sequence capabilities when +logging in to a remote site, +.IP \(bu 3 +new commands to be used in polling sites and +obtaining snapshots of +.Xr uucp +activity, +.IP \(bu 3 +additional protocols for different communication media. +.LP +This section gives a brief overview of +.Xr uucp +and points out the most important steps in its installation. +.PP +To connect two UNIX machines with a +.Xr uucp +network link using modems, +one site must have an automatic call unit +and the other must have a dialup port. +It is better if each site has both. +.PP +You should first read the paper in the UNIX System Manager's Manual: +``Uucp Implementation Description'' (SMM:14). +It describes in detail the file formats and conventions, +and provides useful background. +In addition, +the document ``setup.tblms'', +located in the directory +.Pn /usr/src/usr.bin/uucp/UUAIDS , +may be of use in tailoring the software to your needs. +.PP +The +.Xr uucp +support is located in three major directories: +.Pn /usr/bin , +.Pn /usr/lib/uucp , +and +.Pn /var/spool/uucp .
+User commands are kept in +.Pn /usr/bin, +operational commands in +.Pn /usr/lib/uucp , +and +.Pn /var/spool/uucp +is used as a spooling area. +The commands in +.Pn /usr/bin +are: +.DS +.TS +l l. +/usr/bin/uucp file-copy command +/usr/bin/uux remote execution command +/usr/bin/uusend binary file transfer using mail +/usr/bin/uuencode binary file encoder (for \fIuusend\fP) +/usr/bin/uudecode binary file decoder (for \fIuusend\fP) +/usr/bin/uulog scans session log files +/usr/bin/uusnap gives a snap-shot of \fIuucp\fP activity +/usr/bin/uupoll polls remote system until an answer is received +/usr/bin/uuname prints a list of known uucp hosts +/usr/bin/uuq gives information about the queue +.TE +.DE +The important files and commands in +.Pn /usr/lib/uucp +are: +.DS +.TS +l l. +/usr/lib/uucp/L-devices list of dialers and hard-wired lines +/usr/lib/uucp/L-dialcodes dialcode abbreviations +/usr/lib/uucp/L.aliases hostname aliases +/usr/lib/uucp/L.cmds commands remote sites may execute +/usr/lib/uucp/L.sys systems to communicate with, how to connect, and when +/usr/lib/uucp/SEQF sequence numbering control file +/usr/lib/uucp/USERFILE remote site pathname access specifications +/usr/lib/uucp/uucico \fIuucp\fP protocol daemon +/usr/lib/uucp/uuclean cleans up garbage files in spool area +/usr/lib/uucp/uuxqt \fIuucp\fP remote execution server +.TE +.DE +while the spooling area contains the following important files and directories: +.DS +.TS +l l. +/var/spool/uucp/C. directory for command, ``C.'' files +/var/spool/uucp/D. directory for data, ``D.'', files +/var/spool/uucp/X. directory for command execution, ``X.'', files +/var/spool/uucp/D.\fImachine\fP directory for local ``D.'' files +/var/spool/uucp/D.\fImachine\fPX directory for local ``X.'' files +/var/spool/uucp/TM. 
directory for temporary, ``TM.'', files +/var/spool/uucp/LOGFILE log file of \fIuucp\fP activity +/var/spool/uucp/SYSLOG log file of \fIuucp\fP file transfers +.TE +.DE +.PP +To install +.Xr uucp +on your system, +start by selecting a site name +(shorter than 14 characters). +A +.Xr uucp +account must be created in the password file and a password set up. +Then, +create the appropriate spooling directories with mode 755 +and owned by user +.Xr uucp , +group \fIdaemon\fP. +.PP +If you have an auto-call unit, +the L.sys, L-dialcodes, and L-devices files should be created. +The L.sys file should contain +the phone numbers and login sequences +required to establish a connection with a +.Xr uucp +daemon on another machine. +For example, our L.sys file looks something like: +.DS +adiron Any ACU 1200 out0123456789- ogin-EOT-ogin uucp +cbosg Never Slave 300 +cbosgd Never Slave 300 +chico Never Slave 1200 out2010123456 +.DE +The first field is the name of a site, +the second shows when the machine may be called, +the third field specifies how the host is connected +(through an ACU, a hard-wired line, etc.), +then comes the phone number to use in connecting through an auto-call unit, +and finally a login sequence. +The phone number +may contain common abbreviations that are defined in the L-dialcodes file. +The device specification should refer to devices +specified in the L-devices file. +Listing only ACU causes the +.Xr uucp +daemon, +.Xr uucico , +to search for any available auto-call unit in L-devices. +Our L-dialcodes file is of the form: +.DS +ucb 2 +out 9% +.DE +while our L-devices file is: +.DS +ACU cul0 unused 1200 ventel +.DE +Refer to the README file in the +.Xr uucp +source directory for more information about installation. +.PP +As +.Xr uucp +operates it creates (and removes) many small +files in the directories underneath +.Pn /var/spool/uucp . +Sometimes files are left undeleted; +these are most easily purged with the +.Xr uuclean +program. 
+The log files can grow without bound unless trimmed back; +.Xr uulog +maintains these files. +Many useful aids in maintaining your +.Xr uucp +installation are included in a subdirectory UUAIDS beneath +.Pn /usr/src/usr.bin/uucp . +Peruse this directory and read the ``setup'' instructions also located there. diff --git a/share/doc/smm/01.setup/5.t b/share/doc/smm/01.setup/5.t new file mode 100644 index 000000000000..248c7de119d2 --- /dev/null +++ b/share/doc/smm/01.setup/5.t @@ -0,0 +1,551 @@ +.\" Copyright (c) 1980, 1986, 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.ds lq ``
+.ds rq ''
+.ds LH "Installing/Operating \*(4B
+.ds RH Network setup
+.ds CF \*(Dy
+.Sh 1 "Network setup"
+.PP
+\*(4B provides support for the standard Internet
+protocols IP, ICMP, TCP, and UDP. These protocols may be used
+on top of a variety of hardware devices ranging from
+serial lines to local area network controllers
+for the Ethernet. Network services are split between the
+kernel (communication protocols) and user programs (user
+services such as TELNET and FTP). This section describes
+how to configure your system to use the Internet networking support.
+\*(4B also supports the Xerox Network Systems (NS) protocols.
+IDP and SPP are implemented in the kernel,
+and other protocols such as Courier run at the user level.
+\*(4B provides some support for the ISO OSI protocols CLNP,
+TP4, and ESIS. Application protocols such as X.400 and X.500
+are implemented by user-level processes.
+.Sh 2 "System configuration"
+.PP
+To configure the kernel to include the Internet communication
+protocols, define the INET option.
+Xerox NS support is enabled with the NS option.
+ISO OSI support is enabled with the ISO option.
+In each case, include the pseudo-devices
+``pty'' and ``loop'' in your machine's configuration
+file.
+The ``pty'' pseudo-device forces the pseudo terminal device driver
+to be configured into the system (see
+.Xr pty (4)),
+while the ``loop'' pseudo-device forces inclusion of the software loopback
+interface driver.
+The loop driver is used in network testing
+and also by the error logging system.
+.PP
+If you are planning to use the Internet network facilities on a 10Mb/s
+Ethernet, the pseudo-device ``ether'' should also be included
+in the configuration; this forces inclusion of the Address Resolution
+Protocol module used in mapping between 48-bit Ethernet
+and 32-bit Internet addresses.
+.PP
+Before configuring the appropriate networking hardware, you should
+consult the manual pages in section 4 of the Programmer's Manual
+to select the appropriate interfaces for your architecture.
+.PP
+All network interface drivers, including the loopback interface,
+require that their host address(es) be defined at boot time.
+This is done with
+.Xr ifconfig (8)
+commands included in the
+.Pn /etc/netstart
+file.
+Interfaces that are able to dynamically deduce the host
+part of an address may check that the host part of the address is correct.
+The manual page for each network interface
+describes the method used to establish a host's address.
+.Xr Ifconfig (8)
+can also be used to set options for the interface at boot time.
+Options are set independently for each interface, and
+apply to all packets sent using that interface.
+Address translations for hosts that do not support ARP may be set in advance
+or ``published'' by a \*(4B host by use of the
+.Xr arp (8)
+command.
+Note that the use of trailer link-level encapsulation is now negotiated
+between \*(4B hosts using ARP,
+and it is thus no longer necessary to disable the use of trailers
+with
+.Xr ifconfig .
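+.PP
+As an illustration only, a \*(4B host could publish an ARP entry
+on behalf of a host that does not run ARP itself with a command
+of the following form.
+\fIhostname\fP and \fIether_addr\fP are placeholders for the
+other host's name and its Ethernet address
+(six hexadecimal bytes separated by colons),
+and the trailing \f(CWpub\fP keyword asks this host to answer
+ARP requests for \fIhostname\fP.
+Check
+.Xr arp (8)
+for the exact syntax supported by your system before relying on it.
+.DS
+.ft CW
+arp -s \fIhostname\fP \fIether_addr\fP pub
+.DE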
+.PP
+The OSI equivalent to ARP is ESIS (End System to Intermediate System Routing
+Protocol). Running this protocol is mandatory; however, one can manually add
+translations for machines that do not participate by use of the
+.Xr route (8)
+command.
+Additional information is provided in the manual page describing
+.Xr ESIS (4).
+.Sh 2 "Local subnets"
+.PP
+In \*(4B, the Internet support
+includes the notion of ``subnets''. This is a mechanism
+by which multiple local networks may appear as a single Internet
+network to off-site hosts. Subnetworks are useful because
+they allow a site to hide its local topology, requiring only a single
+route in external gateways;
+it also means that local network numbers may be locally administered.
+The standard describing this change in Internet addressing is RFC-950.
+.PP
+To set up local subnets one must first decide how the available
+address space (the Internet ``host part'' of the 32-bit address)
+is to be partitioned.
+Sites with a class A network
+number have a 24-bit host address space with which to work, sites with a
+class B network number have a 16-bit host address space, while sites with
+a class C network number have an 8-bit host address space\**.
+.FS
+If you are unfamiliar with the Internet addressing structure, consult
+``Address Mappings'', Internet RFC-796, J. Postel; available from
+the Internet Network Information Center at SRI.
+.FE
+To define local subnets you must steal some bits
+from the local host address space for use in extending the network
+portion of the Internet address. This reinterpretation of Internet
+addresses is done only for local networks; i.e., it is not visible
+to hosts off-site. For example, if your site has a class B network
+number, hosts on this network have an Internet address that contains
+the network number, 16 bits, and the host number, another
+16 bits. To define 254 local subnets, each
+possessing at most 254 hosts, 8 bits may be taken from the local part.
+(The use of subnets 0 and all-1's, 255 in this example, is discouraged
+to avoid confusion about broadcast addresses.)
+These new network
+numbers are then constructed by concatenating the original 16-bit network
+number with the extra 8 bits containing the local subnet number.
+.PP
+The existence of local subnets is communicated to the system at the time a
+network interface is configured with the
+.I netmask
+option to the
+.Xr ifconfig
+program. A ``network mask'' is specified to define the
+portion of the Internet address that is to be considered the network part
+for that network.
+This mask normally contains the bits corresponding to the standard
+network part as well as the portion of the local part
+that has been assigned to subnets.
+If no mask is specified when the address is set,
+it will be set according to the class of the network.
+For example, at Berkeley (class B network 128.32), 8 bits
+of the local part have been reserved for defining subnets;
+consequently the
+.Pn /etc/netstart
+file contains lines of the form
+.DS
+.ft CW
+/sbin/ifconfig le0 netmask 0xffffff00 128.32.1.7
+.DE
+This specifies that for interface ``le0'', the upper 24 bits of
+the Internet address should be used in calculating network numbers
+(netmask 0xffffff00), and the interface's Internet address is
+``128.32.1.7'' (host 7 on network 128.32.1). Host \fIm\fP on
+subnet \fIn\fP of this network would then have an address of
+the form ``128.32.\fIn\fP.\fIm\fP''; for example, host
+99 on subnet 129 would have the address ``128.32.129.99''.
+For hosts with multiple interfaces, the network mask should
+be set for each interface,
+although in practice only the mask of the first interface on each network
+is really used.
+.Sh 2 "Internet broadcast addresses"
+.PP
+The address defined as the broadcast address for Internet networks
+according to RFC-919 is the address with a host part of all 1's.
+The address used by 4.2BSD was the address with a host part of 0.
+\*(4B uses the standard broadcast address (all 1's) by default,
+but allows the broadcast address to be set (with
+.Xr ifconfig )
+for each interface.
+This allows networks consisting of a mix of 4.2BSD, \*(Ps, and \*(4B hosts
+to coexist while the upgrade process proceeds.
+In the presence of subnets, the broadcast address uses the subnet field
+as for normal host addresses, with the remaining host part set to 1's
+(or 0's, on a network that has not yet been converted).
+\*(4B hosts recognize and accept packets
+sent to the logical-network broadcast address as well as those sent
+to the subnet broadcast address, and when using an all-1's broadcast,
+also recognize and receive packets sent to host 0 as a broadcast.
+.Sh 2 "Routing"
+.PP
+If your environment allows access to networks not directly
+attached to your host, you will need to set up routing information
+to allow packets to be properly routed. Two schemes are
+supported by the system. The first scheme
+employs a routing table management daemon.
+Ideally, you should use the routing daemon
+.Xr gated ,
+available from Cornell University.
+We use it on our systems and it works well,
+especially for multi-homed hosts using Serial Line IP (SLIP).
+Unfortunately, we were not able to obtain permission to
+include it in \*(4B.
+.PP
+If you do not wish to or cannot obtain
+.Xr gated ,
+the distribution does include
+.Xr routed (8)
+to maintain the system routing tables. The routing daemon
+uses a variant of the Xerox Routing Information Protocol
+to maintain up-to-date routing tables in a cluster of local
+area networks. By using the
+.Pn /etc/gateways
+file, the routing daemon can also be used to initialize static routes
+to distant networks (see the next section for further discussion).
When the routing daemon is started up
+(usually from
+.Pn /etc/rc )
+it reads
+.Pn /etc/gateways
+if it exists and installs the routes defined there,
+then broadcasts on each local network
+to which the host is attached to find other instances of the routing
+daemon. If any responses are received, the routing daemons
+cooperate in maintaining a globally consistent view of routing
+in the local environment. This view can be extended to include
+remote sites also running the routing daemon by setting up suitable
+entries in
+.Pn /etc/gateways ;
+consult
+.Xr routed (8)
+for a more thorough discussion.
+.PP
+The second approach is to define a default or wildcard
+route to a smart
+gateway and depend on the gateway to provide ICMP routing
+redirect information to dynamically create a routing
+database. This is done by adding an entry of the form
+.DS
+.ft CW
+/sbin/route add default \fIsmart-gateway\fP 1
+.DE
+to
+.Pn /etc/netstart ;
+see
+.Xr route (8)
+for more information. The default route
+will be used by the system as a ``last resort''
+in routing packets to their destination. Assuming the gateway
+to which packets are directed is able to generate the proper
+routing redirect messages, the system will then add routing
+table entries based on the information supplied. This approach
+has certain advantages over the routing daemon, but is
+unsuitable in an environment where there are only bridges (i.e.,
+pseudo gateways that, for instance, do not generate routing
+redirect messages). Further, if the
+smart gateway goes down, there is no way to maintain service
+short of manually altering the routing table entry.
+.PP
+The system always listens for and processes routing redirect
+information, so it is possible to combine both of the above
+facilities.
For example, the routing table management process
+might be used to maintain up-to-date information about routes
+to geographically local networks, while employing the wildcard
+routing techniques for ``distant'' networks. The
+.Xr netstat (1)
+program may be used to display routing table contents as well
+as various routing-oriented statistics. For example,
+.DS
+\fB#\fP \fInetstat \-r\fP
+.DE
+will display the contents of the routing tables, while
+.DS
+\fB#\fP \fInetstat \-r \-s\fP
+.DE
+will show the number of routing table entries dynamically
+created as a result of routing redirect messages, etc.
+.Sh 2 "Use of \*(4B machines as gateways"
+.PP
+Several changes have been made in \*(4B in the area of gateway support
+(or packet forwarding, if one prefers).
+A new configuration option, GATEWAY, is used when configuring
+a machine to be used as a gateway.
+This option increases the size of the routing hash tables in the kernel.
+Unless configured with that option,
+hosts with only a single non-loopback interface never attempt
+to forward packets or to respond with ICMP error messages to misdirected
+packets.
+This change reduces the problems that may occur when different hosts
+on a network disagree on the network number or broadcast address.
+Another change is that \*(4B machines that forward packets back through
+the same interface on which they arrived
+will send ICMP redirects to the source host if it is on the same network.
+This improves the interaction of \*(4B gateways with hosts that configure
+their routes via default gateways and redirects.
+In environments where redirects may cause difficulties,
+their generation may be disabled with the configuration option
+IPSENDREDIRECTS=0 or, while the system is running, with the command:
+.DS
+.ft CW
+sysctl -w net.inet.ip.redirect=0
+.DE
+.Sh 2 "Network databases"
+.PP
+Several data files are used by the network library routines
+and server programs.
Most of these files are host independent +and updated only rarely. +.br +.ne 1i +.TS +lfC l l. +File Manual reference Use +_ +/etc/hosts \fIhosts\fP\|(5) local host names +/etc/networks \fInetworks\fP\|(5) network names +/etc/services \fIservices\fP\|(5) list of known services +/etc/protocols \fIprotocols\fP\|(5) protocol names +/etc/hosts.equiv \fIrshd\fP\|(8) list of ``trusted'' hosts +/etc/netstart \fIrc\fP\|(8) command script for initializing network +/etc/rc \fIrc\fP\|(8) command script for starting standard servers +/etc/rc.local \fIrc\fP\|(8) command script for starting local servers +/etc/ftpusers \fIftpd\fP\|(8) list of ``unwelcome'' ftp users +/etc/hosts.lpd \fIlpd\fP\|(8) list of hosts allowed to access printers +/etc/inetd.conf \fIinetd\fP\|(8) list of servers started by \fIinetd\fP +.TE +The files distributed are set up for Internet hosts. +Local networks and hosts should be added to describe the local +configuration; the Berkeley entries may serve as examples +(see also the section on +.Pn /etc/hosts ). +Network numbers will have to be chosen for each Ethernet. +For sites connected to the Internet, +the normal channels should be used for allocation of network +numbers (contact hostmaster@SRI-NIC.ARPA). +For other sites, +these could be chosen more or less arbitrarily, +but it is generally better to request official numbers +to avoid conversion if a connection to the Internet (or others on the Internet) +is ever established. +.Sh 3 "Network servers" +.PP +Most network servers are automatically started up at boot time +by the command file +.Pn /etc/rc +or by the Internet daemon (see below). +These include the following: +.TS +lfC l l. 
+Program Server Started by +_ +/usr/sbin/syslogd error logging server \f(CW/etc/rc\fP +/usr/sbin/named Internet name server \f(CW/etc/rc\fP +/sbin/routed routing table management daemon \f(CW/etc/rc\fP +/usr/sbin/rwhod system status daemon \f(CW/etc/rc\fP +/usr/sbin/timed time synchronization daemon \f(CW/etc/rc\fP +/usr/sbin/sendmail SMTP server \f(CW/etc/rc\fP +/usr/libexec/rshd shell server inetd +/usr/libexec/rexecd exec server inetd +/usr/libexec/rlogind login server inetd +/usr/libexec/telnetd TELNET server inetd +/usr/libexec/ftpd FTP server inetd +/usr/libexec/fingerd Finger server inetd +/usr/libexec/tftpd TFTP server inetd +.TE +Consult the manual pages and accompanying documentation (particularly +for named and sendmail) for details about their operation. +.PP +The use of +.Xr routed +and +.Xr rwhod +is controlled by shell +variables set in +.Pn /etc/netstart . +By default, +.Xr routed +is used, but +.Xr rwhod +is not; they are enabled by setting the variables \fIroutedflags\fP and +.Xr rwhod +to strings other than ``NO.'' +The value of \fIroutedflags\fP provides host-specific options to +.Xr routed . +For example, +.DS +.ft CW +routedflags=-q +rwhod=NO +.DE +would run +.Xr "routed -q" +and would not run +.Xr rwhod . +.PP +To have other network servers started as well, +commands of the following sort should be placed in the site-dependent file +.Pn /etc/rc.local . +.DS +.ft CW +if [ -f /usr/sbin/timed ]; then + /usr/sbin/timed & echo -n ' timed' >/dev/console +f\&i +.DE +.Sh 3 "Internet daemon" +.PP +In \*(4B most of the servers for user-visible services are started up by a +``super server'', the Internet daemon. The Internet +daemon, +.Pn /usr/sbin/inetd , +acts as a master server for +programs specified in its configuration file, +.Pn /etc/inetd.conf , +listening for service requests for these servers, and starting +up the appropriate program whenever a request is received. 
+Each line of the configuration file contains a service
+name (as found in
+.Pn /etc/services ),
+the type of socket the
+server expects (e.g., stream or dgram), the protocol to be
+used with the socket (as found in
+.Pn /etc/protocols ),
+whether to wait for each server to complete before starting up another,
+the user name by which the server should run, the server
+program's name, and at most five arguments to pass to the
+server program.
+Some trivial services are implemented internally in
+.Xr inetd ,
+and their servers are listed as ``internal.''
+For example, an entry for the file
+transfer protocol server would appear as
+.DS
+.ft CW
+ftp stream tcp nowait root /usr/libexec/ftpd ftpd
+.DE
+Consult
+.Xr inetd (8)
+for more detail on the format of the configuration file
+and the operation of the Internet daemon.
+.Sh 3 "The \f(CW/etc/hosts.equiv\fP file"
+.PP
+The remote login and shell servers use an
+authentication scheme based on trusted hosts. The
+.Pn hosts.equiv
+file contains a list of hosts that are considered trusted
+and under a single administrative control. When a user
+contacts a remote login or shell server requesting service,
+the client process passes the user's name and the official
+name of the host on which the client is located. In the simple
+case, if the host's name is listed in
+.Pn hosts.equiv
+and the user has an account on the server's machine, then service
+is rendered (i.e., the user is allowed to log in, or the command
+is executed). Users may expand this ``equivalence'' of
+machines by installing a
+.Pn \&.rhosts
+file in their login directory.
+The root login is handled specially, bypassing the
+.Pn hosts.equiv
+file, and using only the
+.Pn /.rhosts
+file.
+.PP
+Thus, to create a class of equivalent machines, the
+.Pn hosts.equiv
+file should contain the \fIofficial\fP names for those machines.
+If you are running the name server, you may omit the domain part
+of the host name for machines in your local domain.
For example, three machines on our local
+network are considered trusted, so the
+.Pn hosts.equiv
+file is of the form:
+.DS
+.ft CW
+vangogh.CS.Berkeley.EDU
+picasso.CS.Berkeley.EDU
+okeeffe.CS.Berkeley.EDU
+.DE
+.Sh 3 "The \f(CW/etc/ftpusers\fP file"
+.PP
+The FTP server included in the system provides support for an
+anonymous FTP account. Because of the inherent security problems
+with such a facility, you should read this section carefully if
+you consider providing such a service.
+.PP
+An anonymous account is enabled by creating a user
+.Xr ftp .
+When a client uses the anonymous account, a
+.Xr chroot (2)
+system call is performed by the server to restrict the client
+from moving outside that part of the filesystem where the
+ftp user's home directory is located. Because a
+.Xr chroot
+call is used, certain programs and files used by the server
+process must be placed in the ftp home directory.
+Further, one must be
+sure that all directories and executable images are unwritable.
+The following directory setup is recommended. The
+use of the
+.Xr awk
+commands to copy the
+.Pn /etc/passwd
+and
+.Pn /etc/group
+files is \fBSTRONGLY\fP recommended.
+.DS
+\fB#\fP \fIcd ~ftp\fP
+\fB#\fP \fIchmod 555 .; chown ftp .; chgrp ftp .\fP
+\fB#\fP \fImkdir bin etc pub\fP
+\fB#\fP \fIchown root bin etc\fP
+\fB#\fP \fIchmod 555 bin etc\fP
+\fB#\fP \fIchown ftp pub\fP
+\fB#\fP \fIchmod 777 pub\fP
+\fB#\fP \fIcd bin\fP
+\fB#\fP \fIcp /bin/sh /bin/ls .\fP
+\fB#\fP \fIchmod 111 sh ls\fP
+\fB#\fP \fIcd ../etc\fP
+\fB#\fP \fIawk -F: '{$2="*";print$1":"$2":"$3":"$4":"$5":"$6":"}' < /etc/passwd > passwd\fP
+\fB#\fP \fIawk -F: '{$2="*";print$1":"$2":"$3":"}' < /etc/group > group\fP
+\fB#\fP \fIchmod 444 passwd group\fP
+.DE
+When local users wish to place files in the anonymous
+area, they must put them in a subdirectory. In the
+setup here, the directory
+.Pn ~ftp/pub
+is used.
+.PP +Aside from the problems of directory modes and such, +the ftp server may provide a loophole for interlopers +if certain user accounts are allowed. +The file +.Pn /etc/ftpusers +is checked on each connection. +If the requested user name is located in the file, the +request for service is denied. This file normally has +the following names on our systems. +.DS +uucp +root +.DE +Accounts without passwords need not be listed in this file as the ftp +server will refuse service to these users. +Accounts with nonstandard shells (any not listed in +.Pn /etc/shells ) +will also be denied access via ftp. diff --git a/share/doc/smm/01.setup/6.t b/share/doc/smm/01.setup/6.t new file mode 100644 index 000000000000..dd11607b07a5 --- /dev/null +++ b/share/doc/smm/01.setup/6.t @@ -0,0 +1,657 @@ +.\" Copyright (c) 1980, 1986, 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.ds LH "Installing/Operating \*(4B
+.ds CF \*(Dy
+.Sh 1 "System operation"
+.PP
+This section describes procedures used to operate a \*(4B UNIX system.
+Procedures described here are used periodically to reboot the system,
+analyze error messages from devices, do disk backups, monitor
+system performance, recompile system software, and control local changes.
+.Sh 2 "Bootstrap and shutdown procedures"
+.PP
+In a normal reboot, the system checks the disks and comes up multi-user
+without intervention at the console.
+Such a reboot
+can be stopped (after it prints the date) with a ^C (interrupt).
+This will leave the system in single-user mode, with only the console
+terminal active.
+(If the console has been marked ``insecure'' in
+.Pn /etc/ttys ,
+you must enter the root password to bring the machine to single-user mode.)
+It is also possible to allow the filesystem checks to complete
+and then to return to single-user mode by signaling
+.Xr fsck (8)
+with a QUIT signal (^\|\e).
+.PP
+To bring the system up to a multi-user configuration from the single-user
+status,
+all you have to do is hit ^D on the console. The system
+will then execute
+.Pn /etc/rc ,
+a multi-user restart script (and
+.Pn /etc/rc.local ),
+and come up on the terminals listed as
+active in the file
+.Pn /etc/ttys .
+See
+.Xr init (8)
+and
+.Xr ttys (5)
+for more details.
+Note, however, that this does not cause a filesystem check to be done.
+Unless the system was taken down cleanly, you should run +``fsck \-p'' or force a reboot with +.Xr reboot (8) +to have the disks checked. +.PP +To take the system down to a single user state you can use +.DS +\fB#\fP \fIkill 1\fP +.DE +or use the +.Xr shutdown (8) +command (which is much more polite, if there are other users logged in) +when you are running multi-user. +Either command will kill all processes and give you a shell on the console, +as if you had just booted. Filesystems remain mounted after the +system is taken single-user. If you wish to come up multi-user again, you +should do this by: +.DS +\fB#\fP \fIcd /\fP +\fB#\fP \fI/sbin/umount -a\fP +\fB#\fP \fI^D\fP +.DE +.PP +Each system shutdown, crash, processor halt and reboot +is recorded in the system log +with its cause. +.Sh 2 "Device errors and diagnostics" +.PP +When serious errors occur on peripherals or in the system, the system +prints a warning diagnostic on the console. +These messages are collected +by the system error logging process +.Xr syslogd (8) +and written into a system error log file +.Pn /var/log/messages . +Less serious errors are sent directly to +.Xr syslogd , +which may log them on the console. +The error priorities that are logged and the locations to which they are logged +are controlled by +.Pn /etc/syslog.conf . +See +.Xr syslogd (8) +for further details. +.PP +Error messages printed by the devices in the system are described with the +drivers for the devices in section 4 of the programmer's manual. +If errors occur suggesting hardware problems, you should contact +your hardware support group or field service. It is a good idea to +examine the error log file regularly +(e.g. with the command \fItail \-r /var/log/messages\fP). 
+.Sh 2 "Filesystem checks, backups, and disaster recovery"
+.PP
+Periodically (say every week or so in the absence of any problems)
+and always (usually automatically) after a crash,
+all the filesystems should be checked for consistency
+by
+.Xr fsck (8).
+The procedures of
+.Xr reboot (8)
+should be used to get the system to a state where a filesystem
+check can be done manually or automatically.
+.PP
+Dumping of the filesystems should be done regularly,
+since once the system is going it is easy to
+become complacent.
+Complete and incremental dumps are easily done with
+.Xr dump (8).
+You should arrange to do a towers-of-Hanoi dump sequence; we tune
+ours so that almost all files are dumped on two tapes and kept for at
+least a week in almost every case. We take full dumps every month (and keep
+these indefinitely).
+Operators can run ``dump w'' at login to find out what needs
+to be dumped
+(based on the
+.Pn /etc/fstab
+information).
+Be sure to create a group
+.B operator
+in the file
+.Pn /etc/group
+so that dump can notify logged-in operators when it needs help.
+.PP
+More precisely, we have three sets of dump tapes: 10 daily tapes,
+5 weekly sets of 2 tapes, and fresh sets of three tapes monthly.
+We do daily dumps circularly on the daily tapes with sequence
+`3 2 5 4 7 6 9 8 9 9 9 ...'.
+Each weekly dump is level 1, and the daily dump sequence
+restarts after each weekly dump.
+Full dumps are level 0, and the daily sequence also restarts
+after each full dump.
+.PP
+Thus a typical dump sequence would be:
+.br
+.ne 6
+.TS
+center;
+c c c c c
+n n n l l.
+tape name	level number	date	opr	size
+_
+FULL	0	Nov 24, 1992	operator	137K
+D1	3	Nov 28, 1992	operator	29K
+D2	2	Nov 29, 1992	operator	34K
+D3	5	Nov 30, 1992	operator	19K
+D4	4	Dec 1, 1992	operator	22K
+W1	1	Dec 2, 1992	operator	40K
+D5	3	Dec 4, 1992	operator	15K
+D6	2	Dec 5, 1992	operator	25K
+D7	5	Dec 6, 1992	operator	15K
+D8	4	Dec 7, 1992	operator	19K
+W2	1	Dec 9, 1992	operator	118K
+D9	3	Dec 11, 1992	operator	15K
+D10	2	Dec 12, 1992	operator	26K
+D1	5	Dec 15, 1992	operator	14K
+W3	1	Dec 17, 1992	operator	71K
+D2	3	Dec 18, 1992	operator	13K
+FULL	0	Dec 22, 1992	operator	135K
+.TE
+We do weekly dumps often enough that daily dumps always fit on one tape.
+.PP
+Dumping of files by name is best done by
+.Xr tar (1),
+but the amount of data that can be moved in this way is limited
+to a single tape.
+Finally, if there are enough drives, entire
+disks can be copied with
+.Xr dd (1)
+using the raw special files and an appropriate
+blocking factor; the number of sectors per track is usually
+a good value to use (consult
+.Pn /etc/disktab ).
+.PP
+It is desirable that full dumps of the root filesystem be
+made regularly.
+This is especially true when only one disk is available.
+Then, if the
+root filesystem is damaged by a hardware or software failure, you
+can rebuild a workable disk by doing a restore in the
+same way that the initial root filesystem was created.
+.PP
+Exhaustion of user-file space is certain to occur
+now and then; disk quotas may be imposed, or if you
+prefer a less fascist approach, try using the programs
+.Xr du (1),
+.Xr df (1),
+and
+.Xr quot (8),
+combined with threatening
+messages of the day, and personal letters.
+.Sh 2 "Moving filesystem data"
+.PP
+If you have the resources,
+the best way to move a filesystem
+is to dump it to a spare disk partition, or magtape, using
+.Xr dump (8),
+use
+.Xr newfs (8)
+to create the new filesystem,
+and restore the filesystem using
+.Xr restore (8).
+Filesystems may also be moved by piping the output of +.Xr dump +to +.Xr restore . +The +.Xr restore +program uses an ``in-place'' algorithm that +allows filesystem dumps to be restored without concern for the +original size of the filesystem. Further, portions of a +filesystem may be selectively restored using a method similar +to the tape archive program. +.PP +If you have to merge a filesystem into another, existing one, +the best bet is to use +.Xr tar (1). +If you must shrink a filesystem, the best bet is to dump +the original and restore it onto the new filesystem. +If you +are playing with the root filesystem and only have one drive, +the procedure is more complicated. +If the only drive is a Winchester disk, this procedure may not be used +without overwriting the existing root or another partition. +What you do is the following: +.IP 1. +GET A SECOND PACK, OR USE ANOTHER DISK DRIVE!!!! +.IP 2. +Dump the root filesystem to tape using +.Xr dump (8). +.IP 3. +Bring the system down. +.IP 4. +Mount the new pack in the correct disk drive, if +using removable media. +.IP 5. +Load the distribution tape and install the new +root filesystem as you did when first installing the system. +Boot normally +using the newly created disk filesystem. +.PP +Note that if you change the disk partition tables or add new disk +drivers they should also be added to the standalone system in +.Pn /sys/<architecture>/stand , +and the default disk partition tables in +.Pn /etc/disktab +should be modified. +.Sh 2 "Monitoring system performance" +.PP +The +.Xr systat +program provided with the system is designed to be an aid to monitoring +systemwide activity. The default ``pigs'' mode shows a dynamic ``ps''. +By running in the ``vmstat'' mode +when the system is active you can judge the system activity in several +dimensions: job distribution, virtual memory load, paging and swapping +activity, device interrupts, and disk and cpu utilization. 
+Ideally, there should be few blocked (b) jobs, +there should be little paging or swapping activity, there should +be available bandwidth on the disk devices (most single arms peak +out at 20-30 tps in practice), and the user cpu utilization (us) should +be high (above 50%). +.PP +If the system is busy, then the count of active jobs may be large, +and several of these jobs may often be blocked (b). If the virtual +memory is active, then the paging demon will be running (sr will +be non-zero). It is healthy for the paging demon to free pages when +the virtual memory gets active; it is triggered by the amount of free +memory dropping below a threshold and increases its pace as free memory +goes to zero. +.PP +If you run in the ``vmstat'' mode +when the system is busy, you can find +imbalances by noting abnormal job distributions. If many +processes are blocked (b), then the disk subsystem +is overloaded or imbalanced. If you have several non-dma +devices or open teletype lines that are ``ringing'', or user programs +that are doing high-speed non-buffered input/output, then the system +time may go high (60-70% or higher). +It is often possible to pin down the cause of high system time by +looking to see if there is excessive context switching (cs), interrupt +activity (in) and per-device interrupt counts, +or system call activity (sy). Cumulatively on one of +our large machines we average about 60-200 context switches and interrupts +per second and about 50-500 system calls per second. +.PP +If the system is heavily loaded, or if you have little memory +for your load (2M is little in most any case), then the system +may be forced to swap. This is likely to be accompanied by a noticeable +reduction in system performance and pregnant pauses when interactive +jobs such as editors swap out. +If you expect to be in a memory-poor environment +for an extended period you might consider administratively +limiting system load. 
+.Sh 2 "Recompiling and reinstalling system software" +.PP +It is easy to regenerate either the entire system or a single utility, +and it is a good idea to try rebuilding pieces of the system to build +confidence in the procedures. +.LP +In general, there are six well-known targets supported by +all the makefiles on the system: +.IP all 9 +This entry is the default target, the same as if no target is specified. +This target builds the kernel, binary or library, as well as its +associated manual pages. +This target \fBdoes not\fP build the dependency files. +Some of the utilities require that a \fImake depend\fP be done before +a \fImake all\fP can succeed. +.IP depend +Build the include file dependency file, ``.depend'', which is +read by +.Xr make . +See +.Xr mkdep (1) +for further details. +.IP install +Install the kernel, binary or library, as well as its associated +manual pages. +See +.Xr install (1) +for further details. +.IP clean +Remove the kernel, binary or library, as well as any object files +created when building it. +.IP cleandir +The same as clean, except that the dependency files and formatted +manual pages are removed as well. +.IP obj +Build a shadow directory structure in the area referenced by +.Pn /usr/obj +and create a symbolic link in the current source directory to +reference it, named ``obj''. +Once this shadow structure has been created, all the files created by +.Xr make +will live in the shadow structure, and +.Pn /usr/src +may be mounted read-only by multiple machines. +Doing a \fImake obj\fP in +.Pn /usr/src +will build the shadow directory structure for everything on the +system except for the contributed, old, and kernel software. +.PP +The system consists of three major parts: +the kernel itself, found in +.Pn /usr/src/sys , +the libraries, found in +.Pn /usr/src/lib , +and the user programs (the rest of +.Pn /usr/src ).
+.PP +Deprecated software, found in +.Pn /usr/src/old , +often has old style makefiles; +some of it does not compile under \*(4B at all. +.PP +Contributed software, found in +.Pn /usr/src/contrib , +usually does not support the ``cleandir'', ``depend'', or ``obj'' targets. +.PP +The kernel does not support the ``obj'' shadow structure. +All kernels are compiled in subdirectories of +.Pn /usr/src/sys/compile +which is usually abbreviated as +.Pn /sys/compile . +If you want to mount your source tree read-only, +.Pn /usr/src/sys/compile +will have to be on a separate filesystem from +.Pn /usr/src . +Separation from +.Pn /usr/src +can be done by making +.Pn /usr/src/sys/compile +a symbolic link that references +.Pn /usr/obj/sys/compile . +If it is a symbolic link, the \fIS\fP variable in the kernel +Makefile must be changed from +.Pn \&../.. +to the absolute pathname needed to locate the kernel sources, usually +.Pn /usr/src/sys . +The symbolic link created by +.Xr config (8) +for +.Pn machine +must also be manually changed to an absolute pathname. +Finally, the +.Pn /usr/src/sys/libkern/obj +directory must be located in +.Pn /usr/obj/sys/libkern . +.PP +Each of the standard utilities and libraries may be built and +installed by changing directories into the correct location and +doing: +.DS +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +.DE +Note, if system include files have changed between compiles, +.Xr make +will not do the correct dependency checks if the dependency +files have not been built using the ``depend'' target. +.PP +The entire library and utility suite for the system may be recompiled +from scratch by changing directory to +.Pn /usr/src +and doing: +.DS +\fB#\fP \fImake build\fP +.DE +This target installs the system include files, cleans the source +tree, builds and installs the libraries, and builds and installs +the system utilities. 
+.PP +To recompile a specific program, first determine where the binary +resides with the +.Xr whereis (1) +command, then change to the corresponding source directory and build +it with the Makefile in the directory. +For instance, to recompile ``passwd'', +all one has to do is: +.DS +\fB#\fP \fIwhereis passwd\fP +\fB/usr/bin/passwd\fP +\fB#\fP \fIcd /usr/src/usr.bin/passwd\fP +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +.DE +This will compile and install the +.Xr passwd +utility. +.PP +If you wish to recompile and install all programs into a particular +target area, you can override the default path prefix by doing: +.DS +\fB#\fP \fImake\fP +\fB#\fP \fImake DESTDIR=\fPpathname \fIinstall\fP +.DE +Similarly, the mode, owner, group, and other characteristics of +the installed object can be modified by changing other default +make variables. +See +.Xr make (1), +.Pn /usr/src/share/mk/bsd.README , +and the ``.mk'' scripts in the +.Pn /usr/share/mk +directory for more information. +.PP +If you modify the C library or system include files, to change a +system call for example, and want to rebuild and install everything, +you have to be a little careful. +You must ensure that the include files are installed before anything +is compiled, and that the libraries are installed before the remainder +of the source; otherwise, the loaded images will not contain the new +routine from the library. +If include files have been modified, the following commands should +be done first: +.DS +\fB#\fP \fIcd /usr/src/include\fP +\fB#\fP \fImake install\fP +.DE +Then, if, for example, C library files have been modified, the +following commands should be executed: +.DS +\fB#\fP \fIcd /usr/src/lib/libc\fP +\fB#\fP \fImake depend\fP +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImake depend\fP +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +.DE +Alternatively, the \fImake build\fP command described above will +accomplish the same tasks.
+This takes several hours on a reasonably configured machine. +.Sh 2 "Making local modifications" +.PP +The source for locally written commands is normally stored in +.Pn /usr/src/local , +and their binaries are kept in +.Pn /usr/local/bin . +This isolation of local binaries allows +.Pn /usr/bin +and +.Pn /bin +to correspond to the distribution tape (and to the manuals that +people can buy). +People using local commands should be made aware that they are not +in the base manual. +Manual pages for local commands should be installed in +.Pn /usr/local/man/cat[1-8] . +The +.Xr man (1) +command automatically finds manual pages placed in +/usr/local/man/cat[1-8] to encourage this practice (see +.Xr man.conf (5)). +.Sh 2 "Accounting" +.PP +UNIX optionally records two kinds of accounting information: +connect time accounting and process resource accounting. The connect +time accounting information is stored in the file +.Pn /var/log/wtmp , +which is summarized by the program +.Xr ac (8). +The process time accounting information is stored in the file +.Pn /var/account/acct +after it is enabled by +.Xr accton (8), +and is analyzed and summarized by the program +.Xr sa (8). +.PP +If you need to recharge for computing time, you can develop +procedures based on the information provided by these commands. +A convenient way to do this is to give commands to the clock daemon +.Pn /usr/sbin/cron +to be executed every day at a specified time. +This is done by adding lines to +.Pn /etc/crontab.local ; +see +.Xr cron (8) +for details. +.Sh 2 "Resource control" +.PP +Resource control in the current version of UNIX is more +elaborate than in most UNIX systems. The disk quota +facilities developed at the University of Melbourne have +been incorporated in the system and allow control over the +number of files and amount of disk space each user and/or group may use +on each filesystem.
In addition, the resources consumed +by any single process can be limited by the mechanisms of +.Xr setrlimit (2). +As distributed, the latter mechanism +is voluntary, though sites may choose to modify the login +mechanism to impose limits not covered with disk quotas. +.PP +To use the disk quota facilities, the system must be +configured with ``options QUOTA''. Filesystems may then +be placed under the quota mechanism by creating a null file +.Pn quota.user +and/or +.Pn quota.group +at the root of the filesystem, running +.Xr quotacheck (8), +and modifying +.Pn /etc/fstab +to show that the filesystem is to run +with disk quotas (options userquota and/or groupquota). +The +.Xr quotaon (8) +program may then be run to enable quotas. +.PP +Individual quotas are applied by using the quota editor +.Xr edquota (8). +Users may view their quotas (but not those of other users) with the +.Xr quota (1) +program. The +.Xr repquota (8) +program may be used to summarize the quotas and current +space usage on a particular filesystem or filesystems. +.PP +Quotas are enforced with \fIsoft\fP and \fIhard\fP limits. +When a user and/or group first reaches a soft limit on a resource, a +message is generated on their terminal. If the user and/or group fails to +lower the resource usage below the soft limit +for longer than the time limit established for that filesystem +(default seven days), the system then treats the soft limit as a +\fIhard\fP limit and disallows any allocations until enough space is +reclaimed to bring the user and/or group back below the soft limit. +Hard limits are enforced strictly, resulting in errors when a user +and/or group tries to create or write a file. Each time a hard limit is +exceeded, the system will generate a message on the user's terminal. +.PP +Consult the auxiliary document, ``Disc Quotas in a UNIX Environment'' (SMM:4), +and the appropriate manual entries for more information.
+.Sh 2 "Network troubleshooting" +.PP +If you have anything more than a trivial network configuration, +from time to time you are bound to run into problems. Before +blaming the software, first check your network connections. On +networks such as the Ethernet a +loose cable tap or misplaced power cable can result in severely +deteriorated service. The +.Xr netstat (1) +program may be of aid in tracking down hardware malfunctions. +In particular, look at the \fB\-i\fP and \fB\-s\fP options in the manual page. +.PP +Should you believe a communication protocol problem exists, +consult the protocol specifications and attempt to isolate the +problem in a packet trace. The SO_DEBUG option may be supplied +before establishing a connection on a socket, in which case the +system will trace all traffic and internal actions (such as timers +expiring) in a circular trace buffer. +This buffer may then be printed out with the +.Xr trpt (8) +program. +Most of the servers distributed with the system +accept a \fB\-d\fP option forcing +all sockets to be created with debugging turned on. +Consult the appropriate manual pages for more information. +.Sh 2 "Files that need periodic attention" +.PP +We conclude the discussion of system operations by listing +the files that require periodic attention or are system specific: +.TS +center; +lfC l. 
+/etc/fstab how disk partitions are used +/etc/disktab default disk partition sizes/labels +/etc/printcap printer database +/etc/gettytab terminal type definitions +/etc/remote names and phone numbers of remote machines for \fItip\fP(1) +/etc/group group memberships +/etc/motd message of the day +/etc/master.passwd password file; each account has a line +/etc/rc.local local system restart script; runs reboot; starts daemons +/etc/inetd.conf local internet servers +/etc/hosts local host name database +/etc/networks network name database +/etc/services network services database +/etc/hosts.equiv hosts under same administrative control +/etc/syslog.conf error log configuration for \fIsyslogd\fP\|(8) +/etc/ttys enables/disables ports +/etc/crontab commands that are run periodically +/etc/crontab.local local commands that are run periodically +/etc/aliases mail forwarding and distribution groups +/var/account/acct raw process account data +/var/log/messages system error log +/var/log/wtmp login session accounting +.TE +.pn 2 +.bp +.PX diff --git a/share/doc/smm/01.setup/Makefile b/share/doc/smm/01.setup/Makefile new file mode 100644 index 000000000000..60351b562bb3 --- /dev/null +++ b/share/doc/smm/01.setup/Makefile @@ -0,0 +1,6 @@ +VOLUME= smm/01.setup +SRCS= stubs 0.t 1.t 2.t 3.t 4.t 5.t 6.t +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/01.setup/spell.ok b/share/doc/smm/01.setup/spell.ok new file mode 100644 index 000000000000..3ac251ebcb4e --- /dev/null +++ b/share/doc/smm/01.setup/spell.ok @@ -0,0 +1,617 @@ +A1096A +AA +ACU +AMD +Automounter +BA +BLOCKSIZE +BSD +Bb +Bostic +Bourne +Bs +Bz +CCI +CCITT +CLNP +CLTP +COMPAT +CPU's +CS80 +CSRG +CW +Catseye +Cyl +DAT +DECstation +DESTDIR +DISK's +DISKTYPE +DMA +DN11 +DV +DaVinci +Dk +Dn +Dy +EBCDIC +EEPROM's +EINTR +EISA +EOT +ERESTART +ESIS +Emulex +Exabyte +FDDI +FIPS +FPU +FTAM +Filesystem +Filesystems +GCC +GENERIC.hp300 +GX +Gatorbox +HDLC +HIL +HP +HP's +HP300 +HP300s +HP433 +HP9000 
+HPBSD +HPUXCOMPAT +Hibler +IB +ICMP +IDP +IDs +IFLNK +IP +IPC +IPSENDREDIRECTS +IPX +ISA +ISO +ISVTX +Intel +Jul +Karels +Kerberos +L.aliases +L.cmds +L.sys +LAN +LAPB +LFS +LH +LK201 +LOGFILE +Leffler +Luna +MB +MC68040 +MFS +MIB +MIPS +MISC +MMU +MT02 +Macklem +Makefile +Makefiles +Maxtor +McKusick +NFS +NIC.ARPA +NPSECT +NTP +OHPUX +OS +OSI +OSes +Omron +PCATCH +PDT +PMAD +PMAG +PMAX +PMAZ +POSIX +POSIX.1 +POSIX.2 +PSD:5 +PVRX +Pathnames +Pendry +Postel +README +RFC +RH +RISC +ROM +RS232 +RZ23 +RZ55 +RZ57 +SCSI +SEQF +SIOCGIFCONF +SLC +SMM:1 +SMM:10 +SMM:13 +SMM:14 +SMM:2 +SMM:3 +SMM:4 +SMM:6 +SMM:7 +SMM:8 +SMM:9 +SMTP +SPARC +SPARCstation +SPP +SRC +SUNOS +Sbus +Solaris +Standalone +Std1003.1 +Std1003.2 +SunOS +TAI +TAPE's +TBOOT +TCP +TIOCSCTTY +TK50 +TM +TP4 +TURBOchannel +TVRX +TZ +Tcpmux +Topcat +Tue +UCB +UDP +UFS +ULTRIX +ULTRIXCOMPAT +UNIX''SMM:1 +USERFILE +USL +UTC +UUAIDS +UX +Ux +VAX +VFS +VME +X11R5 +XX +Xinu +a,c,u,p +a.out +adaptor +adaptors +addrlen +adiron +adm +aka +aliases.db +amd +autochanger +autoconf +autoconfiguration +autoconfigures +bdes +bootblock +bootimage +bootp +bootpath +bootrom +bootsd +bootstrapped +bs +bsd +bsd.README +btree +bwtwo +c.f +callback +cbosg +cbosgd +centronics +cfb0 +cgsix +cgthree +changelist +chflags +chico +chpass +cksum +cleandir +cleanerd +clnp +cltp +conf +conformant +contrib +cpio +crontab +crontab.local +cs +csh.cshrc +csh.login +csh.logout +cshrc +ct +cul0 +db +dbopen +dc7085 +deadfs +dev +dgram +dialcode +dialcodes +dict +disk3 +diskful +disklabel +disklabels +disktab +dm +dm.conf +dm.config +dma +doc +endian +es +esis +ext +fd +fdesc +fifofs +files.HOST +fileservers +filesystem +filesystems +foo +frags +friend.host.inet.number +fsf +fstab +fstype +ftpusers +ftpwelcome +fts +funopen +gcc +genkernel +getcap +gettytab +gid +gid's +gids +groff +groupquota +hangup +hanoi +heapsort +hexdump +hier +host.inet.number +hostmaster +hosts.equiv +hosts.lpd +hp +hp300 +hpib0 +inetd.conf +inline +inode +int +intr +iob 
+iso +kbyte +kbytes +kdb +kdump +kerberos +kerberosIV +kernfs +kmem +krb.conf +ksh +ktrace +kvm +labelling +lastlog +ld.so.cache +le +le0 +lfs +lib +libc +libdata +libedit +libexec +libkern +libkvm +libutil +localhost +lofs +logname +loopback +lq +lun +luna68k +magtape +mail.local +mail.rc +maillog +maillog.0 +maillog.1 +maillog.2 +maillog.3 +maillog.4 +maillog.5 +maillog.6 +maillog.7 +makefiles +man.conf +man0 +manl +master.passwd +maxine +mckusick +mdec +mediainit +mem +mergesort +mfb0 +mfs +misc +miscfs +mk +mkdb +mkdep +mkfifo +mmap +mnt +mono +motd +mountd +mqueue +msgbuf +mtree +my.domain +myfriend +myfriend.my.domain +myname +myname.my.domain +mysitename +namelen +net.inet.ip.redirect +netgroup +netinet +netiso +netmask +netns +netstart +netx25 +newdev +newlfs +news3400 +newkernel +nfs +nfsd +nfsiod +nfsstat +nfssvc +nodump +nowait +npsect +nr +nrmt0 +nrst0 +nsect +nullfs +nvi +obj +ogin +ok +okeeffe.CS.Berkeley.EDU +olddev +oldroot +opr +osockaddr +out0123456789 +out2010123456 +pageable +pathname +pathnames +pcc +picasso.CS.Berkeley.EDU +pid +pm0 +pmake +pmap +pmax +posix +printcap +pt0 +pty +pty.c +pty0 +pty1 +pty2 +pty3 +ptyp +pwd.db +quota.group +quota.user +quotas.user +radixsort +rc.local +rct +rd +rd0 +rdsk +recno +rew +rf +rhosts +rmt0 +rmt12 +ro +roff +root.dump +root.image +routedflags +rq +rrd0d +rrz?a +rrz?c +rsd0d +rsd3a +rsf +rst0 +rw +rxx0 +rz +rz0 +rz1c +rz?a +sbin +sc14 +sc7 +scsi0 +scsiformat +sd +sd0 +sd2b +sd3 +sd3a +sd660 +secretmail +securelevel +securettys +sendmail.cf +setsid +setup.tblms +shar +shareable +sizeof +skel +sockaddr +sparc +specfs +spwd.db +sr +src +srvtab +st +st.st +standalone +std +std.9600 +stderr +stdin +stdout +subr +sunos +sw +sy +sysctl +sysctl.c +syslog.0 +syslog.1 +syslog.2 +syslog.3 +syslog.4 +syslog.5 +syslog.6 +syslog.7 +syslog.conf +tahoe +tapehost +tcp +termios +tmac +tmp +toor +tps +tput +tsleep +tty.c +tty00 +ttya +ttyb +ttyd +ttyp +ttyp0 +ttytype +types.h +tz +tz6 +ucb +ufs +uid +uid's +uids +uipc +umap 
+umapfs +un.h +unaddr.sun +uname +unistd.h +userid +username +userquota +usr.bin +usr.sbin +utmp +uucp.daemon +uucppublic +uw +vangogh.CS.Berkeley.EDU +var +vax +ventel +vfs +vm +kernel +kernel.net +kernel.tape +vnode +vnodes +vol +vt100 +wildcard +wr +wsrc +wtmp +xargs +xbpf +xcfb0 +xf +xp +xpbf +xpf +xsf +xx +xx0 +xxx +your.host.inet.number +yymmddhhmm +zA +zoneinfo diff --git a/share/doc/smm/01.setup/stubs b/share/doc/smm/01.setup/stubs new file mode 100644 index 000000000000..ee1b74f8bc1b --- /dev/null +++ b/share/doc/smm/01.setup/stubs @@ -0,0 +1,5 @@ +.\" +.if n \{\ +. ftr CW R +. ftr C R +.\} diff --git a/share/doc/smm/02.config/0.t b/share/doc/smm/02.config/0.t new file mode 100644 index 000000000000..62eb82176106 --- /dev/null +++ b/share/doc/smm/02.config/0.t @@ -0,0 +1,82 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.bd S B 3 +.de UX +.ie \\n(GA>0 \\$2UNIX\\$1 +.el \{\ +.if n \\$2UNIX\\$1* +.if t \\$2UNIX\\$1\\f1\(dg\\fP +.FS +.if n *UNIX +.if t \(dgUNIX +.ie \\$3=1 is a Footnote of Bell Laboratories. +.el is a Trademark of Bell Laboratories. +.FE +.nr GA 1\} +.. +.de BR +\fB\\$1\fP\\$2 +.. +.TL +Building 4.4BSD Kernels with Config +.AU +Samuel J. Leffler and Michael J. Karels +.AI +Computer Systems Research Group +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.de IR +\fI\\$1\fP\\$2 +.. +.de DT +.TA 8 16 24 32 40 48 56 64 72 80 +.. +.AB +.PP +This document describes the use of +\fIconfig\fP\|(8) to configure and create bootable +4.4BSD system images. +It discusses the structure of system +configuration files and how to configure +systems with non-standard hardware configurations. +Sections describing the preferred way to +add new code to the system and how the system's autoconfiguration +process operates are included. An appendix +contains a summary of the rules used by the system +in calculating the size of system data structures, +and also indicates some of the standard system size +limitations (and how to change them). +Other configuration options are also listed. 
+.sp +.LP +Revised July 5, 1993 +.AE +.LP +.OH 'Building 4.4BSD Kernels with Config''SMM:2-%' +.EH 'SMM:2-%''Building 4.4BSD Kernels with Config' diff --git a/share/doc/smm/02.config/1.t b/share/doc/smm/02.config/1.t new file mode 100644 index 000000000000..e0d2086b178d --- /dev/null +++ b/share/doc/smm/02.config/1.t @@ -0,0 +1,55 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\".ds RH Introduction +.ne 2i +.sp 3 +.NH +INTRODUCTION +.PP +.I Config +is a tool used in building 4.4BSD system images (the UNIX kernel). +It takes a file describing a system's tunable parameters and +hardware support, and generates a collection +of files which are then used to build a copy of UNIX appropriate +to that configuration. +.I Config +simplifies system maintenance by isolating system dependencies +in a single, easy to understand, file. +.PP +This document describes the content and +format of system configuration +files and the rules which must be followed when creating +these files. Example configuration files are constructed +and discussed. +.PP +Later sections suggest guidelines to be used in modifying +system source and explain some of the inner workings of the +autoconfiguration process. Appendix D summarizes the rules +used in calculating the most important system data structures +and indicates some inherent system data structure size +limitations (and how to go about modifying them). diff --git a/share/doc/smm/02.config/2.t b/share/doc/smm/02.config/2.t new file mode 100644 index 000000000000..2fd42da4d719 --- /dev/null +++ b/share/doc/smm/02.config/2.t @@ -0,0 +1,182 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Configuration File Contents +.ne 2i +.NH +CONFIGURATION FILE CONTENTS +.PP +A system configuration must include at least the following +pieces of information: +.IP \(bu 3 +machine type +.IP \(bu 3 +cpu type +.IP \(bu 3 +system identification +.IP \(bu 3 +timezone +.IP \(bu 3 +maximum number of users +.IP \(bu 3 +location of the root file system +.IP \(bu 3 +available hardware +.PP +.I Config +allows multiple system images to be generated from a single +configuration description. Each system image is configured +for identical hardware, but may have different locations for the root +file system and, possibly, other system devices. +.NH 2 +Machine type +.PP +The +.I "machine type" +indicates if the system is going to operate on a DEC VAX-11\(dg computer, +.FS +\(dg DEC, VAX, UNIBUS, MASSBUS and MicroVAX are trademarks of Digital +Equipment Corporation. +.FE +or some other machine on which 4.4BSD operates. 
The machine type +is used to locate certain data files which are machine specific, and +also to select rules used in constructing the resultant +configuration files. +.NH 2 +Cpu type +.PP +The +.I "cpu type" +indicates which, of possibly many, cpu's the system is to operate on. +For example, if the system is being configured for a VAX-11, it could +be running on a VAX 8600, VAX-11/780, VAX-11/750, VAX-11/730 or MicroVAX II. +(Other VAX cpu types, including the 8650, 785 and 725, are configured using +the cpu designation for compatible machines introduced earlier.) +Specifying +more than one cpu type implies that the system should be configured to run +on any of the cpu's specified. For some types of machines this is not +possible and +.I config +will print a diagnostic indicating such. +.NH 2 +System identification +.PP +The +.I "system identification" +is a moniker attached to the system, and often the machine on which the +system is to run. For example, at Berkeley we have machines named Ernie +(Co-VAX), Kim (No-VAX), and so on. The system identifier selected is used to +create a global C ``#define'' which may be used to isolate system dependent +pieces of code in the kernel. For example, Ernie's Varian driver used +to be special cased because its interrupt vectors were wired together. The +code in the driver which understood how to handle this non-standard hardware +configuration was conditionally compiled in only if the system +was for Ernie. +.PP +The system identifier ``GENERIC'' is given to a system which +will run on any cpu of a particular machine type; it should not +otherwise be used for a system identifier. +.NH 2 +Timezone +.PP +The timezone in which the system is to run is used to define the +information returned by the \fIgettimeofday\fP\|(2) +system call. This value is specified as the number of hours east +or west of GMT. Negative numbers indicate a value east of GMT. 
+The timezone specification may also indicate the +type of daylight savings time rules to be applied. +.NH 2 +Maximum number of users +.PP +The system allocates many system data structures at boot time +based on the maximum number of users the system will support. +This number is normally between 8 and 40, depending +on the hardware and expected job mix. The rules +used to calculate system data structures are discussed in +Appendix D. +.NH 2 +Root file system location +.PP +When the system boots it must know the location of +the root of the file system +tree. This location and the part(s) of the disk(s) to be used +for paging and swapping must be specified in order to create +a complete configuration description. +.I Config +uses many rules to calculate default locations for these items; +these are described in Appendix B. +.PP +When a generic system is configured, the root file system is left +undefined until the system is booted. In this case, the root file +system need not be specified, only that the system is a generic system. +.NH 2 +Hardware devices +.PP +When the system boots it goes through an +.I autoconfiguration +phase. During this period, the system searches for all +those hardware devices +which the system builder has indicated might be present. This probing +sequence requires certain pieces of information such as register +addresses, bus interconnects, etc. A system's hardware may be configured +in a very flexible manner or be specified without any flexibility +whatsoever. Most people do not configure hardware devices into the +system unless they are currently present on the machine, expect +them to be present in the near future, or are simply guarding +against a hardware +failure somewhere else at the site (it is often wise to configure in +extra disks in case an emergency requires moving one off a machine which +has hardware problems). +.PP +The specification of hardware devices usually occupies the majority of +the configuration file. 
Accordingly, a large portion of this document is +devoted to it. Section 6.3 contains a description of +the autoconfiguration process for those planning to +write new device drivers or modify existing ones. +.NH 2 +Pseudo devices +.PP +Several system facilities are configured in a manner similar to that used +for hardware devices, although they are not associated with specific hardware. +These system options are configured as +.IR pseudo-devices . +Some pseudo devices allow an optional parameter that sets the limit +on the number of instances of the device that are active simultaneously. +.NH 2 +System options +.PP +In addition to the mandatory pieces of information described above, it +is also possible to include various optional system facilities +or to modify system behavior and/or limits. +For example, 4.4BSD can be configured to support binary compatibility for +programs built under 4.3BSD. Also, optional support is provided +for disk quotas and for tracing the performance of the virtual memory +subsystem. Any optional facilities to be configured into +the system are specified in the configuration file. The resultant +files generated by +.I config +will automatically include the necessary pieces of the system. diff --git a/share/doc/smm/02.config/3.t b/share/doc/smm/02.config/3.t new file mode 100644 index 000000000000..d5331f6f6706 --- /dev/null +++ b/share/doc/smm/02.config/3.t @@ -0,0 +1,293 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "System Building Process +.ne 2i +.NH +SYSTEM BUILDING PROCESS +.PP +In this section we consider the steps necessary to build a bootable system +image. We assume the system source is located in the ``/sys'' directory +and that, initially, the system is being configured from source code. +.PP +Under normal circumstances there are 5 steps in building a system. +.IP 1) 3 +Create a configuration file for the system. +.IP 2) 3 +Make a directory for the system to be constructed in. +.IP 3) 3 +Run +.I config +on the configuration file to generate the files required +to compile and load the system image. +.IP 4) +Construct the source code interdependency rules for the +configured system with +.I make depend +using +.IR make (1). 
+.IP 5) +Compile and load the system with +.IR make . +.PP +Steps 1 and 2 are usually done only once. When a system configuration +changes, it usually suffices simply to run +.I config +on the modified configuration file, rebuild the source code dependencies, +and remake the system. Sometimes, +however, configuration dependencies may not be noticed, in which case +it is necessary to clean out the relocatable object files saved +in the system's directory; this will be discussed later. +.NH 2 +Creating a configuration file +.PP +Configuration files normally reside in the directory ``/sys/conf''. +A configuration file is most easily constructed by copying an +existing configuration file and modifying it. The 4.4BSD distribution +contains a number of configuration files for machines at Berkeley; +one may be suitable or, in the worst case, a copy +of the generic configuration file may be edited. +.PP +The configuration file must have the same name as the directory in +which the configured system is to be built. +Further, +.I config +assumes this directory is located in the parent directory of +the directory in which it +is run. For example, the generic +system has a configuration file ``/sys/conf/GENERIC'' and an accompanying +directory named ``/sys/GENERIC''. +Although it is not required that the system sources and configuration +files reside in ``/sys,'' the configuration and compilation procedure +depends on the relative locations of directories within that hierarchy, +as most of the system code and the files created by +.I config +use pathnames of the form ``../''. +If the system files are not located in ``/sys,'' +it is desirable to make a symbolic link there for use in installation +of other parts of the system that share files with the kernel. +.PP +When building the configuration file, be sure to include the items +described in section 2. In particular, the machine type, +cpu type, timezone, system identifier, maximum users, and root device +must be specified.
The specification of the hardware present may take +a bit of work, particularly if your hardware is configured at non-standard +places (e.g. device registers located at funny places or devices not +supported by the system). Section 4 of this document +gives a detailed description of the configuration file syntax, +section 5 explains some sample configuration files, and +section 6 discusses how to add new devices to +the system. If the devices to be configured are not already +described in one of the existing configuration files, you should check +the manual pages in section 4 of the UNIX Programmer's Manual. For each +supported device, the manual page synopsis entry gives a +sample configuration line. +.PP +Once the configuration file is complete, run it through +.I config +and look for any errors. Never try to use a system which +.I config +has complained about; the results are unpredictable. +For the most part, +.IR config 's +error diagnostics are self-explanatory. Note that +the line numbers reported with the error messages may be off by one. +.PP +A successful run of +.I config +on your configuration file will generate a number of files in +the configuration directory. These files are: +.IP \(bu 3 +A file to be used by \fImake\fP\|(1) +in compiling and loading the system, +.IR Makefile . +.IP \(bu 3 +One file for each possible system image for this machine, +.IR swapxxx.c , +where +.I xxx +is the name of the system image, +which describes where swapping, the root file system, and other +miscellaneous system devices are located. +.IP \(bu 3 +A collection of header files, one per possible device the +system supports, which define the hardware configured. +.IP \(bu 3 +A file containing the I/O configuration tables used by the system +during its +.I autoconfiguration +phase, +.IR ioconf.c .
+.IP \(bu 3 +An assembly-language file of interrupt vectors which +connect interrupts from the machine's external buses to the main +system path for handling interrupts, +and a file that contains counters and names for the interrupt vectors. +.PP +Unless you have reason to doubt +.IR config , +or are curious how the system's autoconfiguration scheme +works, you should never have to look at any of these files. +.NH 2 +Constructing source code dependencies +.PP +When +.I config +is done generating the files needed to compile and link your system, it +will terminate with a message of the form ``Don't forget to run make depend''. +This is a reminder that you should change to the configuration +directory for the system just configured and type ``make depend'' +to build the rules used by +.I make +to recognize interdependencies in the system source code. +This will ensure that any changes to a piece of the system +source code will result in the proper modules being recompiled +the next time +.I make +is run. +.PP +This step is particularly important if your site makes changes +to the system include files. The rules generated specify which source code +files are dependent on which include files. Without these rules, +.I make +will not recognize when it must rebuild modules +due to the modification of a system header file. +The dependency rules are generated by a pass of the C preprocessor +and reflect the global system options. +This step must be repeated when the configuration file is changed +and +.I config +is used to regenerate the system makefile. +.NH 2 +Building the system +.PP +The makefile constructed by +.I config +should allow a new system to be rebuilt by simply typing ``make image-name''. +For example, if you have named your bootable system image ``kernel'', +then ``make kernel'' +will generate a bootable image named ``kernel''.
Alternate system image names +are used when the root file system location and/or swapping configuration +is done in more than one way. The makefile which +.I config +creates has entry points for each system image defined in +the configuration file. +Thus, if you have configured ``kernel'' to be a system with the root file +system on an ``hp'' device and ``hkkernel'' to be a system with the root +file system on an ``hk'' device, then ``make kernel hkkernel'' will generate +binary images for each. +As the system will generally use the disk from which it is loaded +as the root file system, separate system images are only required +to support different swap configurations. +.PP +Note that the name of a bootable image is different from the system +identifier. All bootable images are configured for the same system; +only the information about the root file system and paging devices differs. +(This is described in more detail in section 4.) +.PP +The last step in the system building process is to rearrange certain commonly +used symbols in the symbol table of the system image; the makefile +generated by +.I config +does this automatically for you. +This is advantageous for programs such as +\fInetstat\fP\|(1) and \fIvmstat\fP\|(1), +which run much faster when the symbols they need are located at +the front of the symbol table. +Remember also that many programs expect +the currently executing system to be named ``/kernel''. If you install +a new system and name it something other than ``/kernel'', many programs +are likely to give strange results. +.NH 2 +Sharing object modules +.PP +If you have many systems which are all built on a single machine, +there are at least two approaches to saving time in building system +images. The best way is to have a single system image which is run on +all machines. This is attractive since it minimizes disk space used +and time required to rebuild systems after making changes.
However, +it is often the case that one or more systems will require a separately +configured system image. This may be due to limited memory (building +a system with many unused device drivers can be expensive), or to +configuration requirements (one machine may be a development machine +where disk quotas are not needed, while another is a production machine +where they are), etc. In these cases, it is possible +for common systems to share relocatable object modules which are not +configuration-dependent; most of the modules in the directory ``/sys/sys'' +are of this sort. +.PP +To share object modules, a generic system should be built. Then, for +each system, configure the system as before, but before recompiling and +linking the system, type ``make links'' in the system compilation directory. +This will cause the system sources +to be searched for modules which are safe to share between systems, +and symbolic links to be generated in the current directory to the appropriate +object modules in the directory ``../GENERIC''. A shell script, +``makelinks'', is generated with this request and may be checked for +correctness. The file ``/sys/conf/defines'' contains a list of symbols +which we believe are safe to ignore when checking the source code +for modules which may be shared. Note that this list includes the definitions +used to conditionally compile in the virtual memory tracing facilities, and +the trace point support used only rarely (even at Berkeley). +It may be necessary +to modify this file to reflect local needs. Note further that +interdependencies which are not directly visible +in the source code are not caught. This means that if you place +per-system dependencies in an include file, they will not be recognized +and the shared code may be selected in an unexpected fashion. +.NH 2 +Building profiled systems +.PP +It is simple to configure a system which will automatically +collect profiling information as it operates.
The profiling data +may be collected with \fIkgmon\fP\|(8) and processed with +\fIgprof\fP\|(1) +to obtain information regarding the system's operation. Profiled +systems maintain histograms of the program counter as well as the +number of invocations of each routine. The \fIgprof\fP +command will also generate a dynamic call graph of the executing +system and propagate time spent in each routine along the arcs +of the call graph (consult the \fIgprof\fP documentation for details). +The program counter sampling can be driven by the system clock or, +if you have an alternate real-time clock, by that clock. The +latter is highly recommended, as use of the system clock will result +in statistical anomalies, and time spent in the clock routine will +not be accurately attributed. +.PP +To configure a profiled system, the +.B \-p +option should be supplied to \fIconfig\fP. +A profiled system is about 5-10% larger in its text space due to +the calls to count the subroutine invocations. When the system +executes, the profiling data is stored in a buffer which is 1.2 +times the size of the text space. The overhead for running a +profiled system varies; under normal load we see anywhere from 5-25% +of the system time spent in the profiling code. +.PP +Note that systems configured for profiling should not be shared as +described above unless all the other shared systems are also to be +profiled. diff --git a/share/doc/smm/02.config/4.t b/share/doc/smm/02.config/4.t new file mode 100644 index 000000000000..a8add4f7cead --- /dev/null +++ b/share/doc/smm/02.config/4.t @@ -0,0 +1,436 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1.
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Configuration File Syntax +.ne 2i +.NH +CONFIGURATION FILE SYNTAX +.PP +In this section we consider the specific rules used in writing +a configuration file. A complete grammar for the input language +can be found in Appendix A and may be of use if you should have +problems with syntax errors. +.PP +A configuration file is broken up into three logical pieces: +.IP \(bu 3 +configuration parameters global to all system images +specified in the configuration file, +.IP \(bu 3 +parameters specific to each +system image to be generated, and +.IP \(bu 3 +device specifications. 
+.NH 2 +Global configuration parameters +.PP +The global configuration parameters are the type of machine, +cpu types, options, timezone, system identifier, and maximum users. +Each is specified on a separate line in the configuration file. +.IP "\fBmachine\fP \fItype\fP" +.br +The system is to run on the machine type specified. No more than +one machine type can appear in the configuration file. Legal values +are +.B vax +and +\fBsun\fP. +.IP "\fBcpu\fP ``\fItype\fP''" +.br +This system is to run on the cpu type specified. +More than one cpu type specification +can appear in a configuration file. +Legal types for a +.B vax +machine are +\fBVAX8600\fP, \fBVAX780\fP, \fBVAX750\fP, +\fBVAX730\fP +and +\fBVAX630\fP (MicroVAX II). +The 8650 is listed as an 8600, the 785 as a 780, and the 725 as a 730. +.IP "\fBoptions\fP \fIoptionlist\fP" +.br +Compile the listed optional code into the system. +Options in this list are separated by commas. +Possible options are listed at the top of the generic makefile. +A line of the form ``options FUNNY,HAHA'' generates global ``#define''s +\-DFUNNY \-DHAHA in the resultant makefile. +An option may be given a value by following its name with ``\fB=\fP'', +then the value enclosed in (double) quotes. +The following major options are currently in use: +COMPAT (include code for compatibility with 4.1BSD binaries), +INET (Internet communication protocols), +NS (Xerox NS communication protocols), +and +QUOTA (enable disk quotas). +Other kernel options controlling system sizes and limits +are listed in Appendix D; +options for the network are found in Appendix E. +There are additional options which are associated with certain +peripheral devices; those are listed in the Synopsis section +of the manual page for the device. +.IP "\fBmakeoptions\fP \fIoptionlist\fP" +.br +Options that are used within the system makefile +and evaluated by +.I make +are listed as +.IR makeoptions .
+Options are listed with their values in the form +``makeoptions name=value,name2=value2.'' +The values must be enclosed in double quotes if they include numerals +or begin with a dash. +.IP "\fBtimezone\fP \fInumber\fP [ \fBdst\fP [ \fInumber\fP ] ]" +.br +Specifies the timezone used by the system. This is measured in the +number of hours your timezone is west of GMT. +EST is 5 hours west of GMT; PST is 8. Negative numbers +indicate hours east of GMT. If you specify +\fBdst\fP, the system will operate under daylight saving time. +An optional integer or floating-point number may be included +to specify a particular daylight saving time correction algorithm; +the default value is 1, indicating the United States. +Other values are: 2 (Australian style), 3 (Western European), +4 (Middle European), and 5 (Eastern European). See +\fIgettimeofday\fP\|(2) and \fIctime\fP\|(3) for more information. +.IP "\fBident\fP \fIname\fP" +.br +This system is to be known as +.IR name . +This is usually a cute name like ERNIE (short for Ernie Co-Vax) or +VAXWELL (for Vaxwell Smart). +This value is defined for use in conditional compilation, +and is also used to locate an optional list of source files specific +to this system. +.IP "\fBmaxusers\fP \fInumber\fP" +.br +The maximum expected number of simultaneously active users on this system is +.IR number . +This number is used to size several system data structures. +.NH 2 +System image parameters +.PP +Multiple bootable images may be specified in a single configuration +file. The systems will have the same global configuration parameters +and devices, but the location of the root file system and other +system-specific devices may be different. A system image is specified +with a ``config'' line: +.IP +\fBconfig\fP\ \fIsysname\fP\ \fIconfig-clauses\fP +.LP +The +.I sysname +field is the name given to the loaded system image; almost everyone +names their standard system image ``kernel''.
The configuration clauses +are one or more specifications indicating where the root file system +is located and the number and location of paging devices. +The device used by the system to process argument lists during +.IR execve (2) +calls may also be specified, though in practice this is almost +always selected by +.I config +using one of its rules for selecting default locations for +system devices. +.PP +A configuration clause is one of the following: +.IP +.nf +\fBroot\fP [ \fBon\fP ] \fIroot-device\fP +\fBswap\fP [ \fBon\fP ] \fIswap-device\fP [ \fBand\fP \fIswap-device\fP ] ... +\fBdumps\fP [ \fBon\fP ] \fIdump-device\fP +\fBargs\fP [ \fBon\fP ] \fIarg-device\fP +.LP +(The ``on'' is optional.) Multiple configuration clauses +are separated by white space; +.I config +allows specifications to be continued across multiple lines +by beginning the continuation line with a tab character. +The ``root'' clause specifies where the root file system +is located, the ``swap'' clause indicates swapping and paging +area(s), the ``dumps'' clause can be used to force system dumps +to be taken on a particular device, and the ``args'' clause +can be used to specify that argument list processing for +.I execve +should be done on a particular device. +.PP +The device names supplied in the clauses may be fully specified +as a device, unit, and file system partition, or underspecified, +in which case +.I config +will use built-in rules to select default unit numbers and file +system partitions. The defaulting rules are a bit complicated, +as they are dependent on the overall system configuration. +For example, the swap area need not be specified at all if +the root device is specified; in this case, the swap area is +placed in the ``b'' partition of the same disk where the root +file system is located. Appendix B contains a complete list +of the defaulting rules used in selecting system configuration +devices.
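As a minimal sketch of the one defaulting rule spelled out above (when only the root device is given, swap goes in partition ``b'' of the same disk), the hypothetical C helper below derives a default swap device name from a root specification such as hp0 or hp0a. It is not config's actual code and does not model the rest of Appendix B. The helper name default_swap and the assumption that names are letters, then a unit number, then an optional partition letter a through h (eight partitions per unit, matching the minor-number rule) are illustrative assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of config: given a root device such
 * as "hp0" or "hp0a", write the default swap device ("hp0b") into out.
 * Assumes a name of letters, then a unit number, then an optional
 * partition letter 'a'..'h'. Returns 0 on success, or -1 if the name
 * is malformed or out is too small.
 */
static int
default_swap(const char *root, char *out, size_t outlen)
{
	size_t len, i = 0, unit_start;

	assert(root != NULL && out != NULL);
	len = strlen(root);

	/* Device name, e.g. "hp". */
	while (i < len && isalpha((unsigned char)root[i]))
		i++;
	if (i == 0)
		return -1;

	/* Unit number, e.g. "0"; at least one digit is required. */
	unit_start = i;
	while (i < len && isdigit((unsigned char)root[i]))
		i++;
	if (i == unit_start)
		return -1;

	/* Allow at most one trailing partition letter; it is replaced by 'b'. */
	if (i < len && !(i + 1 == len && root[i] >= 'a' && root[i] <= 'h'))
		return -1;

	/* Room is needed for name + unit, the letter 'b', and the NUL. */
	if (i + 2 > outlen)
		return -1;
	memcpy(out, root, i);
	out[i] = 'b';
	out[i + 1] = '\0';
	return 0;
}
```

For instance, ``config kernel root on hp0'' with no swap clause would, under this rule alone, place swap on hp0b; a root of hp0a yields the same result because the partition letter is replaced rather than appended to.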
+.PP +The device names are translated to the +appropriate major and minor device +numbers on a per-machine basis. A file, +``/sys/conf/devices.machine'' (where ``machine'' +is the machine type specified in the configuration file), +is used to map a device name to its major block device number. +The minor device number is calculated using the standard +disk partitioning rules: on unit 0, partition ``a'' is minor device +0, partition ``b'' is minor device 1, and so on; for units +other than 0, add 8 times the unit number to get the minor +device. +.PP +If the default mapping of device name to major/minor device +number is incorrect for your configuration, it can be replaced +by an explicit specification of the major/minor device. +This is done by substituting +.IP +\fBmajor\fP \fIx\fP \fBminor\fP \fIy\fP +.LP +where the device name would normally be found. For example, +.IP +.nf +\fBconfig\fP kernel \fBroot\fP \fBon\fP \fBmajor\fP 99 \fBminor\fP 1 +.fi +.PP +Normally, the areas configured for swap space are sized by the system +at boot time. If a non-standard size is to be used for one +or more swap areas (less than the full partition), +this can also be specified. To do this, the +device name specified for a swap area should have a ``size'' +specification appended. For example, +.IP +.nf +\fBconfig\fP kernel \fBroot\fP \fBon\fP hp0 \fBswap\fP \fBon\fP hp0b \fBsize\fP 1200 +.fi +.LP +would force swapping to be done in partition ``b'' of ``hp0'' and +the swap partition size would be set to 1200 sectors. A swap area +sized larger than the associated disk partition is trimmed to the +partition size. +.PP +To create a generic configuration, only the clause ``swap generic'' +should be specified; any extra clauses will cause an error. +.NH 2 +Device specifications +.PP +Each device attached to a machine must be specified +to +.I config +so that the system generated will know to probe for it during +the autoconfiguration process carried out at boot time. 
Hardware +specified in the configuration need not actually be present on +the machine where the generated system is to be run. Only the +hardware actually found at boot time will be used by the system. +.PP +The specification of hardware devices in the configuration file +parallels the interconnection hierarchy of the machine to be +configured. On the VAX, this means that a configuration file must +indicate what MASSBUS and UNIBUS adapters are present, and to +which \fInexi\fP they might be connected.* +.FS +* While VAX-11/750s and VAX-11/730s do not actually have +nexi, the system treats them as having +.I "simulated nexi" +to simplify device configuration. +.FE +Similarly, devices +and controllers must be indicated as possibly being connected +to one or more adapters. A device description may provide a +complete definition of the possible configuration parameters, +or it may leave certain parameters undefined and make the system +probe for all the possible values. The latter allows a single +device configuration list to match many possible physical +configurations. For example, a disk may be indicated as present +at UNIBUS adapter 0, or at any UNIBUS adapter which the system +locates at boot time. The latter scheme, termed +.IR wildcarding , +allows more flexibility in the physical configuration of a system; +if a disk must be moved around for some reason, the system will +still locate it at the alternate location. +.PP +A device specification takes one of the following forms: +.IP +.nf +\fBmaster\fP \fIdevice-name\fP \fIdevice-info\fP +\fBcontroller\fP \fIdevice-name\fP \fIdevice-info\fP [ \fIinterrupt-spec\fP ] +\fBdevice\fP \fIdevice-name\fP \fIdevice-info\fP \fIinterrupt-spec\fP +\fBdisk\fP \fIdevice-name\fP \fIdevice-info\fP +\fBtape\fP \fIdevice-name\fP \fIdevice-info\fP +.fi +.LP +A ``master'' is a MASSBUS tape controller; a ``controller'' is a +disk controller, a UNIBUS tape controller, a MASSBUS adapter, or +a UNIBUS adapter.
A ``device'' is an autonomous device which +connects directly to a UNIBUS adapter (as opposed to something +like a disk, which connects through a disk controller). ``Disk'' +and ``tape'' identify disk drives and tape drives connected to +a ``controller'' or ``master.'' +.PP +The +.I device-name +is one of the standard device names, as +indicated in section 4 of the UNIX Programmer's Manual, +concatenated with the +.I logical +unit number to be assigned the device (the +.I logical +unit number may be different from the +.I physical +unit number indicated on the front of something +like a disk; the +.I logical +unit number is used to refer to the UNIX device, not +the physical unit number). For example, ``hp0'' is logical +unit 0 of a MASSBUS storage device, even though it might +be physical unit 3 on MASSBUS adapter 1. +.PP +The +.I device-info +clause specifies how the hardware is +connected in the interconnection hierarchy. On the VAX, +UNIBUS and MASSBUS adapters are connected to the internal +system bus through +a \fInexus\fP. +Thus, one of the following +specifications would be used: +.IP +.ta 1.5i 2.5i 4.0i +.nf +\fBcontroller\fP mba0 \fBat\fP \fBnexus\fP \fIx\fP +\fBcontroller\fP uba0 \fBat\fP \fBnexus\fP \fIx\fP +.fi +.LP +To tie a controller to a specific nexus, ``x'' would be supplied +as the number of that nexus; otherwise ``x'' may be specified as +``?'', in which +case the system will probe all nexi present looking +for the specified controller. +.PP +The remaining interconnections on the VAX are: +.IP \(bu 3 +a controller +may be connected to another controller (e.g. a disk controller attached +to a UNIBUS adapter), +.IP \(bu 3 +a master is always attached to a controller (a MASSBUS adapter), +.IP \(bu 3 +a tape is always attached to a master (for MASSBUS +tape drives), +.IP \(bu 3 +a disk is always attached to a controller, and +.IP \(bu 3 +devices +are always attached to controllers (e.g. UNIBUS controllers attached +to UNIBUS adapters).
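The interconnection rules just listed can be read as a small table of which entry kinds may be written ``at'' which others. The C sketch below encodes them as a lookup. It is hypothetical: the enum kind values and the may_attach function are invented for illustration and are not part of config. It checks only the VAX rules stated in this section, and it also accepts a tape on a controller, since the text says tape drives connect to a ``controller'' or ``master'' (a UNIBUS tape controller counts as a controller).

```c
#include <assert.h>
#include <stdbool.h>

/* Kinds of configuration-file entries; NEXUS stands for "at nexus x". */
enum kind { NEXUS, CONTROLLER, MASTER, DISK, TAPE, DEVICE };

/*
 * Hypothetical check, not config's code: may an entry of kind `child`
 * be written "at" an entry of kind `parent` on a VAX?
 */
static bool
may_attach(enum kind child, enum kind parent)
{
	assert(child <= DEVICE);	/* reject out-of-range kinds */

	switch (child) {
	case CONTROLLER:
		/* Adapters sit on a nexus; e.g. disk controllers sit on an adapter. */
		return parent == NEXUS || parent == CONTROLLER;
	case MASTER:
		/* A MASSBUS tape controller hangs off a MASSBUS adapter. */
		return parent == CONTROLLER;
	case TAPE:
		/* MASSBUS drives attach to a master; UNIBUS drives to a controller. */
		return parent == MASTER || parent == CONTROLLER;
	case DISK:
		return parent == CONTROLLER;
	case DEVICE:
		/* Autonomous UNIBUS devices connect directly to an adapter. */
		return parent == CONTROLLER;
	default:
		/* A nexus is not itself attached to anything in the file. */
		return false;
	}
}
```

Under these rules, ``tape tu0 at ht0'' is a TAPE on a MASTER and is accepted, while a master placed directly on a nexus is rejected.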
+.LP +The following lines give an example of each of these interconnections: +.IP +.ta 1.5i 2.5i 4.0i +.nf +\fBcontroller\fP hk0 \fBat\fP uba0 ... +\fBmaster\fP ht0 \fBat\fP mba0 ... +\fBdisk\fP hp0 \fBat\fP mba0 ... +\fBtape\fP tu0 \fBat\fP ht0 ... +\fBdisk\fP rk1 \fBat\fP hk0 ... +\fBdevice\fP dz0 \fBat\fP uba0 ... +.fi +.LP +Any piece of hardware which may be connected to a specific +controller may also be wildcarded across multiple controllers. +.PP +The final piece of information needed by the system to configure +devices is some indication of where or how a device will interrupt. +For tapes and disks, simply specifying the \fIslave\fP or \fIdrive\fP +number is sufficient to locate the control status register for the +device. +\fIDrive\fP numbers may be wildcarded +on MASSBUS devices, but not on disks on a UNIBUS controller. +For controllers, the control status register must be +given explicitly, as well as the number of interrupt vectors used and +the names of the routines to which they should be bound. +Thus, the example lines given above might be completed as: +.IP +.ta 1.5i 2.5i 4.0i +.nf +\fBcontroller\fP hk0 \fBat\fP uba0 \fBcsr\fP 0177440 \fBvector\fP rkintr +\fBmaster\fP ht0 \fBat\fP mba0 \fBdrive\fP 0 +\fBdisk\fP hp0 \fBat\fP mba0 \fBdrive\fP ? +\fBtape\fP tu0 \fBat\fP ht0 \fBslave\fP 0 +\fBdisk\fP rk1 \fBat\fP hk0 \fBdrive\fP 1 +\fBdevice\fP dz0 \fBat\fP uba0 \fBcsr\fP 0160100 \fBvector\fP dzrint dzxint +.fi +.PP +Certain device drivers require extra information passed to them +at boot time to tailor their operation to the actual hardware present. +The line printer driver, for example, needs to know how many columns +are present on each non-standard line printer (i.e. a line printer +with other than 80 columns). The drivers for the terminal multiplexors +need to know which lines are attached to modem lines so that no one will +be allowed to use them unless a connection is present.
For this reason, +one last parameter may be specified to a +.IR device , +a +.I flags +field. It has the syntax +.IP +\fBflags\fP \fInumber\fP +.LP +and is usually placed after the +.I csr +specification. The +.I number +is passed directly to the associated driver. The manual pages +in section 4 should be consulted to determine how each driver +uses this value (if at all). +Communications interface drivers commonly use the flags +to indicate whether modem control signals are in use. +.PP +The exact syntax for each specific device is given in the Synopsis +section of its manual page in section 4 of the manual. +.NH 2 +Pseudo-devices +.PP +A number of drivers and software subsystems +are treated like device drivers without any associated hardware. +To include any of these pieces, a ``pseudo-device'' specification +must be used. A specification for a pseudo device takes the form +.IP +.DT +.nf +\fBpseudo-device\fP \fIdevice-name\fP [ \fIhowmany\fP ] +.fi +.PP +Examples of pseudo devices are +\fBpty\fP, the pseudo terminal driver (where the optional +.I howmany +value indicates the number of pseudo terminals to configure; the default is 32), +and \fBloop\fP, the software loopback network pseudo-interface. +Other pseudo devices for the network include +\fBimp\fP (required when a CSS or ACC imp is configured) +and \fBether\fP (used by the Address Resolution Protocol +on 10 Mb/sec Ethernets). +More information on configuring each of these can also be found +in section 4 of the manual. diff --git a/share/doc/smm/02.config/5.t b/share/doc/smm/02.config/5.t new file mode 100644 index 000000000000..2c03c11895d5 --- /dev/null +++ b/share/doc/smm/02.config/5.t @@ -0,0 +1,265 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1.
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Sample Configuration Files +.ne 2i +.NH +SAMPLE CONFIGURATION FILES +.PP +In this section we will consider how to configure a +sample VAX-11/780 system on which the hardware can be +reconfigured to guard against various hardware mishaps. +We then study the rules needed to configure a VAX-11/750 +to run in a networking environment. +.NH 2 +VAX-11/780 System +.PP +Our VAX-11/780 is configured with hardware +recommended in the document ``Hints on Configuring a VAX for 4.2BSD'' +(this is one of the high-end configurations). +Table 1 lists the pertinent hardware to be configured. 
+.DS B +.TS +box; +l | l | l | l | l +l | l | l | l | l. +Item Vendor Connection Name Reference +_ +cpu DEC VAX780 +MASSBUS controller Emulex nexus ? mba0 hp(4) +disk Fujitsu mba0 hp0 +disk Fujitsu mba0 hp1 +MASSBUS controller Emulex nexus ? mba1 +disk Fujitsu mba1 hp2 +disk Fujitsu mba1 hp3 +UNIBUS adapter DEC nexus ? +tape controller Emulex uba0 tm0 tm(4) +tape drive Kennedy tm0 te0 +tape drive Kennedy tm0 te1 +terminal multiplexor Emulex uba0 dh0 dh(4) +terminal multiplexor Emulex uba0 dh1 +terminal multiplexor Emulex uba0 dh2 +.TE +.DE +.ce +Table 1. VAX-11/780 Hardware support. +.LP +We will call this machine ANSEL and construct a configuration +file one step at a time. +.PP +The first step is to fill in the global configuration parameters. +The machine is a VAX, so the +.I "machine type" +is ``vax''. We will assume this system will +run only on this one processor, so the +.I "cpu type" +is ``VAX780''. The options are empty since this is going to +be a ``vanilla'' VAX. The system identifier, as mentioned before, +is ``ANSEL,'' and the maximum number of users we plan to support is +about 40. Thus the beginning of the configuration file looks like +this: +.DS +.ta 1.5i 2.5i 4.0i +# +# ANSEL VAX (a picture perfect machine) +# +machine vax +cpu VAX780 +timezone 8 dst +ident ANSEL +maxusers 40 +.DE +.PP +To this we must then add the specifications for three +system images. The first will be our standard system with the +root on ``hp0'' and swapping on the same drive as the root. +The second will have the root file system in the same location, +but swap space interleaved among drives on each controller. +Finally, the third will be a generic system, +to allow us to boot off any of the four disk drives. +.DS +.ta 1.5i 2.5i +config kernel root on hp0 +config hpkernel root on hp0 swap on hp0 and hp2 +config genkernel swap generic +.DE +.PP +Finally, the hardware must be specified. Let us first just try +transcribing the information from Table 1. 
+.DS +.ta 1.5i 2.5i 4.0i +controller mba0 at nexus ? +disk hp0 at mba0 drive 0 +disk hp1 at mba0 drive 1 +controller mba1 at nexus ? +disk hp2 at mba1 drive 2 +disk hp3 at mba1 drive 3 +controller uba0 at nexus ? +controller tm0 at uba0 csr 0172520 vector tmintr +tape te0 at tm0 drive 0 +tape te1 at tm0 drive 1 +device dh0 at uba0 csr 0160020 vector dhrint dhxint +device dm0 at uba0 csr 0170500 vector dmintr +device dh1 at uba0 csr 0160040 vector dhrint dhxint +device dh2 at uba0 csr 0160060 vector dhrint dhxint +.DE +.LP +(Oh, I forgot to mention that one panel of the terminal multiplexor +has modem control, thus the ``dm0'' device.) +.PP +This will suffice, but leaves us with little flexibility. Suppose +our first disk controller were to break. We would like to recable its +drives onto the second controller so that all our disks could +still be used without reconfiguring the system. To do this we wildcard +the MASSBUS adapter connections and also the drive numbers. Further, +we wildcard the UNIBUS adapter connections in case we decide some time +in the future to purchase another adapter to offload the single UNIBUS +we currently have. The revised device specifications would then be: +.DS +.ta 1.5i 2.5i 4.0i +controller mba0 at nexus ? +disk hp0 at mba? drive ? +disk hp1 at mba? drive ? +controller mba1 at nexus ? +disk hp2 at mba? drive ? +disk hp3 at mba? drive ? +controller uba0 at nexus ? +controller tm0 at uba? csr 0172520 vector tmintr +tape te0 at tm0 drive 0 +tape te1 at tm0 drive 1 +device dh0 at uba? csr 0160020 vector dhrint dhxint +device dm0 at uba? csr 0170500 vector dmintr +device dh1 at uba? csr 0160040 vector dhrint dhxint +device dh2 at uba? csr 0160060 vector dhrint dhxint +.DE +.LP +The completed configuration file for ANSEL is shown in Appendix C. +.NH 2 +VAX-11/750 with network support +.PP +Our VAX-11/750 system will be located on two 10Mb/s Ethernet +local area networks and also the DARPA Internet.
The system +will have a MASSBUS drive for the root file system and two +UNIBUS drives. Paging is interleaved among all three drives. +We have sold our standard DEC terminal multiplexors since this +machine will be accessed solely through the network. Because this +machine is not intended to have a large user community, it +does not have a great deal of memory. First the global parameters: +.DS +.ta 1.5i 2.5i 4.0i +# +# UCBVAX (Gateway to the world) +# +machine vax +cpu "VAX780" +cpu "VAX750" +ident UCBVAX +timezone 8 dst +maxusers 32 +options INET +options NS +.DE +.PP +The multiple cpu types allow us to replace UCBVAX with a +more powerful cpu without reconfiguring the system. The +value of 32 given for the maximum number of users is chosen to +force the system data structures to be over-allocated. That +is desirable on this machine because, while it is not expected +to support many users, it is expected to perform a great deal +of work. +The ``INET'' option indicates that we plan to use the +DARPA standard Internet protocols on this machine, +and ``NS'' also includes support for Xerox NS protocols. +Note that unlike 4.2BSD configuration files, +the network protocol options do not require corresponding pseudo devices. +.PP +The system images and disks are configured next. +.DS +.ta 1.5i 2.5i 4.0i +config kernel root on hp swap on hp and rk0 and rk1 +config upkernel root on up +config hkkernel root on hk swap on rk0 and rk1 + +controller mba0 at nexus ? +controller uba0 at nexus ? +disk hp0 at mba? drive 0 +disk hp1 at mba? drive 1 +controller sc0 at uba? csr 0176700 vector upintr +disk up0 at sc0 drive 0 +disk up1 at sc0 drive 1 +controller hk0 at uba? csr 0177440 vector rkintr +disk rk0 at hk0 drive 0 +disk rk1 at hk0 drive 1 +.DE +.PP +UCBVAX requires heavy interleaving of its paging area to keep up +with all the mail traffic it handles. The limiting factor on this +system's performance is usually the number of disk arms, as opposed +to memory or cpu cycles.
The extra UNIBUS controller, ``sc0'', +is there in case the MASSBUS controller breaks and a spare controller +must be installed (most of our old UNIBUS controllers have been +replaced with the newer MASSBUS controllers, so we have a number +of these around as spares). +.PP +Finally, we add in the network devices. +Pseudo terminals are needed to allow users to +log in across the network (remember the only hardwired terminal +is the console). +The software loopback device is used for on-machine communications. +The connection to the Internet is through +an IMP; this requires yet another +.I pseudo-device +(in addition to the actual hardware device used by the +IMP software). And, finally, there are the two Ethernet devices. +These use a special protocol, the Address Resolution Protocol (ARP), +to map between Internet and Ethernet addresses. Thus, yet another +.I pseudo-device +is needed. The additional device specifications are shown below. +.DS +.ta 1.5i 2.5i 4.0i +pseudo-device pty +pseudo-device loop +pseudo-device imp +device acc0 at uba? csr 0167600 vector accrint accxint +pseudo-device ether +device ec0 at uba? csr 0164330 vector ecrint eccollide ecxint +device il0 at uba? csr 0164000 vector ilrint ilcint +.DE +.LP +The completed configuration file for UCBVAX is shown in Appendix C. +.NH 2 +Miscellaneous comments +.PP +It should be noted in these examples that neither system was +configured to use disk quotas or the 4.1BSD compatibility mode. +To use these optional facilities, and others, we would probably +clean out our current configuration, reconfigure the system, then +recompile and relink the system image(s). This could, of course, +be avoided by figuring out which relocatable object files are +affected by the reconfiguration, then reconfiguring and recompiling +only those files affected by the configuration change. This technique +should be used carefully.
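The benefit of wildcarding the ANSEL disk entries can be illustrated with a small C sketch. It is not the kernel's autoconfiguration code: the structure conf_disk, the WILDCARD value standing in for a ``?'' in the configuration file, and the functions field_matches and attach_disk are hypothetical names chosen for illustration. Under that assumption, a probed drive is bound to the first unused configuration entry whose adapter and drive fields are either wildcarded or equal to the location where the drive was found. A fully wildcarded table therefore still finds all four drives after they have been recabled onto mba1, while a table pinned to mba0 rejects them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: a "?" in the configuration file becomes WILDCARD. */
#define WILDCARD (-1)

/* One configured disk line, e.g. "disk hp2 at mba? drive ?". */
struct conf_disk {
    int  ctlr;   /* MASSBUS adapter number, or WILDCARD */
    int  drive;  /* drive number on that adapter, or WILDCARD */
    bool alive;  /* set once a physical drive is bound to this entry */
};

/* A field matches if it was wildcarded or equals the probed value. */
static bool
field_matches(int configured, int found)
{
    return configured == WILDCARD || configured == found;
}

/*
 * Bind a drive found at (ctlr, drive) to the first unused entry that
 * accepts it.  Returns the entry's index (the N of "hpN"), or -1 if
 * no configured entry accepts the drive.
 */
static int
attach_disk(struct conf_disk *tab, size_t n, int ctlr, int drive)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].alive)
            continue;
        if (field_matches(tab[i].ctlr, ctlr) &&
            field_matches(tab[i].drive, drive)) {
            tab[i].alive = true;
            return (int)i;
        }
    }
    return -1;
}
```

Entries with explicit numbers behave like the unwildcarded lines shown first: in this sketch, a drive is accepted only at exactly that adapter and drive.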
diff --git a/share/doc/smm/02.config/6.t b/share/doc/smm/02.config/6.t new file mode 100644 index 000000000000..ca171fd3bd8c --- /dev/null +++ b/share/doc/smm/02.config/6.t @@ -0,0 +1,226 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\".ds RH "Adding New Devices +.ne 2i +.NH +ADDING NEW SYSTEM SOFTWARE +.PP +This section is not for the novice; it describes +some of the inner workings of the configuration process as +well as the pertinent parts of the system autoconfiguration process. +It is intended to give +those who plan to install new device drivers and/or +other system facilities enough information to do so in a +manner that allows others to easily share the changes. +.PP +This section is broken into three parts: +.IP \(bu 3 +general guidelines to be followed in modifying system code, +.IP \(bu 3 +how to add non-standard system facilities to 4.4BSD, and +.IP \(bu 3 +how to add a device driver to 4.4BSD. +.NH 2 +Modifying system code +.PP +If you wish to make site-specific modifications to the system, +it is best to bracket them with +.DS +#ifdef SITENAME +\&... +#endif +.DE +to allow your source to be easily distributed to others, and +also to simplify \fIdiff\fP\|(1) listings. If you choose not +to use a source code control system (e.g. SCCS, RCS), and +perhaps even if you do, it is +recommended that you save the old code with something +of the form: +.DS +#ifndef SITENAME +\&... +#endif +.DE +We try to isolate our site-dependent code in individual files +which may be configured with pseudo-device specifications. +.PP +Indicate machine-specific code with ``#ifdef vax'' (or other machine, +as appropriate). +4.4BSD underwent extensive work to make it extremely portable to +machines with similar architectures\- you may someday find +yourself trying to use a single copy of the source code on +multiple machines. +.NH 2 +Adding non-standard system facilities +.PP +This section considers the work needed to augment +.IR config 's +data base files for non-standard system facilities. +.I Config +uses a set of files that list the source modules that may be required +when building a system.
+The data bases are taken from the directory in which +.I config +is run, normally /sys/conf. +Three such files may be used: +.IR files , +.IR files .machine, +and +.IR files .ident. +The first is common to all systems, +the second contains files unique to a single machine type, +and the third is an optional list of modules for use on a specific machine. +This last file may override specifications in the first two. +The format of the +.I files +file has grown somewhat complex over time. Entries are normally of +the form +.IP +.nf +.DT +\fIdir/source.c\fP \fItype\fP \fIoption-list\fP \fImodifiers\fP +.LP +for example, +.IP +.nf +.DT +\fIvaxuba/foo.c\fP \fBoptional\fP foo \fBdevice-driver\fP +.LP +The +.I type +is one of +.B standard +or +.BR optional . +Files marked as standard are included in all system configurations. +Optional file specifications include a list of one or more system +options that together require the inclusion of this module. +The options in the list may be either names of devices that may +be in the configuration file, +or the names of system options that may be defined. +An optional file may be listed multiple times with different options; +if all of the options for any of the entries are satisfied, +the module is included. +.PP +If a file is specified as a +.IR device-driver , +any special compilation options for device drivers will be invoked. +On the VAX this results in the use of the +.B \-i +option for the C optimizer. This is required when pointer references +are made to memory locations in the VAX I/O address space. +.PP +Two other optional keywords modify the usage of the file. +.I Config +understands that certain files are used especially for +kernel profiling. These files are indicated in the +.I files +files with a +.I profiling-routine +keyword. 
For example, the current profiling subroutines +are kept in a separate file with the following +entry: +.IP +.nf +.DT +\fIsys/subr_mcount.c\fP \fBoptional\fP \fBprofiling-routine\fP +.fi +.LP +The +.I profiling-routine +keyword forces +.I config +not to compile the source file with the +.B \-pg +option. +.PP +The second such keyword is the +.I config-dependent +keyword. This causes +.I config +to compile the indicated module with the global configuration +parameters. This allows certain modules, such as +.IR machdep.c , +to size system data structures based on the maximum number +of users configured for the system. +.NH 2 +Adding device drivers to 4.4BSD +.PP +The I/O system and +.I config +have been designed to easily allow new device support to be added. +The system source directories are organized as follows: +.DS +.TS +lw(1.0i) l. +/sys/h machine-independent include files +/sys/sys machine-independent system source files +/sys/conf site configuration files and basic templates +/sys/net network-protocol-independent, but network-related code +/sys/netinet DARPA Internet code +/sys/netimp IMP support code +/sys/netns Xerox NS code +/sys/vax VAX-specific mainline code +/sys/vaxif VAX network interface code +/sys/vaxmba VAX MASSBUS device drivers and related code +/sys/vaxuba VAX UNIBUS device drivers and related code +.TE +.DE +.PP +Existing block and character device drivers for the VAX +reside in ``/sys/vax'', ``/sys/vaxmba'', and ``/sys/vaxuba''. Network +interface drivers reside in ``/sys/vaxif''. Any new device +drivers should be placed in the appropriate source code directory +and named so as not to conflict with existing devices. +Normally, definitions for things like device registers are placed in +a separate file in the same directory. For example, the ``dh'' +device driver is named ``dh.c'' and its associated include file is +named ``dhreg.h''.
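The inclusion rule for optional entries in the files files, described earlier in this section, can be sketched in C. Every option named on one line must be configured, and any one satisfied line is enough to include the module. This is an illustration only, not config's implementation. The struct file_entry layout and the functions is_configured, entry_satisfied and module_included are hypothetical. Each line is assumed to be already split into fields, and standard entries and modifiers such as device-driver are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical in-memory form of one "optional" line of a files file,
 * e.g. "vaxuba/foo.c optional foo device-driver" with the modifier dropped.
 */
struct file_entry {
    const char *path;         /* source module, e.g. "vaxuba/foo.c" */
    const char *const *opts;  /* NULL-terminated list of devices/options */
};

/* True if name appears in the NULL-terminated list of configured names. */
static bool
is_configured(const char *name, const char *const *configured)
{
    for (; *configured != NULL; configured++)
        if (strcmp(*configured, name) == 0)
            return true;
    return false;
}

/* One line is satisfied only if every option it lists is configured. */
static bool
entry_satisfied(const struct file_entry *e, const char *const *configured)
{
    for (const char *const *o = e->opts; *o != NULL; o++)
        if (!is_configured(*o, configured))
            return false;
    return true;
}

/* A module is included if any of the lines naming it is satisfied. */
static bool
module_included(const char *path, const struct file_entry *tab, size_t n,
    const char *const *configured)
{
    assert(path != NULL && configured != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].path, path) == 0 &&
            entry_satisfied(&tab[i], configured))
            return true;
    return false;
}
```

For example, a hypothetical module listed once as ``optional bar INET'' and once as ``optional bar NS'' is built for a configuration containing bar and NS. It is not built for one that lists INET without bar.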
+.PP +Once the source for the device driver has been placed in a directory, +the file ``/sys/conf/files.machine'', and possibly +``/sys/conf/devices.machine'' should be modified. The +.I files +files in the conf directory contain a line for each C source or binary-only +file in the system. Those files which are machine independent are +located in ``/sys/conf/files,'' while machine specific files +are in ``/sys/conf/files.machine.'' The ``devices.machine'' file +is used to map device names to major block device numbers. If the device +driver being added provides support for a new disk +you will want to modify this file (the format is obvious). +.PP +In addition to including the driver in the +.I files +file, it must also be added to the device configuration tables. These +are located in ``/sys/vax/conf.c'', or similar for machines other than +the VAX. If you don't understand what to add to this file, you should +study an entry for an existing driver. +Remember that the position in the +device table specifies the major device number. +The block major number is needed in the ``devices.machine'' file +if the device is a disk. diff --git a/share/doc/smm/02.config/Makefile b/share/doc/smm/02.config/Makefile new file mode 100644 index 000000000000..7c8eb7f50779 --- /dev/null +++ b/share/doc/smm/02.config/Makefile @@ -0,0 +1,6 @@ +VOLUME= smm/02.config +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t a.t b.t c.t d.t e.t +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/02.config/a.t b/share/doc/smm/02.config/a.t new file mode 100644 index 000000000000..b4dec21616ea --- /dev/null +++ b/share/doc/smm/02.config/a.t @@ -0,0 +1,156 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Configuration File Grammar +.bp +.LG +.B +.ce +APPENDIX A. CONFIGURATION FILE GRAMMAR +.sp +.R +.NL +.PP +The following grammar is a compressed form of the actual +\fIyacc\fP\|(1) grammar used by +.I config +to parse configuration files. +Terminal symbols are shown all in upper case, literals +are emboldened; optional clauses are enclosed in brackets, ``['' +and ``]''; zero or more instantiations are denoted with ``*''. 
+.sp +.nf +.DT +Configuration ::= [ Spec \fB;\fP ]* + +Spec ::= Config_spec + | Device_spec + | \fBtrace\fP + | /* lambda */ + +/* configuration specifications */ + +Config_spec ::= \fBmachine\fP ID + | \fBcpu\fP ID + | \fBoptions\fP Opt_list + | \fBident\fP ID + | System_spec + | \fBtimezone\fP [ \fB\-\fP ] NUMBER [ \fBdst\fP [ NUMBER ] ] + | \fBtimezone\fP [ \fB\-\fP ] FPNUMBER [ \fBdst\fP [ NUMBER ] ] + | \fBmaxusers\fP NUMBER + +/* system configuration specifications */ + +System_spec ::= \fBconfig\fP ID System_parameter [ System_parameter ]* + +System_parameter ::= swap_spec | root_spec | dump_spec | arg_spec + +swap_spec ::= \fBswap\fP [ \fBon\fP ] swap_dev [ \fBand\fP swap_dev ]* + +swap_dev ::= dev_spec [ \fBsize\fP NUMBER ] + +root_spec ::= \fBroot\fP [ \fBon\fP ] dev_spec + +dump_spec ::= \fBdumps\fP [ \fBon\fP ] dev_spec + +arg_spec ::= \fBargs\fP [ \fBon\fP ] dev_spec + +dev_spec ::= dev_name | major_minor + +major_minor ::= \fBmajor\fP NUMBER \fBminor\fP NUMBER + +dev_name ::= ID [ NUMBER [ ID ] ] + +/* option specifications */ + +Opt_list ::= Option [ \fB,\fP Option ]* + +Option ::= ID [ \fB=\fP Opt_value ] + +Opt_value ::= ID | NUMBER + +Mkopt_list ::= Mkoption [ \fB,\fP Mkoption ]* + +Mkoption ::= ID \fB=\fP Opt_value + +/* device specifications */ + +Device_spec ::= \fBdevice\fP Dev_name Dev_info Int_spec + | \fBmaster\fP Dev_name Dev_info + | \fBdisk\fP Dev_name Dev_info + | \fBtape\fP Dev_name Dev_info + | \fBcontroller\fP Dev_name Dev_info [ Int_spec ] + | \fBpseudo-device\fP Dev [ NUMBER ] + +Dev_name ::= Dev NUMBER + +Dev ::= \fBuba\fP | \fBmba\fP | ID + +Dev_info ::= Con_info [ Info ]* + +Con_info ::= \fBat\fP Dev NUMBER + | \fBat\fP \fBnexus\fP NUMBER + +Info ::= \fBcsr\fP NUMBER + | \fBdrive\fP NUMBER + | \fBslave\fP NUMBER + | \fBflags\fP NUMBER + +Int_spec ::= \fBvector\fP ID [ ID ]* + | \fBpriority\fP NUMBER +.fi +.sp +.SH +Lexical Conventions +.LP +The terminal symbols are loosely defined as: +.IP ID +.br +One or more alphabetics, 
either upper or lower case, and underscore, +``_''. +.IP NUMBER +.br +Approximately the C language specification for an integer number. +That is, a leading ``0x'' indicates a hexadecimal value, +a leading ``0'' indicates an octal value, otherwise the number is +expected to be a decimal value. Hexadecimal numbers may use either +upper or lower case alphabetics. +.IP FPNUMBER +.br +A floating point number without exponent. That is, a number of the +form ``nnn.ddd'', where the fractional component is optional. +.LP +In special instances a question mark, ``?'', can be substituted for +a ``NUMBER'' token. This is used to effect wildcarding in device +interconnection specifications. +.LP +Comments in configuration files are indicated by a ``#'' character +at the beginning of the line; the remainder of the line is discarded. +.LP +A specification +is interpreted as a continuation of the previous line +if the first character of the line is a tab. diff --git a/share/doc/smm/02.config/b.t b/share/doc/smm/02.config/b.t new file mode 100644 index 000000000000..0f5871e94121 --- /dev/null +++ b/share/doc/smm/02.config/b.t @@ -0,0 +1,131 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission.
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Device Defaulting Rules +.bp +.LG +.B +.ce +APPENDIX B. RULES FOR DEFAULTING SYSTEM DEVICES +.sp +.R +.NL +.PP +When \fIconfig\fP processes a ``config'' rule which does +not fully specify the location of the root file system, +paging area(s), device for system dumps, and device for +argument list processing, it applies a set of rules to +define those values left unspecified. The following +rules are used in defaulting system devices. +.IP 1) 3 +If a root device is not specified, the swap +specification must indicate a ``generic'' system is to be built. +.IP 2) 3 +If the root device does not specify a unit number, it +defaults to unit 0. +.IP 3) 3 +If the root device does not include a partition specification, +it defaults to the ``a'' partition. +.IP 4) 3 +If no swap area is specified, it defaults to the ``b'' +partition of the root device. +.IP 5) 3 +If no device is specified for processing argument lists, the +first swap partition is selected. +.IP 6) 3 +If no device is chosen for system dumps, the first swap +partition is selected (see below to find out where dumps are +placed within the partition).
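Rules 2 through 6 can be expressed as a short C sketch. The sketch handles only non-generic systems, so rule 1 is not modeled. It also assumes that neither an args nor a dumps device was given explicitly. The names struct sysconf, complete_dev and apply_defaults are hypothetical. Device names are taken to follow the dev_name form ID [ NUMBER [ ID ] ] from Appendix A, with a single-letter partition. A swap device given without a partition is completed with ``b'', as in the table that follows.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the completed device names, e.g. "hp0a". */
struct sysconf {
    char root[16];
    char swap[16];
    char args[16];
    char dumps[16];
};

/*
 * Complete a name of the form ID [ NUMBER [ partition ] ]:
 * a missing unit becomes 0 (rule 2) and a missing partition
 * becomes defpart (rule 3 uses 'a' for the root device).
 */
static void
complete_dev(const char *spec, char defpart, char *out, size_t outlen)
{
    size_t i = 0;
    int unit = 0;
    char part = defpart;

    assert(spec != NULL && out != NULL);
    while (isalpha((unsigned char)spec[i]))      /* device name, e.g. "hp" */
        i++;
    size_t namelen = i;
    if (isdigit((unsigned char)spec[i])) {
        while (isdigit((unsigned char)spec[i]))  /* unit number */
            unit = unit * 10 + (spec[i++] - '0');
        if (isalpha((unsigned char)spec[i]))     /* partition letter */
            part = spec[i];
    }
    snprintf(out, outlen, "%.*s%d%c", (int)namelen, spec, unit, part);
}

/*
 * Apply rules 2-6 for a non-generic system.  swapspec is NULL when
 * the config line names no swap area; args and dumps are assumed
 * not to be given explicitly.
 */
static void
apply_defaults(const char *rootspec, const char *swapspec, struct sysconf *sc)
{
    complete_dev(rootspec, 'a', sc->root, sizeof sc->root);  /* rules 2, 3 */
    if (swapspec != NULL) {
        complete_dev(swapspec, 'b', sc->swap, sizeof sc->swap);
    } else {
        /* rule 4: the "b" partition of the root device */
        strcpy(sc->swap, sc->root);
        sc->swap[strlen(sc->swap) - 1] = 'b';
    }
    /* rules 5 and 6: args and dumps use the first swap partition */
    strcpy(sc->args, sc->swap);
    strcpy(sc->dumps, sc->swap);
}
```

Under these assumptions, ``config kernel root on hp'' yields root hp0a and places swap, args and dumps on hp0b.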
+.PP +The following table summarizes the default partitions selected +when a device specification is incomplete, e.g. ``hp0''. +.DS +.TS +l l. +Type Partition +_ +root ``a'' +swap ``b'' +args ``b'' +dumps ``b'' +.TE +.DE +.SH +Multiple swap/paging areas +.PP +When multiple swap partitions are specified, the system treats the +first specified as a ``primary'' swap area which is always used. +The remaining partitions are then interleaved into the paging +system at the time a +.IR swapon (2) +system call is made. This is normally done at boot time with +a call to +.IR swapon (8) +from the /etc/rc file. +.SH +System dumps +.PP +System dumps are automatically taken after a system crash, +provided the device driver for the ``dumps'' device supports +this. The dump contains the contents of memory, but not +the swap areas. Normally the dump device is a disk, in +which case the information is copied to a location at the +back of the partition. The dump is placed in the back of the +partition because the primary swap and dump device are commonly +the same device and this allows the system to be rebooted without +immediately overwriting the saved information. When a dump has +occurred, the system variable \fIdumpsize\fP +is set to a non-zero value indicating the size (in bytes) of +the dump. The \fIsavecore\fP\|(8) +program then copies the information from the dump partition to +a file in a ``crash'' directory and also makes a copy of the +system which was running at the time of the crash (usually +``/kernel''). The offset to the system dump is defined in the +system variable \fIdumplo\fP (a sector offset from +the front of the dump partition). The +.I savecore +program operates by reading the contents of \fIdumplo\fP, \fIdumpdev\fP, +and \fIdumpmagic\fP from /dev/kmem, then comparing the value +of \fIdumpmagic\fP read from /dev/kmem to that located in +the corresponding location in the dump area of the dump partition.
+If a match is found, +.I savecore +assumes a crash occurred and reads \fIdumpsize\fP from the dump area +of the dump partition. This value is then used in copying the +system dump. Refer to +\fIsavecore\fP\|(8) +for more information about its operation. +.PP +The value \fIdumplo\fP is calculated to be +.DS +\fIdumpdev-size\fP \- \fImemsize\fP +.DE +where \fIdumpdev-size\fP is the size of the disk partition +where system dumps are to be placed, and +\fImemsize\fP is the size of physical memory. +If the disk partition is not large enough to hold a full +dump, \fIdumplo\fP is set to 0 (the start of the partition). diff --git a/share/doc/smm/02.config/c.t b/share/doc/smm/02.config/c.t new file mode 100644 index 000000000000..f18fce99978a --- /dev/null +++ b/share/doc/smm/02.config/c.t @@ -0,0 +1,103 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Sample Config Files +.bp +.LG +.B +.ce +APPENDIX C. SAMPLE CONFIGURATION FILES +.sp +.R +.NL +.PP +The following configuration files are developed in section 5; +they are included here for completeness. +.sp 2 +.nf +.ta 1.5i 2.5i 4.0i +# +# ANSEL VAX (a picture perfect machine) +# +machine vax +cpu VAX780 +timezone 8 dst +ident ANSEL +maxusers 40 + +config kernel root on hp0 +config hpkernel root on hp0 swap on hp0 and hp2 +config genkernel swap generic + +controller mba0 at nexus ? +disk hp0 at mba? drive ? +disk hp1 at mba? drive ? +controller mba1 at nexus ? +disk hp2 at mba? drive ? +disk hp3 at mba? drive ? +controller uba0 at nexus ? +controller tm0 at uba? csr 0172520 vector tmintr +tape te0 at tm0 drive 0 +tape te1 at tm0 drive 1 +device dh0 at uba? csr 0160020 vector dhrint dhxint +device dm0 at uba? csr 0170500 vector dmintr +device dh1 at uba? csr 0160040 vector dhrint dhxint +device dh2 at uba? csr 0160060 vector dhrint dhxint +.bp +# +# UCBVAX - Gateway to the world +# +machine vax +cpu "VAX780" +cpu "VAX750" +ident UCBVAX +timezone 8 dst +maxusers 32 +options INET +options NS + +config kernel root on hp swap on hp and rk0 and rk1 +config upkernel root on up +config hkkernel root on hk swap on rk0 and rk1 + +controller mba0 at nexus ? +controller uba0 at nexus ? +disk hp0 at mba? drive 0 +disk hp1 at mba? drive 1 +controller sc0 at uba?
csr 0176700 vector upintr +disk up0 at sc0 drive 0 +disk up1 at sc0 drive 1 +controller hk0 at uba? csr 0177440 vector rkintr +disk rk0 at hk0 drive 0 +disk rk1 at hk0 drive 1 +pseudo-device pty +pseudo-device loop +pseudo-device imp +device acc0 at uba? csr 0167600 vector accrint accxint +pseudo-device ether +device ec0 at uba? csr 0164330 vector ecrint eccollide ecxint +device il0 at uba? csr 0164000 vector ilrint ilcint diff --git a/share/doc/smm/02.config/d.t b/share/doc/smm/02.config/d.t new file mode 100644 index 000000000000..cd8b9dede8c8 --- /dev/null +++ b/share/doc/smm/02.config/d.t @@ -0,0 +1,266 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Data Structure Sizing Rules +.bp +.LG +.B +.ce +APPENDIX D. VAX KERNEL DATA STRUCTURE SIZING RULES +.sp +.R +.NL +.PP +Certain system data structures are sized at compile time +according to the maximum number of simultaneous users expected, +while others are calculated at boot time based on the +physical resources present, e.g. memory. This appendix lists +both sets of rules and also includes some hints on changing +built-in limitations on certain data structures. +.SH +Compile time rules +.PP +The file \fI/sys/conf\|/param.c\fP contains the definitions of +almost all data structures sized at compile time. This file +is copied into the directory of each configured system to allow +configuration-dependent rules and values to be maintained. +(Each copy normally depends on the copy in /sys/conf, +and global modifications cause the file to be recopied unless +the makefile is modified.) +The rules implied by its contents are summarized below (here +MAXUSERS refers to the value defined in the configuration file +in the ``maxusers'' rule). +Most limits are computed at compile time and stored in global variables +for use by other modules; they may generally be patched in the system +binary image before rebooting to test new values. +.IP \fBnproc\fP +.br +The maximum number of processes which may be running at any time. 
+It is referred to in other calculations as NPROC and is defined to be +.DS +20 + 8 * MAXUSERS +.DE +.IP \fBntext\fP +.br +The maximum number of active shared text segments. +The constant is intended to allow for network servers and common commands +that remain in the table. +It is defined as +.DS +36 + MAXUSERS +.DE +.IP \fBninode\fP +.br +The maximum number of files in the file system which may be +active at any time. This includes files in use by users, as +well as directory files being read or written by the system +and files associated with bound sockets in the UNIX IPC domain. +It is defined as +.DS +(NPROC + 16 + MAXUSERS) + 32 +.DE +.IP \fBnfile\fP +.br +The number of ``file table'' structures. One file +table structure is used for each open, unshared file descriptor. +Multiple file descriptors may reference a single file table +entry when they are created through a \fIdup\fP call, or as the +result of a \fIfork\fP. This is defined to be +.DS +16 * (NPROC + 16 + MAXUSERS) / 10 + 32 +.DE +.IP \fBncallout\fP +.br +The number of ``callout'' structures. One callout +structure is used per internal system event handled with +a timeout. Timeouts are used for terminal delays, +watchdog routines in device drivers, protocol timeout processing, etc. +This is defined as +.DS +16 + NPROC +.DE +.IP \fBnclist\fP +.br +The number of ``c-list'' structures. C-list structures are +used in terminal I/O, and currently each holds 60 characters. +Their number is defined as +.DS +60 + 12 * MAXUSERS +.DE +.IP \fBnmbclusters\fP +.br +The maximum number of pages which may be allocated by the network. +This is defined as 256 (a quarter megabyte of memory) in /sys/h/mbuf.h. +In practice, the network rarely uses this much memory. It starts off +by allocating 8 kilobytes of memory, then requesting more as +required. This value represents an upper bound. +.IP \fBnquota\fP +.br +The number of ``quota'' structures allocated.
Quota structures +are present only when disc quotas are configured in the system. One +quota structure is kept per user. This is defined to be +.DS +(MAXUSERS * 9) / 7 + 3 +.DE +.IP \fBndquot\fP +.br +The number of ``dquot'' structures allocated. Dquot structures +are present only when disc quotas are configured in the system. +One dquot structure is required per user, per active file system quota. +That is, when a user manipulates a file on a file system on which +quotas are enabled, the information regarding the user's quotas on +that file system must be in-core. This information is cached, so +that not all information must be present in-core all the time. +This is defined as +.DS +NINODE + (MAXUSERS * NMOUNT) / 4 +.DE +where NMOUNT is the maximum number of mountable file systems. +.LP +In addition to the above values, the system page tables (used to +map virtual memory in the kernel's address space) are sized at +compile time by the SYSPTSIZE definition in the file /sys/vax/vmparam.h. +This is defined to be +.DS +20 + MAXUSERS +.DE +pages of page tables. +Its definition affects +the size of many data structures allocated at boot time because +it constrains the amount of virtual memory which may be addressed +by the running system. This is often the limiting factor +in the size of the buffer cache, in which case a message is printed +when the system configures at boot time. +.SH +Run-time calculations +.PP +The most important data structures sized at run-time are those used in +the buffer cache. Allocation is done by allocating physical memory +(and system virtual memory) immediately after the system +has been started up; look in the file /sys/vax/machdep.c. +The amount of physical memory which may be allocated to the buffer +cache is constrained by the size of the system page tables, among +other things. 
While the system may calculate +a large amount of memory to be allocated to the buffer cache, +if the system page +table is too small to map this physical +memory into the virtual address space +of the system, only as much as can be mapped will be used. +.PP +The buffer cache consists of a number of ``buffer headers'' +and a pool of pages attached to these headers. Buffer headers +are divided into two categories: those used for swapping and +paging, and those used for normal file I/O. The system tries +to allocate 10% of the first two megabytes and 5% of the remaining +available physical memory for the buffer +cache (where \fIavailable\fP does not count the space occupied by +the system's text and data segments). If this results in fewer +than 16 pages of memory allocated, then 16 pages are allocated. +This value is kept in the initialized variable \fIbufpages\fP +so that it may be patched in the binary image (to allow tuning +without recompiling the system), +or the default may be overridden with a configuration-file option. +For example, the option \fBoptions BUFPAGES="3200"\fP +causes 3200 pages (3.2M bytes) to be used by the buffer cache. +A sufficient number of file I/O buffer headers is then allocated +to allow each to hold 2 pages. +Each buffer maps 8K bytes. +If the number of buffer pages is larger than can be mapped +by the buffer headers, the number of pages is reduced. +The number of buffer headers allocated +is stored in the global variable \fInbuf\fP, +which may be patched before the system is booted. +The system option \fBoptions NBUF="1000"\fP forces the allocation +of 1000 buffer headers. +Half as many swap I/O buffer headers as file I/O buffers +are allocated, +but no more than 256. +.SH +System size limitations +.PP +As distributed, the sum of the virtual sizes of the core-resident +processes is limited to 256M bytes. The size of the text +segment of a single process is currently limited to 6M bytes.
+It may be increased to no greater than the data segment size limit +(see below) by redefining MAXTSIZ. +This may be done with a configuration file option, +e.g. \fBoptions MAXTSIZ="(10*1024*1024)"\fP +to set the limit to 10M bytes. +Other per-process limits discussed here may be changed with similar options +with names given in parentheses. +Soft, user-changeable limits are set to 512K bytes for stack (DFLSSIZ) +and 6M bytes for the data segment (DFLDSIZ) by default; +these may be increased up to the hard limit +with the \fIsetrlimit\fP\|(2) system call. +The data and stack segment size hard limits are set by a system configuration +option to one of 17M, 33M or 64M bytes. +One of these sizes is chosen based on the definition of MAXDSIZ; +with no option, the limit is 17M bytes; with an option +\fBoptions MAXDSIZ="(32*1024*1024)"\fP (or any value between 17M and 33M), +the limit is increased to 33M bytes, and values larger than 33M +result in a limit of 64M bytes. +When doing this, make sure that you have adequate paging space. +As normally configured, the system has 16M or 32M bytes per paging area, +depending on disk size. +The best way to get more space is to provide multiple, thereby +interleaved, paging areas. +Increasing the virtual memory limits results in interleaving of +swap space in larger sections (from 500K bytes to 1M or 2M bytes). +.PP +By default, the virtual memory system allocates enough memory +for system page tables mapping user page tables +to allow 256 megabytes of simultaneously active virtual memory. +That is, the sum of the virtual memory sizes of all (completely- or partially-) +resident processes cannot exceed this limit. +If the limit is exceeded, some process(es) must be swapped out. +To increase the amount of resident virtual space possible, +you can alter the constant USRPTSIZE (in +/sys/vax/vmparam.h). +Each page of system page tables allows 8 megabytes of user virtual memory.
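The compile-time sizing rules and the run-time buffer-cache rule in this appendix are plain integer arithmetic, so they can be checked by hand or with a few lines of C. The sketch below is illustrative only. The function and structure names are hypothetical and are not the names used in param.c or machdep.c. NMOUNT is passed in as a parameter instead of being read from /sys/h/param.h. Buffer-cache pages are assumed to be 1024 bytes, which is consistent with the statement that 3200 pages occupy about 3.2M bytes.

```c
/*
 * Illustrative sketch of the sizing rules described in this appendix.
 * All names are hypothetical; this is not the code in param.c or machdep.c.
 */

struct sizing {
    int nproc, ntext, ninode, nfile, ncallout, nclist;
    int nquota, ndquot, sysptsize;
};

/* Compile-time limits as a function of MAXUSERS (and NMOUNT for ndquot). */
static struct sizing compute_sizing(int maxusers, int nmount)
{
    struct sizing s;

    s.nproc = 20 + 8 * maxusers;
    s.ntext = 36 + maxusers;
    s.ninode = (s.nproc + 16 + maxusers) + 32;
    s.nfile = 16 * (s.nproc + 16 + maxusers) / 10 + 32;
    s.ncallout = 16 + s.nproc;
    s.nclist = 60 + 12 * maxusers;
    s.nquota = (maxusers * 9) / 7 + 3;
    s.ndquot = s.ninode + (maxusers * nmount) / 4;
    s.sysptsize = 20 + maxusers;      /* pages of system page tables */
    return s;
}

/* Assumed buffer-cache page size (3200 pages ~= 3.2M bytes). */
#define BUF_PAGE_BYTES 1024L

/*
 * Default bufpages: 10% of the first two megabytes of available memory
 * plus 5% of the remainder, but never fewer than 16 pages.
 */
static long default_bufpages(long avail_bytes)
{
    const long two_mb = 2L * 1024 * 1024;
    long bytes, pages;

    if (avail_bytes <= two_mb)
        bytes = avail_bytes / 10;
    else
        bytes = two_mb / 10 + (avail_bytes - two_mb) / 20;
    pages = bytes / BUF_PAGE_BYTES;
    return pages < 16 ? 16 : pages;
}

/* File I/O buffer headers: enough for each to hold two pages. */
static long default_nbuf(long bufpages)
{
    return bufpages / 2;
}

/* Swap I/O buffer headers: half as many as file I/O buffers, at most 256. */
static long default_nswbuf(long nbuf)
{
    long n = nbuf / 2;

    return n > 256 ? 256 : n;
}

/*
 * Pages of system page tables needed to map vm_bytes of user virtual
 * memory, at 8 megabytes per page, rounded up.
 */
static long usrpt_pages(long long vm_bytes)
{
    const long long per_page = 8LL * 1024 * 1024;

    return (long)((vm_bytes + per_page - 1) / per_page);
}
```

For the ANSEL configuration in Appendix C (maxusers 40) with NMOUNT at 20, `compute_sizing(40, 20)` gives NPROC = 340 and nfile = 665. The sketch leaves out the later adjustments the text mentions, such as reducing bufpages when the buffer headers or system page tables cannot map them all.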
+.PP +Because the file system block numbers are stored in +page table \fIpg_blkno\fP +entries, the maximum size of a file system is limited to +2^24 512-byte blocks. Thus no file system can be larger than 8 gigabytes. +.PP +The number of mountable file systems is set at 20 by the definition +of NMOUNT in /sys/h/param.h. +This should be sufficient; if not, the value can be increased up to 255. +If you have many disks, it makes sense to make some of +them single file systems, and the paging areas don't count in this total. +.PP +The limit on the number of files that a process may have open simultaneously +is set to 64. +This limit is set by the NOFILE definition in /sys/h/param.h. +It may be increased arbitrarily, with the caveat that the user structure +expands by 5 bytes for each file, and thus UPAGES (/sys/vax/machparam.h) +must be increased accordingly. +.PP +The amount of physical memory is currently limited to 64M bytes +by the size of the index fields in the core-map (/sys/h/cmap.h). +The limit may be increased by following instructions in that file +to enlarge those fields. diff --git a/share/doc/smm/02.config/e.t b/share/doc/smm/02.config/e.t new file mode 100644 index 000000000000..f66b6e00a0cd --- /dev/null +++ b/share/doc/smm/02.config/e.t @@ -0,0 +1,108 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3.
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH "Network configuration options +.bp +.LG +.B +.ce +APPENDIX E. NETWORK CONFIGURATION OPTIONS +.sp +.R +.NL +.PP +The network support in the kernel is self-configuring +according to the protocol support options (INET and NS) and the network +hardware discovered during autoconfiguration. +There are several changes that may be made to customize network behavior +due to local restrictions. +Within the Internet protocol routines, the following options +set in the system configuration file are supported: +.IP \fBGATEWAY\fP +.br +The machine is to be used as a gateway. +This option currently makes only minor changes. +First, the size of the network routing hash table is increased. +Secondly, machines that have only a single hardware network interface +will not forward IP packets; without this option, they will also refrain +from sending any error indication to the source of unforwardable packets. 
+Gateways with only a single interface are assumed to have missing +or broken interfaces, and will return ICMP unreachable errors to hosts +sending them packets to be forwarded. +.IP \fBTCP_COMPAT_42\fP +.br +This option forces the system to limit its initial TCP sequence numbers +to positive numbers. +Without this option, 4.4BSD systems may have problems with TCP connections +to 4.2BSD systems that connect but never transfer data. +The problem is a bug in the 4.2BSD TCP. +.IP \fBIPFORWARDING\fP +.br +Normally, 4.4BSD machines with multiple network interfaces +will forward IP packets received that should be resent to another host. +If the line ``options IPFORWARDING="0"'' is in the system configuration +file, IP packet forwarding will be disabled. +.IP \fBIPSENDREDIRECTS\fP +.br +When forwarding IP packets, 4.4BSD IP will note when a packet is forwarded +using the same interface on which it arrived. +When this is noted, if the source machine is on the directly-attached +network, an ICMP redirect is sent to the source host. +If the packet was forwarded using a route to a host or to a subnet, +a host redirect is sent, otherwise a network redirect is sent. +The generation of redirects may be inhibited with the configuration +option ``options IPSENDREDIRECTS="0".'' +.br +.IP \fBSUBNETSARELOCAL\fP +TCP calculates a maximum segment size to use for each connection, +and sends no datagrams larger than that size. +This size will be no larger than that supported on the outgoing +interface. +Furthermore, if the destination is not on the local network, +the size will be no larger than 576 bytes. +For this test, other subnets of a directly-connected subnetted +network are considered to be local unless the line +``options SUBNETSARELOCAL="0"'' is used in the system configuration file. 
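The segment-size rule described under SUBNETSARELOCAL can be summarized as a small decision function. This is a hypothetical model of the behavior the text describes, not the 4.4BSD TCP source. The structure, field, and function names are invented for illustration. The sketch also assumes the caller has already determined the outgoing interface's limit and whether the destination lies on the attached subnet or on another subnet of the same subnetted network.

```c
#include <stdbool.h>

/* Hypothetical description of a route, limited to what the MSS rule needs. */
struct mss_route {
    int  if_max_seg;   /* largest segment the outgoing interface supports */
    bool on_attached;  /* destination is on a directly attached (sub)network */
    bool on_sibling;   /* destination is on another subnet of an attached,
                          subnetted network */
};

/* Cap applied when the destination is not considered local. */
#define NONLOCAL_MAX_SEG 576

/*
 * Choose a maximum segment size following the rule in the text.
 * subnets_are_local models the SUBNETSARELOCAL option, which is on
 * unless the configuration file sets it to "0".
 */
static int choose_mss(const struct mss_route *rt, bool subnets_are_local)
{
    int mss = rt->if_max_seg;
    bool local = rt->on_attached || (subnets_are_local && rt->on_sibling);

    if (!local && mss > NONLOCAL_MAX_SEG)
        mss = NONLOCAL_MAX_SEG;
    return mss;
}
```

Passing `false` for `subnets_are_local` corresponds to configuring ``options SUBNETSARELOCAL="0"''. Sibling subnets are then treated like remote networks and capped at 576 bytes.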
+.LP +The following options are supported by the Xerox NS protocols: +.IP \fBNSIP\fP +.br +This option allows NS IDP datagrams to be encapsulated in Internet IP +packets for transmission to a collaborating NSIP host. +This may be used to pass IDP packets through IP-only link layer networks. +See +.IR nsip (4P) +for details. +.IP \fBTHREEWAYSHAKE\fP +.br +The NS Sequenced Packet Protocol does not require a three-way handshake +before considering a connection to be in the established state. +(A three-way handshake consists of a connection request, an acknowledgement +of the request along with a symmetrical opening indication, +and then an acknowledgement of the reciprocal opening packet.) +This option forces a three-way handshake before data may be transmitted +on Sequenced Packet sockets. diff --git a/share/doc/smm/02.config/spell.ok b/share/doc/smm/02.config/spell.ok new file mode 100644 index 000000000000..cc9d6fdf11e2 --- /dev/null +++ b/share/doc/smm/02.config/spell.ok @@ -0,0 +1,304 @@ +ACC +ANSEL +ARP +Autoconfiguration +BUFPAGES +CANTWAIT +CH +COMPAT +CSS +Co +Config +Config''SMM:2 +DCLR +DFLDSIZ +DFLSSIZ +DFUNNY +DHAHA +DMA +Dev +Dquot +ECC +EMULEX +Emulex +Ethernet +FPNUMBER +FUNNY,HAHA +HAVEBDP +ICMP +IDP +IE +INET +IP +IPC +IPFORWARDING +IPL +IPSENDREDIRECTS +Info +Karels +LH +Leffler +MASSBUS +MAXDSIZ +MAXTSIZ +Makefile +Mb +MicroVAX +Mkopt +Mkoption +NBUF +NEED16 +NEEDBDP +NINODE +NMOUNT +NOFILE +NPROC +NS +NSC +NSIP +NUP +PST +RCS +RDY +RH +RK07 +RK611 +SCCS +SITENAME +SMM:2 +SUBNETSARELOCAL +SYSPTSIZE +TCP +THREEWAYSHAKE +Timezone +UCBVAX +UDP +UNIBUS +UPAGES +UPCS2 +USRPTSIZE +VAX +VAX630 +VAX730 +VAX750 +VAX780 +VAX8600 +VAXWELL +VAXen +Vax +Vaxwell +acc0 +accrint +accxint +addr +arg +args +assym.s +autoconfiguration +autoconfigure +autoconfigured +backpointer +badaddr +blkno +br +br5 +buf +bufpages +buses +caddr +callout +catchall +cmap.h +cmd +conf +conf.c +config +csr +ct.c +ctlr +cvec +datagrams +define''s +dev +devices.machine +dgo +dh.c +dh0 
+dh1 +dh2 +dhreg.h +dhrint +dhxint +dinfo +dk +dk.h +dm0 +dmintr +dname +dquot +dst +dumpdev +dumplo +dumpmagic +dumpsize +dz.c +dz0 +dzrint +dzxint +ec0 +eccollide +ecrint +ecxint +endif +es +files.machine +filesystem +foo +foo.c +genkernel +gettimeofday +gigabytes +gprof +hardwired +hd +hk +hk0 +hkkernel +howmany +hp0 +hp0b +hp1 +hp2 +hp3 +hpkernel +ht0 +hz +ident +ifdef +ifndef +il0 +ilcint +ilrint +info +intr +ioconf.c +kgmon +linterrs +loopback +machdep.c +machparam.h +makefile +makelinks +makeoptions +maxusers +mba +mba0 +mba1 +mbuf.h +mcount.c +memsize +minfo +mname +moniker +mspw +nbuf +ncallout +nclist +ndquot +ndrive +netimp +netinet +netns +netstat +nexi +nexus +nfile +ninode +nmbclusters +nnn.ddd +nproc +nquota +nsip +ntext +optionlist +param.c +param.h +pathnames +pg +physaddr +pty +rc +reg +rk.c +rk0 +rk1 +rkintr +savecore +sc +sc0 +sc1 +scdriver +setrlimit +sizeof +softc +source.c +subr +swapxxx.c +sysname +te0 +te1 +timezone +tm0 +tmintr +tu0 +uba +uba.c +uba0 +ubago +uballoc +ubamem +ubanum +ubareg.h +ubarelse +ubavar.h +ubglue.s +ubinfo +ud +ui +um +up.c +up0 +up1 +up2 +upaddr +upattach +upba +upcs1 +upcs2 +updevice +updgo +updinfo +updtab +upintr +upip +upmaptype +upminfo +upprobe +upslave +upstd +upkernel +upwatch +upwstart +value,name2 +value2 +vax +vaxif +vaxmba +vaxuba +vmparam.h +kernel +wildcard +wildcarded +wildcarding +xclu +xxx diff --git a/share/doc/smm/03.fsck/0.t b/share/doc/smm/03.fsck/0.t new file mode 100644 index 000000000000..0c223e4b950a --- /dev/null +++ b/share/doc/smm/03.fsck/0.t @@ -0,0 +1,144 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.if n .ND +.TL +Fsck_ffs \- The UNIX\(dg File System Check Program +.EH 'SMM:3-%''The \s-2UNIX\s+2 File System Check Program' +.OH 'The \s-2UNIX\s+2 File System Check Program''SMM:3-%' +.AU +Marshall Kirk McKusick +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AU +T. J. Kowalski +.AI +Bell Laboratories +Murray Hill, New Jersey 07974 +.AB +.FS +\(dgUNIX is a trademark of Bell Laboratories. +.FE +.FS +This work was done under grants from +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +Arpa Order No. 4031 monitored by Naval Electronic Systems Command under +Contract No.
N00039-82-C-0235. +.FE +This document reflects the use of +.I fsck_ffs +with the 4.2BSD and 4.3BSD file system organization. This +is a revision of the +original paper written by +T. J. Kowalski. +.PP +File System Check Program (\fIfsck_ffs\fR) +is an interactive file system check and repair program. +.I Fsck_ffs +uses the redundant structural information in the +UNIX file system to perform several consistency checks. +If an inconsistency is detected, it is reported +to the operator, who may elect to fix or ignore +each inconsistency. +These inconsistencies result from the permanent interruption +of the file system updates, which are performed every +time a file is modified. +Unless there has been a hardware failure, +.I fsck_ffs +is able to repair corrupted file systems +using procedures based upon the order in which UNIX honors +these file system update requests. +.PP +The purpose of this document is to describe the normal updating +of the file system, +to discuss the possible causes of file system corruption, +and to present the corrective actions implemented +by +.I fsck_ffs. +Both the program and the interaction between the +program and the operator are described. +.sp 2 +.LP +Revised October 7, 1996 +.AE +.LP +.bp +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. Overview of the file system +2.1. Superblock +2.2. Summary Information +2.3. Cylinder groups +2.4. Fragments +2.5. Updates to the file system +.LP +.sp .5v +.nf +.B "3. Fixing corrupted file systems +3.1. Detecting and correcting corruption +3.2. Super block checking +3.3. Free block checking +3.4. Checking the inode state +3.5. Inode links +3.6. Inode data size +3.7. Checking the data associated with an inode +3.8. File system connectivity +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References +.LP +.sp .5v +.nf +.B "4. Appendix A +4.1. Conventions +4.2. Initialization +4.3. Phase 1 - Check Blocks and Sizes +4.4. 
Phase 1b - Rescan for more Dups +4.5. Phase 2 - Check Pathnames +4.6. Phase 3 - Check Connectivity +4.7. Phase 4 - Check Reference Counts +4.8. Phase 5 - Check Cyl groups +4.9. Cleanup +.ds RH Introduction +.bp diff --git a/share/doc/smm/03.fsck/1.t b/share/doc/smm/03.fsck/1.t new file mode 100644 index 000000000000..f9048d55ab38 --- /dev/null +++ b/share/doc/smm/03.fsck/1.t @@ -0,0 +1,77 @@ +.\" Copyright (c) 1982, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH Introduction +.NH +Introduction +.PP +This document reflects the use of +.I fsck_ffs +with the 4.2BSD and 4.3BSD file system organization. This +is a revision of the +original paper written by +T. J. Kowalski. +.PP +When a UNIX +operating system is brought up, a consistency +check of the file systems should always be performed. +This precautionary measure helps to ensure +a reliable environment for file storage on disk. +If an inconsistency is discovered, +corrective action must be taken. +.I Fsck_ffs +runs in two modes. +Normally it is run non-interactively by the system after +a normal boot. +When running in this mode, +it makes only those changes to the file system that are known +to always be correct. +If an unexpected inconsistency is found, +.I fsck_ffs +will exit with a non-zero exit status, +leaving the system running single-user. +Typically the operator then runs +.I fsck_ffs +interactively. +When running in this mode, +each problem is listed, followed by a suggested corrective action. +The operator must decide whether or not the suggested correction +should be made. +.PP +The purpose of this memo is to dispel the +mystique surrounding +file system inconsistencies. +It first describes the updating of the file system +(the calm before the storm) and +then describes file system corruption (the storm).
+Finally, +the set of deterministic corrective actions +used by +.I fsck_ffs +(the Coast Guard +to the rescue) is presented. +.ds RH Overview of the File System diff --git a/share/doc/smm/03.fsck/2.t b/share/doc/smm/03.fsck/2.t new file mode 100644 index 000000000000..5ecc4b8df925 --- /dev/null +++ b/share/doc/smm/03.fsck/2.t @@ -0,0 +1,259 @@ +.\" Copyright (c) 1982, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.ds RH Overview of the file system +.NH +Overview of the file system +.PP +The file system is discussed in detail in [Mckusick84]; +this section gives a brief overview. +.NH 2 +Superblock +.PP +A file system is described by its +.I "super-block" . +The super-block is built when the file system is created (\c +.I newfs (8)) +and never changes. +The super-block +contains the basic parameters of the file system, +such as the number of data blocks it contains +and a count of the maximum number of files. +Because the super-block contains critical data, +.I newfs +replicates it to protect against catastrophic loss. +The +.I "default super block" +always resides at a fixed offset from the beginning +of the file system's disk partition. +The +.I "redundant super blocks" +are not referenced unless a head crash +or other hard disk error causes the default super-block +to be unusable. +The redundant blocks are sprinkled throughout the disk partition. +.PP +Within the file system are files. +Certain files are distinguished as directories and contain collections +of pointers to files that may themselves be directories. +Every file has a descriptor associated with it called an +.I "inode". +The inode contains information describing ownership of the file, +time stamps indicating modification and access times for the file, +and an array of indices pointing to the data blocks for the file. +In this section, +we assume that the first 12 blocks +of the file are directly referenced by values stored +in the inode structure itself\(dg. +.FS +\(dgThe actual number may vary from system to system, but is usually in +the range 5-13. +.FE +The inode structure may also contain references to indirect blocks +containing further data block indices. 
+In a file system with a 4096 byte block size, a singly indirect +block contains 1024 further block addresses, +a doubly indirect block contains 1024 addresses of further single indirect +blocks, +and a triply indirect block contains 1024 addresses of further doubly indirect +blocks (the triple indirect block is never needed in practice). +.PP +In order to create files with up to +2\(ua32 bytes, +using only two levels of indirection, +the minimum size of a file system block is 4096 bytes. +The size of file system blocks can be any power of two +greater than or equal to 4096. +The block size of the file system is maintained in the super-block, +so it is possible for file systems of different block sizes +to be accessible simultaneously on the same system. +The block size must be decided when +.I newfs +creates the file system; +the block size cannot be subsequently +changed without rebuilding the file system. +.NH 2 +Summary information +.PP +Associated with the super block is non-replicated +.I "summary information" . +The summary information changes +as the file system is modified. +The summary information contains +the number of blocks, fragments, inodes and directories in the file system. +.NH 2 +Cylinder groups +.PP +The file system partitions the disk into one or more areas called +.I "cylinder groups". +A cylinder group consists of one or more consecutive +cylinders on a disk. +Each cylinder group includes inode slots for files, a +.I "block map" +describing available blocks in the cylinder group, +and summary information describing the usage of data blocks +within the cylinder group. +A fixed number of inodes is allocated for each cylinder group +when the file system is created. +The current policy is to allocate one inode for each 2048 +bytes of disk space; +this is expected to be far more inodes than will ever be needed. +.PP +All the cylinder group bookkeeping information could be +placed at the beginning of each cylinder group.
+However, if this approach were used, +all the redundant information would be on the top platter. +A single hardware failure that destroyed the top platter +could cause the loss of all copies of the redundant super-blocks. +Thus, the cylinder group bookkeeping information +begins at a floating offset from the beginning of the cylinder group. +The offset for +the +.I "i+1" st +cylinder group is about one track further +from the beginning of the cylinder group +than it was for the +.I "i" th +cylinder group. +In this way, +the redundant +information spirals down into the pack; +any single track, cylinder, +or platter can be lost without losing all copies of the super-blocks. +Except for the first cylinder group, +the space between the beginning of the cylinder group +and the beginning of the cylinder group information stores data. +.NH 2 +Fragments +.PP +To avoid waste in storing small files, +the file system space allocator divides a single +file system block into one or more +.I "fragments" . +The fragmentation of the file system is specified +when the file system is created; +each file system block can optionally be broken into +2, 4, or 8 addressable fragments. +The lower bound on the size of these fragments is constrained +by the disk sector size; +typically, 512 bytes is the lower bound on fragment size. +The block map associated with each cylinder group +records the space availability at the fragment level. +Aligned fragments are examined +to determine block availability. +.PP +On a file system with a block size of 4096 bytes +and a fragment size of 1024 bytes, +a file is represented by zero or more 4096 byte blocks of data, +and possibly a single fragmented block. +If a file system block must be fragmented to obtain +space for a small amount of data, +the remainder of the block is made available for allocation +to other files. +For example, +consider an 11000 byte file stored on +a 4096/1024 byte file system.
+This file uses two full-size blocks (8192 bytes) and a 3072 byte fragment holding the remaining 2808 bytes. +If no fragments with at least 3072 bytes +are available when the file is created, +a full-size block is split, yielding the necessary 3072 byte +fragment and an unused 1024 byte fragment. +This remaining fragment can be allocated to another file, as needed. +.NH 2 +Updates to the file system +.PP +Every working day, hundreds of files +are created, modified, and removed. +Every time a file is modified, +the operating system performs a +series of file system updates. +These updates, when written on disk, yield a consistent file system. +The file system stages +all modifications of critical information; +modification can +either be completed or cleanly backed out after a crash. +Because the order in which this information is written to the file system is known, +deterministic procedures can be developed to +repair a corrupted file system. +To understand this process, +one must first understand the order in which update +requests are honored. +.PP +When a user program does an operation to change the file system, +such as a +.I write , +the data to be written is copied into an internal +.I "in-core" +buffer in the kernel. +Normally, the disk update is handled asynchronously; +the user process is allowed to proceed even though +the data has not yet been written to the disk. +The data, +along with the inode information reflecting the change, +is eventually written out to disk. +The real disk write may not happen until long after the +.I write +system call has returned. +Thus, at any given time, the file system, +as it resides on the disk, +lags the state of the file system represented by the in-core information. +.PP +The disk information is updated to reflect the in-core information +when the buffer is required for another use, +when a +.I sync (2) +is done (at 30-second intervals) by +.I "/etc/update" "(8)," +or by manual operator intervention with the +.I sync (8) +command.
+If the system is halted without writing out the in-core information, +the file system on the disk will be in an inconsistent state. +.PP +If all updates are done asynchronously, several serious +inconsistencies can arise. +One inconsistency is that a block may be claimed by two inodes. +Such an inconsistency can occur when the system is halted before +the pointer to the block in the old inode has been cleared +in the copy of the old inode on the disk, +and after the pointer to the block in the new inode has been written out +to the copy of the new inode on the disk. +Here, +there is no deterministic method for deciding +which inode should really claim the block. +A similar problem can arise with a multiply claimed inode. +.PP +The problem with asynchronous inode updates +can be avoided by doing all inode deallocations synchronously. +Consequently, +inodes and indirect blocks are written to the disk synchronously +(\fIi.e.\fP, the process blocks until the information is +really written to disk) +when they are being deallocated. +Similarly, inodes are kept consistent by synchronously +deleting, adding, or changing directory entries. +.ds RH Fixing corrupted file systems diff --git a/share/doc/smm/03.fsck/3.t b/share/doc/smm/03.fsck/3.t new file mode 100644 index 000000000000..380c25161117 --- /dev/null +++ b/share/doc/smm/03.fsck/3.t @@ -0,0 +1,446 @@ +.\" Copyright (c) 1982, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution.
+.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH Fixing corrupted file systems +.NH +Fixing corrupted file systems +.PP +A file system +can become corrupted in several ways. +The most common causes are +improper shutdown procedures +and hardware failures. +.PP +File systems may become corrupted during an +.I "unclean halt" . +This happens when proper shutdown +procedures are not observed, +when a mounted file system is physically write-protected, +or when a mounted file system is taken off-line. +The most common operator procedural failure is forgetting to +.I sync +the system before halting the CPU. +.PP +File systems may become further corrupted if proper startup +procedures are not observed, e.g., +not checking a file system for inconsistencies, +or not repairing inconsistencies. +Allowing a corrupted file system to be used (and, thus, to be modified +further) can be disastrous. +.PP +Any piece of hardware can fail at any time.
+Failures +can be as subtle as a bad block +on a disk pack, or as blatant as a non-functional disk controller. +.NH 2 +Detecting and correcting corruption +.PP +Normally +.I fsck_ffs +is run non-interactively. +In this mode, it fixes only +corruptions that are expected to occur from an unclean halt. +These actions are a proper subset of the actions that +.I fsck_ffs +will take when it is running interactively. +Throughout this paper, we assume that +.I fsck_ffs +is being run interactively, +so that all possible errors can be encountered. +When an inconsistency is discovered in this mode, +.I fsck_ffs +reports the inconsistency for the operator to +choose a corrective action. +.PP +A quiescent\(dd +.FS +\(dd I.e., unmounted and not being written on. +.FE +file system may be checked for structural integrity +by performing consistency checks on the +redundant data intrinsic to a file system. +The redundant data is either read from +the file system, +or computed from other known values. +The file system +.B must +be in a quiescent state when +.I fsck_ffs +is run, +since +.I fsck_ffs +is a multi-pass program, and changes made between passes would invalidate the results of earlier passes. +.PP +In the following sections, +we discuss methods to discover inconsistencies +and possible corrective actions +for the cylinder group blocks, the inodes, the indirect blocks, and +the data blocks containing directory entries. +.NH 2 +Super-block checking +.PP +The most commonly corrupted item in a file system +is the summary information +associated with the super-block. +The summary information is prone to corruption +because it is modified with every change to the file +system's blocks or inodes, +and is usually corrupted +after an unclean halt. +.PP +The super-block is checked for inconsistencies +involving file-system size, number of inodes, +free-block count, and free-inode count. +The file-system size must be larger than the +number of blocks used by the super-block +and the number of blocks used by the list of inodes.
+The file-system size and layout information +are the most critical pieces of information for +.I fsck_ffs . +While there is no way to actually check these sizes, +since they are statically determined by +.I newfs , +.I fsck_ffs +can check that these sizes are within reasonable bounds. +All other file system checks require that these sizes be correct. +If +.I fsck_ffs +detects corruption in the static parameters of the default super-block, +.I fsck_ffs +requests the operator to specify the location of an alternate super-block. +.NH 2 +Free block checking +.PP +.I Fsck_ffs +checks that all the blocks +marked as free in the cylinder group block maps +are not claimed by any files. +When all the blocks have been initially accounted for, +.I fsck_ffs +checks that +the number of free blocks +plus the number of blocks claimed by the inodes +equals the total number of blocks in the file system. +.PP +If anything is wrong with the block allocation maps, +.I fsck_ffs +will rebuild them, +based on the list it has computed of allocated blocks. +.PP +The summary information associated with the super-block +counts the total number of free blocks within the file system. +.I Fsck_ffs +compares this count to the +number of free blocks it found within the file system. +If the two counts do not agree, then +.I fsck_ffs +replaces the incorrect count in the summary information +by the actual free-block count. +.PP +The summary information +counts the total number of free inodes within the file system. +.I Fsck_ffs +compares this count to the number +of free inodes it found within the file system. +If the two counts do not agree, then +.I fsck_ffs +replaces the incorrect count in the +summary information by the actual free-inode count. +.NH 2 +Checking the inode state +.PP +An individual inode is not as likely to be corrupted as +the allocation information. +However, because of the great number of active inodes, +a few of the inodes are usually corrupted. 
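Returning to the free-block checks described above, the comparisons reduce to a few counts over the block maps. The following is a minimal sketch in C under stated assumptions, not the fsck_ffs source: it works on whole blocks only (fragments and metadata blocks are ignored), models the cylinder group map and the computed allocation list as plain arrays of flags, and the names check_free_blocks and FIX_* are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Result flags for this sketch (hypothetical, not fsck_ffs identifiers). */
enum { FIX_NONE = 0, FIX_MAP = 1, FIX_SUMMARY = 2 };

/*
 * Whole-block sketch of the free-block checks (fragments ignored).
 *   free_map[b]   true if the cylinder group block map marks block b free
 *   claimed[b]    true if the inode scan found an inode claiming block b
 *   summary_free  free-block count kept in the summary information
 * Returns the FIX_* flags describing what was repaired.
 */
int check_free_blocks(bool *free_map, const bool *claimed, size_t nblocks,
                      long *summary_free)
{
    long map_free = 0, nclaimed = 0;
    bool conflict = false;
    int fixes = FIX_NONE;

    for (size_t b = 0; b < nblocks; b++) {
        if (free_map[b])
            map_free++;
        if (claimed[b])
            nclaimed++;
        /* A block marked free must not be claimed by any file. */
        if (free_map[b] && claimed[b])
            conflict = true;
    }

    /* Free blocks plus claimed blocks must equal the total. */
    if (conflict || map_free + nclaimed != (long)nblocks) {
        /* Rebuild the map from the computed list of allocated blocks. */
        for (size_t b = 0; b < nblocks; b++)
            free_map[b] = !claimed[b];
        map_free = (long)nblocks - nclaimed;
        fixes |= FIX_MAP;
    }

    /* Replace an incorrect summary count with the actual count. */
    if (*summary_free != map_free) {
        *summary_free = map_free;
        fixes |= FIX_SUMMARY;
    }
    return fixes;
}
```

The free-inode count in the summary information can be checked the same way, with an inode allocation map in place of the block map.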
+.PP +The list of inodes in the file system +is checked sequentially starting with inode 2 +(inode 0 marks unused inodes; +inode 1 is saved for future generations) +and progressing through the last inode in the file system. +The state of each inode is checked for +inconsistencies involving format and type, +link count, +duplicate blocks, +bad blocks, +and inode size. +.PP +Each inode contains a mode word. +This mode word describes the type and state of the inode. +Inodes must be one of six types: +regular inode, directory inode, symbolic link inode, +special block inode, special character inode, or socket inode. +Inodes may be found in one of three allocation states: +unallocated, allocated, or neither unallocated nor allocated. +This last state suggests an incorrectly formatted inode. +An inode can get into this state if +bad data is written into the inode list. +The only possible corrective action is for +.I fsck_ffs +to clear the inode. +.NH 2 +Inode links +.PP +Each inode counts the +total number of directory entries +linked to the inode. +.I Fsck_ffs +verifies the link count of each inode +by starting at the root of the file system +and descending through the directory structure. +The actual link count for each inode +is calculated during the descent. +.PP +If the stored link count is non-zero and the actual +link count is zero, +then no directory entry appears for the inode. +If this happens, +.I fsck_ffs +will place the disconnected file in the +.I lost+found +directory. +If the stored and actual link counts are non-zero and unequal, +a directory entry may have been added or removed without the inode being +updated. +If this happens, +.I fsck_ffs +replaces the incorrect stored link count by the actual link count. +.PP +Each inode contains a list, +or pointers to +lists (indirect blocks), +of all the blocks claimed by the inode. +Since indirect blocks are owned by an inode, +inconsistencies in indirect blocks directly +affect the inode that owns them.
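The link-count verification described under Inode links can be sketched as a tally followed by a comparison. This is an illustration in C, not the fsck_ffs implementation: it assumes the directory entries have already been collected into a flat array of inode numbers (fsck_ffs finds them by descending from the root), and the names tally_links, check_link_count, and LINK_* are hypothetical.

```c
#include <stddef.h>

/* Outcomes for this sketch (hypothetical names). */
enum link_action { LINK_OK, LINK_RECONNECT, LINK_ADJUST, LINK_UNHANDLED };

/*
 * Tally the directory entries that reference each inode.  fsck_ffs finds
 * the entries by descending from the root; this sketch assumes they are
 * already collected in a flat array of inode numbers.
 */
void tally_links(const unsigned *entry_inums, size_t nentries,
                 int *actual, size_t ninodes)
{
    for (size_t i = 0; i < ninodes; i++)
        actual[i] = 0;
    for (size_t e = 0; e < nentries; e++)
        if (entry_inums[e] < ninodes)   /* out-of-range entries are a separate check */
            actual[entry_inums[e]]++;
}

/* Compare an inode's stored link count with the tallied count. */
enum link_action check_link_count(int *stored, int actual)
{
    if (*stored == actual)
        return LINK_OK;
    if (*stored != 0 && actual == 0)
        return LINK_RECONNECT;          /* no entry: caller moves file to lost+found */
    if (*stored != 0 && actual != 0) {
        *stored = actual;               /* replace the incorrect stored count */
        return LINK_ADJUST;
    }
    return LINK_UNHANDLED;              /* stored count of zero: not covered here */
}
```

Only the two cases named in the text are repaired; any other combination is reported as unhandled rather than guessed at.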
+.PP +.I Fsck_ffs +compares each block number claimed by an inode +against a list of already allocated blocks. +If another inode already claims a block number, +then the block number is added to a list of +.I "duplicate blocks" . +Otherwise, the list of allocated blocks +is updated to include the block number. +.PP +If there are any duplicate blocks, +.I fsck_ffs +will perform a partial second +pass over the inode list +to find every inode that claims a duplicated block. +The second pass is needed because the first inode to claim a duplicated block is not known until the duplicate is found; +even then, without examining the files associated with +these inodes for correct content, +not enough information is available +to determine which inode is corrupted and should be cleared. +If this condition does arise +(only hardware failure will cause it), +then the inode with the earliest +modify time is usually incorrect, +and should be cleared. +If this happens, +.I fsck_ffs +prompts the operator to clear both inodes. +The operator must decide which one should be kept +and which one should be cleared. +.PP +.I Fsck_ffs +checks the range of each block number claimed by an inode. +If the block number is +lower than the first data block in the file system, +or greater than the last data block, +then the block number is a +.I "bad block number" . +A large number of bad blocks in an inode is usually caused by +an indirect block that was not written to the file system, +a condition that can only occur if there has been a hardware failure. +If an inode contains bad block numbers, +.I fsck_ffs +prompts the operator to clear it. +.NH 2 +Inode data size +.PP +Each inode contains a count of the number of blocks +allocated to it. +The number of actual data blocks +is the sum of the allocated data blocks +and the indirect blocks. +.I Fsck_ffs +computes the actual number of data blocks +and compares that block count against +the count recorded in the inode. +If an inode contains an incorrect count, +.I fsck_ffs +prompts the operator to fix it.
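The per-block checks above (range check for bad block numbers, then lookup in the list of already allocated blocks) can be sketched as a single classification step applied to each block an inode claims. This is a hedged illustration in C, not fsck_ffs code: it assumes the allocated-block list is a plain array of flags indexed by block number (at least last_data + 1 entries) and that duplicates go into a caller-supplied fixed-size table; check_block and BLK_* are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Classification of one claimed block (hypothetical names). */
enum blk_status { BLK_NEW, BLK_DUP, BLK_BAD };

/*
 * Classify block number blk claimed by the inode being scanned.
 *   first_data, last_data  range of valid data block numbers
 *   allocated[]            one flag per block, true once some inode claims it;
 *                          must have at least last_data + 1 entries
 *   dups, ndups, dupcap    table collecting duplicate block numbers
 */
enum blk_status check_block(long blk, long first_data, long last_data,
                            bool *allocated, long *dups, size_t *ndups,
                            size_t dupcap)
{
    /* Outside the data area: a bad block number. */
    if (blk < first_data || blk > last_data)
        return BLK_BAD;

    if (allocated[blk]) {
        /*
         * Already claimed by an earlier inode: record a duplicate.
         * If the table is full, the duplicate is reported but not
         * recorded (compare DUP TABLE OVERFLOW in Appendix A).
         */
        if (*ndups < dupcap)
            dups[(*ndups)++] = blk;
        return BLK_DUP;
    }

    /* First claim: add the block to the list of allocated blocks. */
    allocated[blk] = true;
    return BLK_NEW;
}
```

Because only the block number is recorded, not the inode that first claimed it, a later pass over the inodes is needed to find every claimant of each entry in the duplicate table, which matches the partial second pass described in the text.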
+.PP +Each inode contains a thirty-two bit size field. +The size is the number of data bytes +in the file associated with the inode. +The consistency of the byte size field is roughly checked +by computing from the size field the maximum number of blocks +that should be associated with the inode, +and comparing that expected block count against +the actual number of blocks the inode claims. +.NH 2 +Checking the data associated with an inode +.PP +An inode can directly or indirectly +reference three kinds of data blocks. +All referenced blocks must be of the same kind. +The three types of data blocks are: +plain data blocks, symbolic link data blocks, and directory data blocks. +Plain data blocks +contain the information stored in a file; +symbolic link data blocks +contain the path name stored in a link. +Directory data blocks contain directory entries. +.I Fsck_ffs +can only check the validity of directory data blocks. +.PP +Each directory data block is checked for +several types of inconsistencies. +These inconsistencies include +directory inode numbers pointing to unallocated inodes, +directory inode numbers that are greater than +the number of inodes in the file system, +incorrect directory inode numbers for ``\fB.\fP'' and ``\fB..\fP'', +and directories that are not attached to the file system. +If the inode number in a directory data block +references an unallocated inode, +then +.I fsck_ffs +will remove that directory entry. +Again, +this condition can only arise when there has been a hardware failure. +.PP +.I Fsck_ffs +also checks for directories with unallocated blocks (holes). +Such directories should never be created. +When found, +.I fsck_ffs +will prompt the user to adjust the length of the offending directory, +which is done by shortening the size of the directory to the end of the +last allocated block preceding the hole. +Unfortunately, this means that another Phase 1 run has to be done.
+.I Fsck_ffs +will remind the user to rerun \fIfsck_ffs\fP after repairing a +directory containing an unallocated block. +.PP +If a directory entry inode number references an inode +outside the inode list, then +.I fsck_ffs +will remove that directory entry. +This condition occurs if bad data is written into a directory data block. +.PP +The directory inode number entry for ``\fB.\fP'' +must be the first entry in the directory data block. +The inode number for ``\fB.\fP'' +must reference itself; +i.e., it must equal the inode number +for the directory data block. +The directory inode number entry +for ``\fB..\fP'' must be +the second entry in the directory data block. +Its value must equal the inode number for the +parent of the directory entry +(or the inode number of the directory +data block if the directory is the +root directory). +If the directory inode numbers are +incorrect, +.I fsck_ffs +will replace them with the correct values. +If there are multiple hard links to a directory, +the first one encountered is considered the real parent +to which ``\fB..\fP'' should point; +\fIfsck_ffs\fP recommends deletion for the subsequently discovered names. +.NH 2 +File system connectivity +.PP +.I Fsck_ffs +checks the general connectivity of the file system. +If a directory is not linked into the file system, then +.I fsck_ffs +links the directory back into the file system in the +.I lost+found +directory. +This condition only occurs when there has been a hardware failure. +.ds RH "References" +.SH +\s+2Acknowledgements\s0 +.PP +I thank Bill Joy, Sam Leffler, Robert Elz and Dennis Ritchie +for their suggestions and help in implementing the new file system. +Thanks also to Robert Henry for his editorial input to +get this document together. +Finally, we thank our sponsors, +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +Arpa Order No. 4031 monitored by Naval Electronic System Command under +Contract No.
N00039-82-C-0235. (Kirk McKusick, July 1983) +.PP +I would like to thank Larry A. Wehr for advice that led +to the first version of +.I fsck_ffs +and Rick B. Brandt for adapting +.I fsck_ffs +to +UNIX/TS. (T. Kowalski, July 1979) +.sp 2 +.SH +\s+2References\s0 +.LP +.IP [Dolotta78] 20 +Dolotta, T. A., and Olsson, S. B. eds., +.I "UNIX User's Manual, Edition 1.1\^" , +January 1978. +.IP [Joy83] 20 +Joy, W., Cooper, E., Fabry, R., Leffler, S., McKusick, M., and Mosher, D. +4.2BSD System Manual, +.I "University of California at Berkeley" , +.I "Computer Systems Research Group Technical Report" +#4, 1982. +.IP [McKusick84] 20 +McKusick, M., Joy, W., Leffler, S., and Fabry, R. +A Fast File System for UNIX, +\fIACM Transactions on Computer Systems 2\fP, 3, +pp. 181-197, August 1984. +.IP [Ritchie78] 20 +Ritchie, D. M., and Thompson, K., +The UNIX Time-Sharing System, +.I "The Bell System Technical Journal" +.B 57 , +6 (July-August 1978, Part 2), pp. 1905-29. +.IP [Thompson78] 20 +Thompson, K., +UNIX Implementation, +.I "The Bell System Technical Journal\^" +.B 57 , +6 (July-August 1978, Part 2), pp. 1931-46. +.ds RH Appendix A \- Fsck_ffs Error Conditions +.bp diff --git a/share/doc/smm/03.fsck/4.t b/share/doc/smm/03.fsck/4.t new file mode 100644 index 000000000000..4fb370cdc8a2 --- /dev/null +++ b/share/doc/smm/03.fsck/4.t @@ -0,0 +1,1418 @@ +.\" Copyright (c) 1982, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3.
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH Appendix A \- Fsck_ffs Error Conditions +.NH +Appendix A \- Fsck_ffs Error Conditions +.NH 2 +Conventions +.PP +.I Fsck_ffs +is +a multi-pass file system check program. +Each file system pass invokes a different Phase of the +.I fsck_ffs +program. +After the initial setup, +.I fsck_ffs +performs successive Phases over each file system, +checking blocks and sizes, +path-names, +connectivity, +reference counts, +and the map of free blocks +(possibly rebuilding it), +and performs some cleanup. +.LP +Normally +.I fsck_ffs +is run non-interactively to +.I preen +the file systems after an unclean halt. +While preen'ing a file system, +it fixes only corruptions that are expected +to occur from an unclean halt. +These actions are a proper subset of the actions that +.I fsck_ffs +will take when it is running interactively. +Throughout this appendix, many errors have several responses +from which the operator can choose.
+When an inconsistency is detected, +.I fsck_ffs +reports the error condition to the operator. +If a response is required, +.I fsck_ffs +prints a prompt message and +waits for a response. +When preen'ing, most errors are fatal. +For those that are expected, +the response taken is noted. +This appendix explains the meaning of each error condition, +the possible responses, and the related error conditions. +.LP +The error conditions are organized by the +.I Phase +of the +.I fsck_ffs +program in which they can occur. +The error conditions that may occur +in more than one Phase +are discussed under Initialization. +.NH 2 +Initialization +.PP +Before a file system check can be performed, certain +tables have to be set up and certain files opened. +This section concerns itself with the opening of files and +the initialization of tables. +This section lists error conditions resulting from +command line options, +memory requests, +opening of files, +status of files, +file system size checks, +and creation of the scratch file. +All the initialization errors are fatal +when the file system is being preen'ed. +.sp +.LP +.B "\fIC\fP option?" +.br +\fIC\fP is not a legal option to +.I fsck_ffs ; +legal options are \-b, \-c, \-y, \-n, and \-p. +.I Fsck_ffs +terminates on this error condition. +See the +.I fsck_ffs (8) +manual entry for further detail. +.sp +.LP +.B "cannot alloc NNN bytes for blockmap" +.br +.B "cannot alloc NNN bytes for freemap" +.br +.B "cannot alloc NNN bytes for statemap" +.br +.B "cannot alloc NNN bytes for lncntp" +.br +.I Fsck_ffs 's +request for memory for its virtual +memory tables failed. +This should never happen. +.I Fsck_ffs +terminates on this error condition. +See a guru. +.sp +.LP +.B "Can't open checklist file: \fIF\fP" +.br +The file system checklist file +\fIF\fP (usually +.I /etc/fstab ) +cannot be opened for reading. +.I Fsck_ffs +terminates on this error condition. +Check access modes of \fIF\fP.
+.sp +.LP +.B "Can't stat root" +.br +.I Fsck_ffs 's +request for statistics about the root directory ``/'' failed. +This should never happen. +.I Fsck_ffs +terminates on this error condition. +See a guru. +.sp +.LP +.B "Can't stat \fIF\fP" +.br +.B "Can't make sense out of name \fIF\fP" +.br +.I Fsck_ffs 's +request for statistics about the file system \fIF\fP failed. +When running manually, +it ignores this file system +and continues checking the next file system given. +Check access modes of \fIF\fP. +.sp +.LP +.B "Can't open \fIF\fP" +.br +.I Fsck_ffs 's +attempt to open the file system \fIF\fP failed. +When running manually, it ignores this file system +and continues checking the next file system given. +Check access modes of \fIF\fP. +.sp +.LP +.B "\fIF\fP: (NO WRITE)" +.br +Either the \-n flag was specified or +.I fsck_ffs 's +attempt to open the file system \fIF\fP for writing failed. +When running manually, +all the diagnostics are printed out, +but no modifications are attempted to fix them. +.sp +.LP +.B "file is not a block or character device; OK" +.br +You have given +.I fsck_ffs +a regular file name by mistake. +Check the type of the file specified. +.LP +Possible responses to the OK prompt are: +.IP YES +ignore this error condition. +.IP NO +ignore this file system and continue checking +the next file system given. +.sp +.LP +.B "UNDEFINED OPTIMIZATION IN SUPERBLOCK (SET TO DEFAULT)" +.br +The superblock optimization parameter is neither OPT_TIME +nor OPT_SPACE. +.LP +Possible responses to the SET TO DEFAULT prompt are: +.IP YES +The superblock is set to request optimization to minimize +running time of the system. +(If optimization to minimize disk space utilization is +desired, it can be set using \fItunefs\fP(8).) +.IP NO +ignore this error condition. +.sp +.LP +.B "IMPOSSIBLE MINFREE=\fID\fP IN SUPERBLOCK (SET TO DEFAULT)" +.br +The superblock minimum space percentage is greater than 99% +or less than 0%.
+.LP +Possible responses to the SET TO DEFAULT prompt are: +.IP YES +The minfree parameter is set to 10%. +(If some other percentage is desired, +it can be set using \fItunefs\fP(8).) +.IP NO +ignore this error condition. +.sp +.LP +.B "IMPOSSIBLE INTERLEAVE=\fID\fP IN SUPERBLOCK (SET TO DEFAULT)" +.br +The file system interleave is less than or equal to zero. +.LP +Possible responses to the SET TO DEFAULT prompt are: +.IP YES +The interleave parameter is set to 1. +.IP NO +ignore this error condition. +.sp +.LP +.B "IMPOSSIBLE NPSECT=\fID\fP IN SUPERBLOCK (SET TO DEFAULT)" +.br +The number of physical sectors per track is less than the number +of usable sectors per track. +.LP +Possible responses to the SET TO DEFAULT prompt are: +.IP YES +The npsect parameter is set to the number of usable sectors per track. +.IP NO +ignore this error condition. +.sp +.LP +One of the following messages will appear: +.br +.B "MAGIC NUMBER WRONG" +.br +.B "NCG OUT OF RANGE" +.br +.B "CPG OUT OF RANGE" +.br +.B "NCYL DOES NOT JIVE WITH NCG*CPG" +.br +.B "SIZE PREPOSTEROUSLY LARGE" +.br +.B "TRASHED VALUES IN SUPER BLOCK" +.br +and will be followed by the message: +.br +.B "\fIF\fP: BAD SUPER BLOCK: \fIB\fP" +.br +.B "USE -b OPTION TO FSCK_FFS TO SPECIFY LOCATION OF AN ALTERNATE" +.br +.B "SUPER-BLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck_ffs(8)." +.br +The super-block has been corrupted. +An alternate super-block must be selected from among those +listed by +.I newfs (8) +when the file system was created. +For file systems with a block size less than 32K, +specifying \-b 32 is a good first choice. +.sp +.LP +.B "INTERNAL INCONSISTENCY: \fIM\fP" +.br +.I Fsck_ffs +has had an internal panic, whose message is specified as \fIM\fP. +This should never happen. +See a guru. +.sp +.LP +.B "CAN NOT SEEK: BLK \fIB\fP (CONTINUE)" +.br +.I Fsck_ffs 's +request for moving to a specified block number \fIB\fP in +the file system failed. +This should never happen. +See a guru.
+.LP +Possible responses to the CONTINUE prompt are: +.IP YES +attempt to continue to run the file system check. +Often, +however, the problem will persist. +This error condition will not allow a complete check of the file system. +A second run of +.I fsck_ffs +should be made to re-check this file system. +If the block was part of the virtual memory buffer +cache, +.I fsck_ffs +will terminate with the message ``Fatal I/O error''. +.IP NO +terminate the program. +.sp +.LP +.B "CAN NOT READ: BLK \fIB\fP (CONTINUE)" +.br +.I Fsck_ffs 's +request for reading a specified block number \fIB\fP in +the file system failed. +This should never happen. +See a guru. +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +attempt to continue to run the file system check. +It will retry the read and print out the message: +.br +.B "THE FOLLOWING SECTORS COULD NOT BE READ: \fIN\fP" +.br +where \fIN\fP indicates the sectors that could not be read. +If +.I fsck_ffs +ever tries to write back one of the blocks on which the read failed, +it will print the message: +.br +.B "WRITING ZERO'ED BLOCK \fIN\fP TO DISK" +.br +where \fIN\fP indicates the sector that was written with zeros. +If the disk is experiencing hardware problems, the problem will persist. +This error condition will not allow a complete check of the file system. +A second run of +.I fsck_ffs +should be made to re-check this file system. +If the block was part of the virtual memory buffer +cache, +.I fsck_ffs +will terminate with the message ``Fatal I/O error''. +.IP NO +terminate the program. +.sp +.LP +.B "CAN NOT WRITE: BLK \fIB\fP (CONTINUE)" +.br +.I Fsck_ffs 's +request for writing a specified block number \fIB\fP +in the file system failed. +The disk may be write-protected; +check the write-protect lock on the drive. +If that is not the problem, see a guru. +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +attempt to continue to run the file system check.
+The write operation will be retried with the failed blocks +indicated by the message: +.br +.B "THE FOLLOWING SECTORS COULD NOT BE WRITTEN: \fIN\fP" +.br +where \fIN\fP indicates the sectors that could not be written. +If the disk is experiencing hardware problems, the problem will persist. +This error condition will not allow a complete check of the file system. +A second run of +.I fsck_ffs +should be made to re-check this file system. +If the block was part of the virtual memory buffer +cache, +.I fsck_ffs +will terminate with the message ``Fatal I/O error''. +.IP NO +terminate the program. +.sp +.LP +.B "bad inode number DDD to ginode" +.br +An internal error has attempted to read non-existent inode \fIDDD\fP. +This error causes +.I fsck_ffs +to exit. +See a guru. +.NH 2 +Phase 1 \- Check Blocks and Sizes +.PP +This phase concerns itself with +the inode list. +This section lists error conditions resulting from +checking inode types, +setting up the zero-link-count table, +examining inode block numbers for bad or duplicate blocks, +checking inode size, +and checking inode format. +All errors in this phase except +.B "INCORRECT BLOCK COUNT" +and +.B "PARTIALLY TRUNCATED INODE" +are fatal if the file system is being preen'ed. +.sp +.LP +.B "UNKNOWN FILE TYPE I=\fII\fP (CLEAR)" +.br +The mode word of the inode \fII\fP indicates that the inode is not a +special block inode, special character inode, socket inode, regular inode, +symbolic link, or directory inode. +.LP +Possible responses to the CLEAR prompt are: +.IP YES +de-allocate inode \fII\fP by zeroing its contents. +This will always invoke the UNALLOCATED error condition in Phase 2 +for each directory entry pointing to this inode. +.IP NO +ignore this error condition. +.sp +.LP +.B "PARTIALLY TRUNCATED INODE I=\fII\fP (SALVAGE)" +.br +.I Fsck_ffs +has found inode \fII\fP whose size is shorter than the number of +blocks allocated to it. 
+This condition should only occur if the system crashes while in the +midst of truncating a file. +When preen'ing the file system, +.I fsck_ffs +completes the truncation to the specified size. +.LP +Possible responses to SALVAGE are: +.IP YES +complete the truncation to the size specified in the inode. +.IP NO +ignore this error condition. +.sp +.LP +.B "LINK COUNT TABLE OVERFLOW (CONTINUE)" +.br +An internal table for +.I fsck_ffs +containing allocated inodes with a link count of +zero cannot allocate more memory. +Increase the virtual memory for +.I fsck_ffs . +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +continue with the program. +This error condition will not allow a complete check of the file system. +A second run of +.I fsck_ffs +should be made to re-check this file system. +If another allocated inode with a zero link count is found, +this error condition is repeated. +.IP NO +terminate the program. +.sp +.LP +.B "\fIB\fP BAD I=\fII\fP" +.br +Inode \fII\fP contains block number \fIB\fP with a number +lower than the number of the first data block in the file system or +greater than the number of the last block +in the file system. +This error condition may invoke the +.B "EXCESSIVE BAD BLKS" +error condition in Phase 1 (see next paragraph) if +inode \fII\fP has too many block numbers outside the file system range. +This error condition will always invoke the +.B "BAD/DUP" +error condition in Phase 2 and Phase 4. +.sp +.LP +.B "EXCESSIVE BAD BLKS I=\fII\fP (CONTINUE)" +.br +There is more than a tolerable number (usually 10) of blocks with a number +lower than the number of the first data block in the file system or greater than +the number of last block in the file system associated with inode \fII\fP. +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +ignore the rest of the blocks in this inode +and continue checking with the next inode in the file system. +This error condition will not allow a complete check of the file system. 
+A second run of +.I fsck_ffs +should be made to re-check this file system. +.IP NO +terminate the program. +.sp +.LP +.B "BAD STATE DDD TO BLKERR" +.br +An internal error has scrambled +.I fsck_ffs 's +state map to have the impossible value \fIDDD\fP. +.I Fsck_ffs +exits immediately. +See a guru. +.sp +.LP +.B "\fIB\fP DUP I=\fII\fP" +.br +Inode \fII\fP contains block number \fIB\fP that is already claimed by +another inode. +This error condition may invoke the +.B "EXCESSIVE DUP BLKS" +error condition in Phase 1 if +inode \fII\fP has too many block numbers claimed by other inodes. +This error condition will always invoke Phase 1b and the +.B "BAD/DUP" +error condition in Phase 2 and Phase 4. +.sp +.LP +.B "EXCESSIVE DUP BLKS I=\fII\fP (CONTINUE)" +.br +There is more than a tolerable number (usually 10) of blocks claimed by other +inodes. +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +ignore the rest of the blocks in this inode +and continue checking with the next inode in the file system. +This error condition will not allow a complete check of the file system. +A second run of +.I fsck_ffs +should be made to re-check this file system. +.IP NO +terminate the program. +.sp +.LP +.B "DUP TABLE OVERFLOW (CONTINUE)" +.br +An internal table in +.I fsck_ffs +containing duplicate block numbers cannot allocate any more space. +Increase the amount of virtual memory available to +.I fsck_ffs . +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +continue with the program. +This error condition will not allow a complete check of the file system. +A second run of +.I fsck_ffs +should be made to re-check this file system. +If another duplicate block is found, this error condition will repeat. +.IP NO +terminate the program. +.sp +.LP +.B "PARTIALLY ALLOCATED INODE I=\fII\fP (CLEAR)" +.br +Inode \fII\fP is neither allocated nor unallocated. +.LP +Possible responses to the CLEAR prompt are: +.IP YES +de-allocate inode \fII\fP by zeroing its contents. 
+.IP NO +ignore this error condition. +.sp +.LP +.B "INCORRECT BLOCK COUNT I=\fII\fP (\fIX\fP should be \fIY\fP) (CORRECT)" +.br +The block count for inode \fII\fP is \fIX\fP blocks, +but should be \fIY\fP blocks. +When preen'ing, the count is corrected. +.LP +Possible responses to the CORRECT prompt are: +.IP YES +replace the block count of inode \fII\fP with \fIY\fP. +.IP NO +ignore this error condition. +.NH 2 +Phase 1B \- Rescan for More Dups +.PP +When a duplicate block is found in the file system, the file system is +rescanned to find the inode that previously claimed that block. +This section lists the error condition when the duplicate block is found. +.sp +.LP +.B "\fIB\fP DUP I=\fII\fP" +.br +Inode \fII\fP contains block number \fIB\fP that +is already claimed by another inode. +This error condition will always invoke the +.B "BAD/DUP" +error condition in Phase 2. +You can determine which inodes have overlapping blocks by examining +this error condition and the DUP error condition in Phase 1. +.NH 2 +Phase 2 \- Check Pathnames +.PP +This phase concerns itself with removing directory entries +pointing to +error conditioned inodes +from Phase 1 and Phase 1b. +This section lists error conditions resulting from +root inode mode and status, +directory inode pointers in range, +directory entries pointing to bad inodes, +and directory integrity checks. +All errors in this phase are fatal if the file system is being preen'ed, +except for directories not being a multiple of the block size +and extraneous hard links. +.sp +.LP +.B "ROOT INODE UNALLOCATED (ALLOCATE)" +.br +The root inode (usually inode number 2) has no allocate mode bits. +This should never happen. +.LP +Possible responses to the ALLOCATE prompt are: +.IP YES +allocate inode 2 as the root inode. +The files and directories usually found in the root will be recovered +in Phase 3 and put into +.I lost+found .
+If the attempt to allocate the root fails, +.I fsck_ffs +will exit with the message: +.br +.B "CANNOT ALLOCATE ROOT INODE" . +.IP NO +.I fsck_ffs +will exit. +.sp +.LP +.B "ROOT INODE NOT DIRECTORY (REALLOCATE)" +.br +The root inode (usually inode number 2) +is not directory inode type. +.LP +Possible responses to the REALLOCATE prompt are: +.IP YES +clear the existing contents of the root inode +and reallocate it. +The files and directories usually found in the root will be recovered +in Phase 3 and put into +.I lost+found . +If the attempt to allocate the root fails, +.I fsck_ffs +will exit with the message: +.br +.B "CANNOT ALLOCATE ROOT INODE" . +.IP NO +.I fsck_ffs +will then prompt with +.B "FIX" +.LP +Possible responses to the FIX prompt are: +.IP YES +replace the root inode's type to be a directory. +If the root inode's data blocks are not directory blocks, +many error conditions will be produced. +.IP NO +terminate the program. +.sp +.LP +.B "DUPS/BAD IN ROOT INODE (REALLOCATE)" +.br +Phase 1 or Phase 1b have found duplicate blocks +or bad blocks in the root inode (usually inode number 2) for the file system. +.LP +Possible responses to the REALLOCATE prompt are: +.IP YES +clear the existing contents of the root inode +and reallocate it. +The files and directories usually found in the root will be recovered +in Phase 3 and put into +.I lost+found . +If the attempt to allocate the root fails, +.I fsck_ffs +will exit with the message: +.br +.B "CANNOT ALLOCATE ROOT INODE" . +.IP NO +.I fsck_ffs +will then prompt with +.B "CONTINUE" . +.LP +Possible responses to the CONTINUE prompt are: +.IP YES +ignore the +.B "DUPS/BAD" +error condition in the root inode and +attempt to continue to run the file system check. +If the root inode is not correct, +then this may result in many other error conditions. +.IP NO +terminate the program. +.sp +.LP +.B "NAME TOO LONG \fIF\fP" +.br +An excessively long path name has been found. 
+This usually indicates loops in the file system name space. +This can occur if the super user has made circular links to directories. +The offending links must be removed (by a guru). +.sp +.LP +.B "I OUT OF RANGE I=\fII\fP NAME=\fIF\fP (REMOVE)" +.br +A directory entry \fIF\fP has an inode number \fII\fP that is greater than +the end of the inode list. +.LP +Possible responses to the REMOVE prompt are: +.IP YES +the directory entry \fIF\fP is removed. +.IP NO +ignore this error condition. +.sp +.LP +.B "UNALLOCATED I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP \fItype\fP=\fIF\fP (REMOVE)" +.br +A directory or file entry \fIF\fP points to an unallocated inode \fII\fP. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, modify time \fIT\fP, +and name \fIF\fP are printed. +.LP +Possible responses to the REMOVE prompt are: +.IP YES +the directory entry \fIF\fP is removed. +.IP NO +ignore this error condition. +.sp +.LP +.B "DUP/BAD I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP \fItype\fP=\fIF\fP (REMOVE)" +.br +Phase 1 or Phase 1b have found duplicate blocks or bad blocks +associated with directory or file entry \fIF\fP, inode \fII\fP. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, modify time \fIT\fP, +and directory name \fIF\fP are printed. +.LP +Possible responses to the REMOVE prompt are: +.IP YES +the directory entry \fIF\fP is removed. +.IP NO +ignore this error condition. +.sp +.LP +.B "ZERO LENGTH DIRECTORY I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (REMOVE)" +.br +A directory entry \fIF\fP has a size \fIS\fP that is zero. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, modify time \fIT\fP, +and directory name \fIF\fP are printed. +.LP +Possible responses to the REMOVE prompt are: +.IP YES +the directory entry \fIF\fP is removed; +this will always invoke the BAD/DUP error condition in Phase 4. +.IP NO +ignore this error condition. 
+.sp +.LP +.B "DIRECTORY TOO SHORT I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fIF\fP has been found whose size \fIS\fP +is less than the minimum directory size. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, modify time \fIT\fP, +and directory name \fIF\fP are printed. +.LP +Possible responses to the FIX prompt are: +.IP YES +increase the size of the directory to the minimum directory size. +.IP NO +ignore this directory. +.sp +.LP +.B "DIRECTORY \fIF\fP LENGTH \fIS\fP NOT MULTIPLE OF \fIB\fP (ADJUST)" +.br +A directory \fIF\fP has been found with size \fIS\fP that is not +a multiple of the directory blocksize \fIB\fP. +.LP +Possible responses to the ADJUST prompt are: +.IP YES +the length is rounded up to the appropriate block size. +This error can occur on 4.2BSD file systems. +Thus, when preen'ing the file system, only a warning is printed +and the directory is adjusted. +.IP NO +ignore the error condition. +.sp +.LP +.B "DIRECTORY CORRUPTED I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (SALVAGE)" +.br +A directory with an inconsistent internal state has been found. +.LP +Possible responses to the SALVAGE prompt are: +.IP YES +throw away all entries up to the next directory boundary +(usually a 512-byte boundary). +This drastic action can throw away up to 42 entries, +and should be taken only after other recovery efforts have failed. +.IP NO +skip up to the next directory boundary and resume reading, +but do not modify the directory. +.sp +.LP +.B "BAD INODE NUMBER FOR `.' I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fII\fP has been found whose inode number for `.' does +not equal \fII\fP. +.LP +Possible responses to the FIX prompt are: +.IP YES +change the inode number for `.' to be equal to \fII\fP. +.IP NO +leave the inode number for `.' unchanged. +.sp +.LP +.B "MISSING `.'
I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fII\fP has been found whose first entry is unallocated. +.LP +Possible responses to the FIX prompt are: +.IP YES +build an entry for `.' with inode number equal to \fII\fP. +.IP NO +leave the directory unchanged. +.sp +.LP +.B "MISSING `.' I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP" +.br +.B "CANNOT FIX, FIRST ENTRY IN DIRECTORY CONTAINS \fIF\fP" +.br +A directory \fII\fP has been found whose first entry is \fIF\fP. +.I Fsck_ffs +cannot resolve this problem. +The file system should be mounted and the offending entry \fIF\fP +moved elsewhere. +The file system should then be unmounted and +.I fsck_ffs +should be run again. +.sp +.LP +.B "MISSING `.' I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP" +.br +.B "CANNOT FIX, INSUFFICIENT SPACE TO ADD `.'" +.br +A directory \fII\fP has been found whose first entry is not `.'. +.I Fsck_ffs +cannot resolve this problem as it should never happen. +See a guru. +.sp +.LP +.B "EXTRA `.' ENTRY I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fII\fP has been found that has more than one entry for `.'. +.LP +Possible responses to the FIX prompt are: +.IP YES +remove the extra entry for `.'. +.IP NO +leave the directory unchanged. +.sp +.LP +.B "BAD INODE NUMBER FOR `..' I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fII\fP has been found whose inode number for `..' does +not equal the parent of \fII\fP. +.LP +Possible responses to the FIX prompt are: +.IP YES +change the inode number for `..' to be equal to the parent of \fII\fP +(``\fB..\fP'' in the root inode points to itself). +.IP NO +leave the inode number for `..' unchanged. +.sp +.LP +.B "MISSING `..'
I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fII\fP has been found whose second entry is unallocated. +.LP +Possible responses to the FIX prompt are: +.IP YES +build an entry for `..' with inode number equal to the parent of \fII\fP +(``\fB..\fP'' in the root inode points to itself). +.IP NO +leave the directory unchanged. +.sp +.LP +.B "MISSING `..' I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP" +.br +.B "CANNOT FIX, SECOND ENTRY IN DIRECTORY CONTAINS \fIF\fP" +.br +A directory \fII\fP has been found whose second entry is \fIF\fP. +.I Fsck_ffs +cannot resolve this problem. +The file system should be mounted and the offending entry \fIF\fP +moved elsewhere. +The file system should then be unmounted and +.I fsck_ffs +should be run again. +.sp +.LP +.B "MISSING `..' I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP" +.br +.B "CANNOT FIX, INSUFFICIENT SPACE TO ADD `..'" +.br +A directory \fII\fP has been found whose second entry is not `..'. +.I Fsck_ffs +cannot resolve this problem. +The file system should be mounted and the second entry in the directory +moved elsewhere. +The file system should then be unmounted and +.I fsck_ffs +should be run again. +.sp +.LP +.B "EXTRA `..' ENTRY I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP DIR=\fIF\fP (FIX)" +.br +A directory \fII\fP has been found that has more than one entry for `..'. +.LP +Possible responses to the FIX prompt are: +.IP YES +remove the extra entry for `..'. +.IP NO +leave the directory unchanged. +.sp +.LP +.B "\fIN\fP IS AN EXTRANEOUS HARD LINK TO A DIRECTORY \fID\fP (REMOVE) +.br +.I Fsck_ffs +has found a hard link, \fIN\fP, to a directory, \fID\fP. +When preen'ing the extraneous links are ignored. +.LP +Possible responses to the REMOVE prompt are: +.IP YES +delete the extraneous entry, \fIN\fP. +.IP NO +ignore the error condition. 
+.sp +.LP +.B "BAD INODE \fIS\fP TO DESCEND" +.br +An internal error has caused an impossible state \fIS\fP to be passed to the +routine that descends the file system directory structure. +.I Fsck_ffs +exits. +See a guru. +.sp +.LP +.B "BAD RETURN STATE \fIS\fP FROM DESCEND" +.br +An internal error has caused an impossible state \fIS\fP to be returned +from the routine that descends the file system directory structure. +.I Fsck_ffs +exits. +See a guru. +.sp +.LP +.B "BAD STATE \fIS\fP FOR ROOT INODE" +.br +An internal error has caused an impossible state \fIS\fP to be assigned +to the root inode. +.I Fsck_ffs +exits. +See a guru. +.NH 2 +Phase 3 \- Check Connectivity +.PP +This phase concerns itself with the directory connectivity seen in +Phase 2. +This section lists error conditions resulting from +unreferenced directories, +and missing or full +.I lost+found +directories. +.sp +.LP +.B "UNREF DIR I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP (RECONNECT)" +.br +The directory inode \fII\fP was not connected to a directory entry +when the file system was traversed. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, and +modify time \fIT\fP of directory inode \fII\fP are printed. +When preen'ing, the directory is reconnected if its size is non-zero, +otherwise it is cleared. +.LP +Possible responses to the RECONNECT prompt are: +.IP YES +reconnect directory inode \fII\fP to the file system in the +directory for lost files (usually \fIlost+found\fP). +This may invoke the +.I lost+found +error condition in Phase 3 +if there are problems connecting directory inode \fII\fP to \fIlost+found\fP. +This may also invoke the CONNECTED error condition in Phase 3 if the link +was successful. +.IP NO +ignore this error condition. +This will always invoke the UNREF error condition in Phase 4. 
+.sp +.LP +.B "NO lost+found DIRECTORY (CREATE)" +.br +There is no +.I lost+found +directory in the root directory of the file system. +When preen'ing, +.I fsck_ffs +tries to create a \fIlost+found\fP directory. +.LP +Possible responses to the CREATE prompt are: +.IP YES +create a \fIlost+found\fP directory in the root of the file system. +This may raise the message: +.br +.B "NO SPACE LEFT IN / (EXPAND)" +.br +See below for the possible responses. +Inability to create a \fIlost+found\fP directory generates the message: +.br +.B "SORRY. CANNOT CREATE lost+found DIRECTORY" +.br +and aborts the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.IP NO +abort the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.sp +.LP +.B "lost+found IS NOT A DIRECTORY (REALLOCATE)" +.br +The entry for +.I lost+found +is not a directory. +.LP +Possible responses to the REALLOCATE prompt are: +.IP YES +allocate a directory inode, and change \fIlost+found\fP to reference it. +The previous inode referenced by the \fIlost+found\fP name is not cleared. +Thus, it will either be reclaimed as an UNREF'ed inode or have its +link count ADJUST'ed later in this Phase. +Inability to create a \fIlost+found\fP directory generates the message: +.br +.B "SORRY. CANNOT CREATE lost+found DIRECTORY" +.br +and aborts the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.IP NO +abort the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.sp +.LP +.B "NO SPACE LEFT IN /lost+found (EXPAND)" +.br +There is no space to add another entry to the +.I lost+found +directory in the root directory +of the file system. +When preen'ing, the +.I lost+found +directory is expanded. +.LP +Possible responses to the EXPAND prompt are: +.IP YES +the +.I lost+found +directory is expanded to make room for the new entry.
+If the attempted expansion fails, +.I fsck_ffs +prints the message: +.br +.B "SORRY. NO SPACE IN lost+found DIRECTORY" +.br +and aborts the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +Clean out unnecessary entries in +.I lost+found . +This error is fatal if the file system is being preen'ed. +.IP NO +abort the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.sp +.LP +.B "DIR I=\fII1\fP CONNECTED. PARENT WAS I=\fII2\fP" +.br +This is an advisory message indicating a directory inode \fII1\fP was +successfully connected to the +.I lost+found +directory. +The parent inode \fII2\fP of the directory inode \fII1\fP is +replaced by the inode number of the +.I lost+found +directory. +.sp +.LP +.B "DIRECTORY \fIF\fP LENGTH \fIS\fP NOT MULTIPLE OF \fIB\fP (ADJUST)" +.br +A directory \fIF\fP has been found with size \fIS\fP that is not +a multiple of the directory blocksize \fIB\fP +(this can recur in Phase 3 if it is not adjusted in Phase 2). +.LP +Possible responses to the ADJUST prompt are: +.IP YES +the length is rounded up to the appropriate block size. +This error can occur on 4.2BSD file systems. +Thus, when preen'ing the file system, only a warning is printed +and the directory is adjusted. +.IP NO +ignore the error condition. +.sp +.LP +.B "BAD INODE \fIS\fP TO DESCEND" +.br +An internal error has caused an impossible state \fIS\fP to be passed to the +routine that descends the file system directory structure. +.I Fsck_ffs +exits. +See a guru. +.NH 2 +Phase 4 \- Check Reference Counts +.PP +This phase concerns itself with the link count information +seen in Phase 2 and Phase 3.
+This section lists error conditions resulting from +unreferenced files, symbolic links, and directories, +missing or full +.I lost+found +directory, +incorrect link counts for files, directories, +symbolic links, or special files, +and bad or duplicate blocks in files, symbolic links, and directories. +All errors in this phase are correctable if the file system is being preen'ed, +except running out of space in the \fIlost+found\fP directory. +.sp +.LP +.B "UNREF FILE I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP (RECONNECT)" +.br +Inode \fII\fP was not connected to a directory entry +when the file system was traversed. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, and +modify time \fIT\fP of inode \fII\fP are printed. +When preen'ing, the file is cleared if either its size or its +link count is zero; +otherwise it is reconnected. +.LP +Possible responses to the RECONNECT prompt are: +.IP YES +reconnect inode \fII\fP to the file system in the directory for +lost files (usually \fIlost+found\fP). +This may invoke the +.I lost+found +error condition in Phase 4 +if there are problems connecting inode \fII\fP to +.I lost+found . +.IP NO +ignore this error condition. +This will always invoke the CLEAR error condition in Phase 4. +.sp +.LP +.B "(CLEAR)" +.br +The inode mentioned in the immediately previous error condition cannot be +reconnected. +This cannot occur if the file system is being preen'ed, +since lack of space to reconnect files is a fatal error. +.LP +Possible responses to the CLEAR prompt are: +.IP YES +de-allocate the inode mentioned in the immediately previous error condition by zeroing its contents. +.IP NO +ignore this error condition. +.sp +.LP +.B "NO lost+found DIRECTORY (CREATE)" +.br +There is no +.I lost+found +directory in the root directory of the file system. +When preen'ing, +.I fsck_ffs +tries to create a \fIlost+found\fP directory.
+.LP +Possible responses to the CREATE prompt are: +.IP YES +create a \fIlost+found\fP directory in the root of the file system. +This may raise the message: +.br +.B "NO SPACE LEFT IN / (EXPAND)" +.br +See below for the possible responses. +Inability to create a \fIlost+found\fP directory generates the message: +.br +.B "SORRY. CANNOT CREATE lost+found DIRECTORY" +.br +and aborts the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.IP NO +abort the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.sp +.LP +.B "lost+found IS NOT A DIRECTORY (REALLOCATE)" +.br +The entry for +.I lost+found +is not a directory. +.LP +Possible responses to the REALLOCATE prompt are: +.IP YES +allocate a directory inode, and change \fIlost+found\fP to reference it. +The previous inode referenced by the \fIlost+found\fP name is not cleared. +Thus, it will either be reclaimed as an UNREF'ed inode or have its +link count ADJUST'ed later in this Phase. +Inability to create a \fIlost+found\fP directory generates the message: +.br +.B "SORRY. CANNOT CREATE lost+found DIRECTORY" +.br +and aborts the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.IP NO +abort the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.sp +.LP +.B "NO SPACE LEFT IN /lost+found (EXPAND)" +.br +There is no space to add another entry to the +.I lost+found +directory in the root directory +of the file system. +When preen'ing, the +.I lost+found +directory is expanded. +.LP +Possible responses to the EXPAND prompt are: +.IP YES +the +.I lost+found +directory is expanded to make room for the new entry. +If the attempted expansion fails, +.I fsck_ffs +prints the message: +.br +.B "SORRY. NO SPACE IN lost+found DIRECTORY" +.br +and aborts the attempt to linkup the lost inode.
+This will always invoke the UNREF error condition in Phase 4. +Clean out unnecessary entries in +.I lost+found . +This error is fatal if the file system is being preen'ed. +.IP NO +abort the attempt to linkup the lost inode. +This will always invoke the UNREF error condition in Phase 4. +.sp +.LP +.B "LINK COUNT \fItype\fP I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP COUNT=\fIX\fP SHOULD BE \fIY\fP (ADJUST)" +.br +The link count for inode \fII\fP +is \fIX\fP but should be \fIY\fP. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, and modify time \fIT\fP +are printed. +When preen'ing, the link count is adjusted unless the number of references +is increasing, a condition that should never occur unless precipitated +by a hardware failure. +When the number of references is increasing under preen mode, +.I fsck_ffs +exits with the message: +.br +.B "LINK COUNT INCREASING" +.LP +Possible responses to the ADJUST prompt are: +.IP YES +replace the link count of file inode \fII\fP with \fIY\fP. +.IP NO +ignore this error condition. +.sp +.LP +.B "UNREF \fItype\fP I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP (CLEAR)" +.br +Inode \fII\fP was not connected to a directory entry when the +file system was traversed. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, +and modify time \fIT\fP of inode \fII\fP +are printed. +When preen'ing, +this is a file that was not connected because its size or link count was zero; +hence, it is cleared. +.LP +Possible responses to the CLEAR prompt are: +.IP YES +de-allocate inode \fII\fP by zeroing its contents. +.IP NO +ignore this error condition. +.sp +.LP +.B "BAD/DUP \fItype\fP I=\fII\fP OWNER=\fIO\fP MODE=\fIM\fP SIZE=\fIS\fP MTIME=\fIT\fP (CLEAR)" +.br +Phase 1 or Phase 1b has found duplicate blocks +or bad blocks associated with +inode \fII\fP. +The owner \fIO\fP, mode \fIM\fP, size \fIS\fP, +and modify time \fIT\fP of inode \fII\fP +are printed.
+This error cannot arise when the file system is being preen'ed, +as it would have caused a fatal error earlier. +.LP +Possible responses to the CLEAR prompt are: +.IP YES +de-allocate inode \fII\fP by zeroing its contents. +.IP NO +ignore this error condition. +.NH 2 +Phase 5 \- Check Cylinder Groups +.PP +This phase concerns itself with the free-block and used-inode maps. +This section lists error conditions resulting from +allocated blocks in the free-block maps, +free blocks missing from free-block maps, +and an incorrect total free-block count. +It also lists error conditions resulting from +free inodes in the used-inode maps, +allocated inodes missing from used-inode maps, +and an incorrect total used-inode count. +.sp +.LP +.B "CG \fIC\fP: BAD MAGIC NUMBER" +.br +The magic number of cylinder group \fIC\fP is wrong. +This usually indicates that the cylinder group maps have been destroyed. +When running manually, the cylinder group is marked as needing +to be reconstructed. +This error is fatal if the file system is being preen'ed. +.sp +.LP +.B "BLK(S) MISSING IN BIT MAPS (SALVAGE)" +.br +A cylinder group block map is missing some free blocks. +During preen'ing, the maps are reconstructed. +.LP +Possible responses to the SALVAGE prompt are: +.IP YES +reconstruct the free block map. +.IP NO +ignore this error condition. +.sp +.LP +.B "SUMMARY INFORMATION BAD (SALVAGE)" +.br +The summary information was found to be incorrect. +When preen'ing, +the summary information is recomputed. +.LP +Possible responses to the SALVAGE prompt are: +.IP YES +reconstruct the summary information. +.IP NO +ignore this error condition. +.sp +.LP +.B "FREE BLK COUNT(S) WRONG IN SUPERBLOCK (SALVAGE)" +.br +The superblock free block information was found to be incorrect. +When preen'ing, +the superblock free block information is recomputed. +.LP +Possible responses to the SALVAGE prompt are: +.IP YES +reconstruct the superblock free block information. +.IP NO +ignore this error condition.
+.NH 2 +Cleanup +.PP +Once a file system has been checked, a few cleanup functions are performed. +This section lists advisory messages about +the contents +and modification status of the file system. +.sp +.LP +.B "\fIV\fP files, \fIW\fP used, \fIX\fP free (\fIY\fP frags, \fIZ\fP blocks)" +.br +This is an advisory message indicating that +the file system checked contained +\fIV\fP files using +\fIW\fP fragment-sized blocks, leaving +\fIX\fP fragment-sized blocks free in the file system. +The numbers in parentheses break the free count down into +\fIY\fP free fragments and +\fIZ\fP free full-sized blocks. +.sp +.LP +.B "***** REBOOT UNIX *****" +.br +This is an advisory message indicating that +the root file system has been modified by +.I fsck_ffs . +If UNIX is not rebooted immediately, +the work done by +.I fsck_ffs +may be undone by the in-core copies of tables +UNIX keeps. +When preen'ing, +.I fsck_ffs +will exit with a code of 4. +The standard auto-reboot script distributed with 4.3BSD +interprets an exit code of 4 by issuing a reboot system call. +.sp +.LP +.B "***** FILE SYSTEM WAS MODIFIED *****" +.br +This is an advisory message indicating that +the current file system was modified by +.I fsck_ffs . +If this file system is mounted or is the current root file system, +.I fsck_ffs +should be halted and UNIX rebooted. +If UNIX is not rebooted immediately, +the work done by +.I fsck_ffs +may be undone by the in-core copies of tables +UNIX keeps.
diff --git a/share/doc/smm/03.fsck/Makefile b/share/doc/smm/03.fsck/Makefile new file mode 100644 index 000000000000..59cf82cd2cf4 --- /dev/null +++ b/share/doc/smm/03.fsck/Makefile @@ -0,0 +1,5 @@ +VOLUME= smm/03.fsck +SRCS= 0.t 1.t 2.t 3.t 4.t +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/04.quotas/Makefile b/share/doc/smm/04.quotas/Makefile new file mode 100644 index 000000000000..e9f7ac58710b --- /dev/null +++ b/share/doc/smm/04.quotas/Makefile @@ -0,0 +1,5 @@ +VOLUME= smm/04.quotas +SRCS= quotas.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/04.quotas/quotas.ms b/share/doc/smm/04.quotas/quotas.ms new file mode 100644 index 000000000000..15ce42daea00 --- /dev/null +++ b/share/doc/smm/04.quotas/quotas.ms @@ -0,0 +1,312 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.EH 'SMM:4-%''Disc Quotas in a \s-2UNIX\s+2 Environment' +.OH 'Disc Quotas in a \s-2UNIX\s+2 Environment''SMM:4-%' +.ND 5th July, 1983 +.TL +Disc Quotas in a \s-2UNIX\s+2\s-3\u*\d\s0 Environment +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +.AU +Robert Elz +.AI +Department of Computer Science +University of Melbourne, +Parkville, +Victoria, +Australia. +.AB +.PP +In most computing environments, disc space is not +infinite. +The disc quota system provides a mechanism +to control usage of disc space on an +individual basis. +.PP +Quotas may be set for each individual user on any or +all filesystems. +.PP +The quota system will warn users when they +exceed their allotted limit, but allows some +extra space for current work. +Repeatedly remaining over quota at logout +will eventually cause a fatal over-quota condition. +.PP +The quota system is an optional part of +\s-2VMUNIX\s0 that may be included when the +system is configured. +.AE +.NH 1 +Users' view of disc quotas +.PP +To most users, disc quotas will either be of no concern, +or a fact of life that cannot be avoided. +The +\fIquota\fP\|(1) +command will provide information on any disc quotas +that may have been imposed upon a user. +.PP +There are two possible quotas that may be +imposed; usually if one is, both will be. +A limit can be set on the amount of space a user +can occupy, and there may be a limit on the number +of files (inodes) he can own.
+.PP +.I Quota +provides information on the quotas that have +been set by the system administrators in each +of these areas, and on current usage. +.PP +There are four numbers for each limit: the current +usage, soft limit (quota), hard limit, and number +of remaining login warnings. +The soft limit is the number of 1K blocks (or files) +that the user is expected to remain below. +Each time the user's usage goes past this limit, +he will be warned. +The hard limit cannot be exceeded. +If a user's usage reaches this number, further +requests for space (or attempts to create a file) +will fail with an EDQUOT error, and the first time +this occurs, a message will be written to the user's +terminal. +Only one message will be output until the space occupied +is reduced below the limit and then reaches it again; +this avoids continual noise from those +programs that ignore write errors. +.PP +Whenever a user logs in with a usage greater than +his soft limit, he will be warned, and his login +warning count decremented. +When he logs in under quota, the counter is reset +to its maximum value (which is a system configuration +parameter, typically 3). +If the warning count should ever reach zero (caused +by three successive logins over quota), the +particular limit that has been exceeded will be treated +as if the hard limit has been reached, and no +more resources will be allocated to the user. +The \fBonly\fP way to reset this condition is +to reduce usage below quota, then log in again. +.NH 2 +Surviving when a quota limit is reached +.PP +In most cases, the only way to recover from over-quota +conditions is to abort whatever activity was in progress +on the filesystem that has reached its limit, remove +sufficient files to bring usage back below quota, +and retry the failed program.
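The rules for soft limits, hard limits, and login warnings form a small state machine. The Python sketch below models one limit (blocks or files) for one user, under the assumptions stated in the text. The warning count starts at a configured maximum (MAX_WARNINGS = 3 here). Each login over the soft limit decrements it, and a login under quota resets it. Once it reaches zero, nothing more is allocated until the user gets back under quota and logs in again. The class QuotaLimit and its methods are hypothetical and do not reflect the kernel's data structures.

```python
from dataclasses import dataclass

MAX_WARNINGS = 3  # configuration parameter; the text says it is typically 3


@dataclass
class QuotaLimit:
    """One limit (blocks or files) as seen by a single user; a sketch only."""
    usage: int
    soft: int
    hard: int
    warnings: int = MAX_WARNINGS

    def login(self):
        """Apply the login-time rules; return True if the user was warned."""
        if self.usage > self.soft:
            self.warnings = max(self.warnings - 1, 0)
            return True
        self.warnings = MAX_WARNINGS  # logging in under quota resets the count
        return False

    def can_allocate(self, amount):
        """Would a request for `amount` more units succeed?"""
        if self.warnings == 0:
            # Out of warnings: treated as if the hard limit had been reached.
            return False
        # Usage may reach the hard limit but never exceed it (EDQUOT).
        return self.usage + amount <= self.hard


lim = QuotaLimit(usage=120, soft=100, hard=150)
for _ in range(3):
    lim.login()                            # three successive logins over quota
print(lim.warnings, lim.can_allocate(1))   # 0 False
lim.usage = 90                             # user removes some files...
print(lim.can_allocate(1))                 # still False until the next login
lim.login()
print(lim.warnings, lim.can_allocate(60))  # 3 True
```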
+.PP +However, if you are in the editor and a write fails +because of an over-quota situation, that is not +a suitable course of action: most likely, +the initial attempt to write the file +will have truncated its previous contents, so if +the editor is aborted without correctly writing the +file, not only will the recent changes be lost, but +possibly much, or even all, of the data +that previously existed. +.PP +There are several possible safe exits for a user +caught in this situation. +He may use the editor \fB!\fP shell escape command to +examine his file space and remove surplus files. +Alternatively, using \fIcsh\fP, he may suspend the +editor, remove some files, then resume it. +A third possibility is to write the file to +some other filesystem (perhaps to a file on /tmp) +where the user's quota has not been exceeded. +Then after rectifying the quota situation, +the file can be moved back to the filesystem +it belongs on. +.NH 1 +Administering the quota system +.PP +To set up and establish the disc quota system, +the system administrator must perform +several steps. +.PP +First, the system must be configured to include +the disc quota sub-system. +This is done by including the line: +.DS +options QUOTA +.DE +in the system configuration file, then running +\fIconfig\fP\|(8) +and building a new system\s-3\u*\d\s0. +.FS +* See also the document ``Building 4.2BSD UNIX Systems with Config''. +.FE +.PP +Second, the administrator must decide which filesystems +need to have quotas applied. +Usually, only filesystems that house users' home directories, +or other user files, will need to be subjected to +the quota system, though it may also prove useful to +include \fB/usr\fR. +If possible, \fB/tmp\fP should be free of quotas. +.PP +Having decided which filesystems quotas need to be +set on, the administrator should then allocate the +available space amongst the competing needs.
How this +should be done is well beyond the scope of this document. +.PP +Then, the +\fIedquota\fP\|(8) +command can be used to set the desired limits for +each user. Where a number of users are to be given the +same quotas (a common occurrence), the \fB\-p\fP switch +to edquota makes this easy to accomplish. +.PP +Once the quotas are set, the system +must be told to enforce quotas on the desired filesystems. +This is accomplished with the +\fIquotaon\fP\|(8) +command. +.I Quotaon +will either enable quotas for a particular filesystem or, +with the \fB\-a\fP switch, enable quotas for each +filesystem indicated in \fB/etc/fstab\fP as using quotas. +See +\fIfstab\fP\|(5) +for details. +Most sites using the quota system will include the +line +.DS C +/etc/quotaon -a +.DE +in \fB/etc/rc.local\fP. +.PP +Should quotas need to be disabled, the +\fIquotaoff\fP\|(8) +command will do that; however, if the filesystem is +about to be dismounted, the +\fIumount\fP\|(8) +command will disable quotas immediately before the +filesystem is unmounted. +This is actually an effect of the +\fIumount\fP\|(2) +system call, and it guarantees that the quota system +will not be disabled if the umount would fail +because the filesystem is not idle. +.PP +Periodically (certainly after each reboot, and when quotas +are first enabled for a filesystem), the records retained +in the quota file should be checked for consistency with +the actual number of blocks and files allocated to +the user. +The +\fIquotacheck\fP\|(8) +command can be used to accomplish this. +It is not necessary to dismount the filesystem or disable +the quota system to run this command, though on +active filesystems inaccurate results may occur. +This does no real harm in most cases; another run of +.I quotacheck +when the filesystem is idle will certainly correct any inaccuracy.
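The consistency check described above amounts to recounting each user's blocks and files and comparing the totals with the quota file. The Python sketch below shows only that reconciliation step. It assumes the allocated inodes have already been collected as (uid, blocks) pairs. The function quotacheck here is a hypothetical illustration, not the actual quotacheck(8) program, which reads the inodes from the disk itself. If files change while the scan runs, the totals may be slightly off, which is why the text advises a second run on an idle filesystem.

```python
from collections import defaultdict


def quotacheck(inodes, recorded):
    """Recompute per-user usage and report quota records that disagree.

    inodes:   iterable of (uid, blocks) pairs, one per allocated inode
    recorded: dict mapping uid -> (blocks, files) as kept in the quota file
    Returns a dict mapping uid -> corrected (blocks, files), mismatches only.
    """
    actual = defaultdict(lambda: [0, 0])
    for uid, blocks in inodes:
        actual[uid][0] += blocks   # blocks charged to the owner
        actual[uid][1] += 1        # one file (inode) charged to the owner
    fixes = {}
    for uid in sorted(set(actual) | set(recorded)):
        counted = tuple(actual[uid]) if uid in actual else (0, 0)
        if counted != recorded.get(uid, (0, 0)):
            fixes[uid] = counted
    return fixes


print(quotacheck([(100, 4), (100, 6), (101, 2)],
                 {100: (10, 2), 101: (5, 1), 102: (3, 1)}))
# {101: (2, 1), 102: (0, 0)}
```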
+.PP +The super-user may use the +\fIquota\fP\|(1) +command to examine the usage and quotas of any user, and +the +\fIrepquota\fP\|(8) +command may be used to check the usages and limits for +all users on a filesystem. +.NH 1 +Some implementation details +.PP +Disc quota usage and information is stored in a file on the +filesystem that the quotas are to be applied to. +Conventionally, this file is \fBquotas\fR in the root of +the filesystem. +While this name is not known to the system in any way, +several of the user-level utilities ``know'' it, and +choosing any other name would not be wise. +.PP +The data in the file comprises an array of structures, indexed +by uid, one structure for each user on the system (whether +the user has a quota on this filesystem or not). +If the uid space is sparse, then the file may have holes +in it, which would be lost by copying, so it is best to +avoid copying the file. +.PP +The system is informed of the existence of the quota +file by the +\fIsetquota\fP\|(2) +system call. +It then reads the quota entries for each user currently +active, then for the owners of any open files who +are not currently active. +Each subsequent open of a file on the filesystem +pairs the file with its quota information. +In most cases this information will be retained in core, +either because the user who owns the file is running some +process, because other files owned by the same +user are open, or because some file (perhaps this one) was recently +accessed. +In memory, the quota information is kept hashed by user-id +and filesystem, and retained in an LRU chain so recently +released data can be easily reclaimed. +Information about those users whose last process has +recently terminated is also retained in this way. +.PP +Each time a block is allocated or released, and each time an inode +is allocated or freed, the quota system is told +about it, and in the case of allocations, has the +opportunity to object.
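Because the quota file is an array of structures indexed by uid, a user's record lives at a fixed byte offset of uid times the record size. The Python sketch below shows that layout and why a sparse uid space produces a mostly empty file. The six-field record format DQ_FORMAT and the helpers quota_offset, write_quota, and read_quota are hypothetical, since the real structure differs between releases. An in-memory io.BytesIO fills skipped regions with zero bytes. On a real UNIX file, the same seek-and-write pattern would typically leave a hole, and a naive copy would turn those holes into allocated zero blocks.

```python
import io
import struct

# Hypothetical record layout: six unsigned 32-bit fields (block hard limit,
# block soft limit, current blocks, inode hard limit, inode soft limit,
# current inodes). The real structure differs between releases.
DQ_FORMAT = "<6I"
DQ_SIZE = struct.calcsize(DQ_FORMAT)  # 24 bytes


def quota_offset(uid):
    """The quota file is an array indexed by uid."""
    return uid * DQ_SIZE


def write_quota(f, uid, record):
    f.seek(quota_offset(uid))
    f.write(struct.pack(DQ_FORMAT, *record))


def read_quota(f, uid):
    f.seek(quota_offset(uid))
    data = f.read(DQ_SIZE)
    if len(data) < DQ_SIZE:
        return (0,) * 6  # past end of file: nothing recorded for this uid
    return struct.unpack(DQ_FORMAT, data)


f = io.BytesIO()
write_quota(f, 1000, (5000, 4000, 1200, 500, 400, 37))
print(len(f.getvalue()))    # 24024: records for uids 0..999 are all zeros
print(read_quota(f, 1000))  # (5000, 4000, 1200, 500, 400, 37)
print(read_quota(f, 7))     # (0, 0, 0, 0, 0, 0)
```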
+.PP +Measurements have shown +that the quota code uses a very small percentage of the system +CPU time consumed in writing a new block to disc. +.NH 1 +Acknowledgments +.PP +The current disc quota system is loosely based upon a very +early scheme implemented at the University of New South +Wales and Sydney University in the mid-1970s. That system +implemented a single combined limit for both files and blocks +on all filesystems. +.PP +A later system was implemented at the University of Melbourne +by the author, but was not highly accurate; for example, +\fIchown\fP\|(2) and similar calls did not affect quotas, nor did I/O to a file +other than one owned by the instigator. +.PP +The current system has been running (with only minor modifications) +since January 1982 at Melbourne. +It is actually just a small part of a much broader resource +control scheme, which is capable of controlling almost +anything that is usually uncontrolled in UNIX. The rest +of it is, as yet, still far too +subject to change to be considered for distribution. +.PP +For the 4.2BSD release, Sam Leffler and Kirk McKusick +at The University of California at Berkeley did much work +to clean up and sanely incorporate the quota code. diff --git a/share/doc/smm/05.fastfs/0.t b/share/doc/smm/05.fastfs/0.t new file mode 100644 index 000000000000..bd6d4c74225f --- /dev/null +++ b/share/doc/smm/05.fastfs/0.t @@ -0,0 +1,153 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.EQ +delim $$ +.EN +.if n .ND +.TL +A Fast File System for UNIX* +.EH 'SMM:05-%''A Fast File System for \s-2UNIX\s+2' +.OH 'A Fast File System for \s-2UNIX\s+2''SMM:05-%' +.AU +Marshall Kirk McKusick, William N. Joy\(dg, +Samuel J. Leffler\(dd, Robert S. Fabry +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +.FS +\(dg William N. Joy is currently employed by: +Sun Microsystems, Inc, 2550 Garcia Avenue, Mountain View, CA 94043 +.FE +.FS +\(dd Samuel J. 
Leffler is currently employed by: +Lucasfilm Ltd., PO Box 2009, San Rafael, CA 94912 +.FE +.FS +This work was done under grants from +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +ARPA Order No. 4031 monitored by Naval Electronic Systems Command under +Contract No. N00039-82-C-0235. +.FE +A reimplementation of the UNIX file system is described. +The reimplementation provides substantially higher throughput +rates by using more flexible allocation policies +that allow better locality of reference and can +be adapted to a wide range of peripheral and processor characteristics. +The new file system clusters data that is sequentially accessed +and provides two block sizes to allow fast access to large files +while not wasting large amounts of space for small files. +File access rates up to ten times faster than those of the traditional +UNIX file system are achieved. +Long-needed enhancements to the programmers' +interface are discussed. +These include a mechanism to place advisory locks on files, +extensions of the name space across file systems, +the ability to use long file names, +and provisions for administrative control of resource usage. +.sp +.LP +Revised February 18, 1984 +.AE +.LP +.sp 2 +CR Categories and Subject Descriptors: +D.4.3 +.B "[Operating Systems]": +File Systems Management \- +.I "file organization, directory structures, access methods"; +D.4.2 +.B "[Operating Systems]": +Storage Management \- +.I "allocation/deallocation strategies, secondary storage devices"; +D.4.8 +.B "[Operating Systems]": +Performance \- +.I "measurements, operational analysis"; +H.3.2 +.B "[Information Systems]": +Information Storage \- +.I "file organization" +.sp +Additional Keywords and Phrases: +UNIX, +file system organization, +file system performance, +file system design, +application program interface. +.sp +General Terms: +file system, +measurement, +performance.
+.bp +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. Old file system +.LP +.sp .5v +.nf +.B "3. New file system organization +3.1. Optimizing storage utilization +3.2. File system parameterization +3.3. Layout policies +.LP +.sp .5v +.nf +.B "4. Performance +.LP +.sp .5v +.nf +.B "5. File system functional enhancements +5.1. Long file names +5.2. File locking +5.3. Symbolic links +5.4. Rename +5.5. Quotas +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References diff --git a/share/doc/smm/05.fastfs/1.t b/share/doc/smm/05.fastfs/1.t new file mode 100644 index 000000000000..a85b5e8b2971 --- /dev/null +++ b/share/doc/smm/05.fastfs/1.t @@ -0,0 +1,106 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH Introduction +.NH +Introduction +.PP +This paper describes the changes from the original 512 byte UNIX file +system to the new one released with the 4.2 Berkeley Software Distribution. +It presents the motivations for the changes, +the methods used to effect these changes, +the rationale behind the design decisions, +and a description of the new implementation. +This discussion is followed by a summary of +the results that have been obtained, +directions for future work, +and the additions and changes +that have been made to the facilities that are +available to programmers. +.PP +The original UNIX system that runs on the PDP-11\(dg +.FS +\(dg DEC, PDP, VAX, MASSBUS, and UNIBUS are +trademarks of Digital Equipment Corporation. +.FE +has simple and elegant file system facilities. File system input/output +is buffered by the kernel; +there are no alignment constraints on +data transfers and all operations are made to appear synchronous. +All transfers to the disk are in 512 byte blocks, which can be placed +arbitrarily within the data area of the file system. Virtually +no constraints other than available disk space are placed on file growth +[Ritchie74], [Thompson78].* +.FS +* In practice, a file's size is constrained to be less than about +one gigabyte. 
+.FE +.PP +When used on the VAX-11 together with other UNIX enhancements, +the original 512 byte UNIX file +system is incapable of providing the data throughput rates +that many applications require. +For example, +applications +such as VLSI design and image processing +do a small amount of processing +on large quantities of data and +need high throughput from the file system. +High throughput rates are also needed by programs +that map files from the file system into large virtual +address spaces. +Paging data in and out of the file system is likely +to occur frequently [Ferrin82b]. +This requires a file system providing +higher bandwidth than the original 512 byte UNIX +file system, which provides only about +two percent of the maximum disk bandwidth, or about +20 kilobytes per second per arm [White80], [Smith81b]. +.PP +Modifications have been made to the UNIX file system to improve +its performance. +Since the UNIX file system interface +is well understood and not inherently slow, +this development retained the abstraction and simply changed +the underlying implementation to increase its throughput. +Consequently, users of the system have not been faced with +massive software conversion. +.PP +Problems with file system performance have been dealt with +extensively in the literature; see [Smith81a] for a survey. +Previous work to improve UNIX file system performance is +described in [Ferrin82a]. +The UNIX operating system drew many of its ideas from Multics, +a large, high-performance operating system [Feiertag71]. +Other work includes Hydra [Almes78], +Spice [Thompson80], +and a file system for a LISP environment [Symbolics81]. +A good introduction to the physical latencies of disks is +given in [Pechura83].
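The two bandwidth figures quoted above together imply a rough peak transfer rate per arm. The back-of-the-envelope derivation below takes both figures as the approximations the text presents them as and uses 1 kilobyte = 1000 bytes for simplicity. The result is therefore an estimate, not a measured value.

```latex
B_{\mathrm{old}} \approx 0.02\,B_{\max} \approx 20~\mathrm{KB/s}
\quad\Longrightarrow\quad
B_{\max} \approx \frac{20~\mathrm{KB/s}}{0.02} = 1000~\mathrm{KB/s} \approx 1~\mathrm{MB/s\ per\ arm}
```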
+.ds RH Old file system +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/2.t b/share/doc/smm/05.fastfs/2.t new file mode 100644 index 000000000000..e3a2149bac84 --- /dev/null +++ b/share/doc/smm/05.fastfs/2.t @@ -0,0 +1,137 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.ds RH Old file system +.NH +Old File System +.PP +In the file system developed at Bell Laboratories +(the ``traditional'' file system), +each disk drive is divided into one or more +partitions. Each of these disk partitions may contain +one file system. A file system never spans multiple +partitions.\(dg +.FS +\(dg By ``partition'' here we refer to the subdivision of +physical space on a disk drive. In the traditional file +system, as in the new file system, file systems are really +located in logical disk partitions that may overlap. This +overlapping is made available, for example, +to allow programs to copy entire disk drives containing multiple +file systems. +.FE +A file system is described by its super-block, +which contains the basic parameters of the file system. +These include the number of data blocks in the file system, +a count of the maximum number of files, +and a pointer to the \fIfree list\fP, a linked +list of all the free blocks in the file system. +.PP +Within the file system are files. +Certain files are distinguished as directories and contain +pointers to files that may themselves be directories. +Every file has a descriptor associated with it called an +.I "inode". +An inode contains information describing ownership of the file, +time stamps marking last modification and access times for the file, +and an array of indices that point to the data blocks for the file. +For the purposes of this section, we assume that the first 8 blocks +of the file are directly referenced by values stored +in an inode itself*. +.FS +* The actual number may vary from system to system, but is usually in +the range 5-13. +.FE +An inode may also contain references to indirect blocks +containing further data block indices. 
+In a file system with a 512 byte block size, a singly indirect +block contains 128 further block addresses, +a doubly indirect block contains 128 addresses of further singly indirect +blocks, +and a triply indirect block contains 128 addresses of further doubly indirect +blocks. +.PP +A 150 megabyte traditional UNIX file system consists +of 4 megabytes of inodes followed by 146 megabytes of data. +This organization segregates the inode information from the data; +thus accessing a file normally incurs a long seek from the +file's inode to its data. +Files in a single directory are not typically allocated +consecutive slots in the 4 megabytes of inodes, +causing many non-consecutive blocks of inodes +to be accessed when executing +operations on the inodes of several files in a directory. +.PP +The allocation of data blocks to files is also suboptimal. +The traditional +file system never transfers more than 512 bytes per disk transaction +and often finds that the next sequential data block is not on the same +cylinder, forcing seeks between 512 byte transfers. +The combination of the small block size, +limited read-ahead in the system, +and many seeks severely limits file system throughput. +.PP +The first work at Berkeley on the UNIX file system attempted to improve both +reliability and throughput. +The reliability was improved by staging modifications +to critical file system information so that they could +either be completed or repaired cleanly by a program +after a crash [Kowalski78]. +The file system performance was improved by a factor of more than two by +changing the basic block size from 512 to 1024 bytes. +The increase was due to two factors: +each disk transfer accessed twice as much data, +and most files could be described without the need to access +indirect blocks since the direct blocks contained twice as much data. +The file system with these changes will henceforth be referred to as the +.I "old file system."
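The indirect-block arithmetic above can be checked with a short calculation. The Python sketch below assumes 4-byte block addresses, which follows from 128 addresses fitting in a 512 byte block, and uses the 8 direct pointers assumed earlier in this section. It computes how many data blocks a single inode can reach and how much data the direct pointers alone cover. The figures show only what the pointer structure permits. Other limits, such as the practical bound of about one gigabyte mentioned in the introduction, may cap file size lower. The function names are hypothetical.

```python
def max_file_blocks(block_size, n_direct=8, addr_size=4):
    """Data blocks reachable from one inode via direct, single, double,
    and triple indirect pointers."""
    per_block = block_size // addr_size  # addresses held by one indirect block
    return n_direct + per_block + per_block ** 2 + per_block ** 3


def max_file_bytes(block_size, n_direct=8, addr_size=4):
    return max_file_blocks(block_size, n_direct, addr_size) * block_size


for bsize in (512, 1024):
    # block size, addresses per indirect block, pointer-limited maximum,
    # bytes covered by the direct pointers alone
    print(bsize, bsize // 4, max_file_bytes(bsize), 8 * bsize)
# 512 128 1082200064 4096
# 1024 256 17247248384 8192
```

The last column shows why doubling the block size let most small files avoid indirect blocks: the 8 direct pointers cover 8192 bytes instead of 4096.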
+.PP +This performance improvement gave a strong indication that +increasing the block size was a good method for improving +throughput. +Although the throughput had doubled, +the old file system was still using only about +four percent of the disk bandwidth. +The main problem was that although the free list was initially +ordered for optimal access, +it quickly became scrambled as files were created and removed. +Eventually the free list became entirely random, +causing files to have their blocks allocated randomly over the disk. +This forced a seek before every block access. +Although old file systems provided transfer rates of up +to 175 kilobytes per second when they were first created, +this rate deteriorated to 30 kilobytes per second after a +few weeks of moderate use because of this +randomization of data block placement. +There was no way of restoring the performance of an old file system +except to dump, rebuild, and restore the file system. +Another possibility, as suggested by [Maruyama76], +would be to have a process that periodically +reorganized the data on the disk to restore locality. +.ds RH New file system +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/3.t b/share/doc/smm/05.fastfs/3.t new file mode 100644 index 000000000000..c51def302297 --- /dev/null +++ b/share/doc/smm/05.fastfs/3.t @@ -0,0 +1,590 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH New file system +.NH +New file system organization +.PP +In the new file system organization (as in the +old file system organization), +each disk drive contains one or more file systems. +A file system is described by its super-block, +located at the beginning of the file system's disk partition. +Because the super-block contains critical data, +it is replicated to protect against catastrophic loss. +This is done when the file system is created; +since the super-block data does not change, +the copies need not be referenced unless a head crash +or other hard disk error causes the default super-block +to be unusable. +.PP +To ensure that it is possible to create files as large as +.if n 2 ** 32 +.if t $2 sup 32$ +bytes with only two levels of indirection, +the minimum size of a file system block is 4096 bytes. +The size of file system blocks can be any power of two +greater than or equal to 4096.
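The 4096 byte minimum can be checked directly. With 4-byte block addresses (the same address size implied by 128 addresses per 512 byte block in the old file system), a block of B bytes holds B/4 addresses. A double indirect block therefore reaches (B/4)² data blocks of B bytes each. The Python sketch below counts only the double indirect block and evaluates it for a few block sizes. The function name is hypothetical.

```python
def double_indirect_bytes(block_size, addr_size=4):
    """Bytes reachable through a single double indirect block."""
    per_block = block_size // addr_size  # addresses per indirect block
    return per_block * per_block * block_size


for bsize in (2048, 4096, 8192):
    reach = double_indirect_bytes(bsize)
    print(bsize, reach, reach >= 2 ** 32)
# 2048 536870912 False
# 4096 4294967296 True
# 8192 34359738368 True
```

At 4096 bytes the reach is exactly 2^10 × 2^10 × 2^12 = 2^32 bytes. At 2048 bytes it is only 2^29, and the few direct and single indirect blocks cannot make up the difference.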
+The block size of a file system is recorded in the +file system's super-block, +so it is possible for file systems with different block sizes +to be simultaneously accessible on the same system. +The block size must be decided at the time that +the file system is created; +it cannot be subsequently changed without rebuilding the file system. +.PP +The new file system organization divides a disk partition +into one or more areas called +.I "cylinder groups". +A cylinder group consists of one or more consecutive +cylinders on a disk. +Associated with each cylinder group is some bookkeeping information +that includes a redundant copy of the super-block, +space for inodes, +a bit map describing available blocks in the cylinder group, +and summary information describing the usage of data blocks +within the cylinder group. +The bit map of available blocks in the cylinder group replaces +the traditional file system's free list. +For each cylinder group, a static number of inodes +is allocated at file system creation time. +The default policy is to allocate one inode for each 2048 +bytes of space in the cylinder group, expecting this +to be far more than will ever be needed. +.PP +All the cylinder group bookkeeping information could be +placed at the beginning of each cylinder group. +However, if this approach were used, +all the redundant information would be on the top platter. +A single hardware failure that destroyed the top platter +could cause the loss of all redundant copies of the super-block. +Thus, the cylinder group bookkeeping information +begins at a varying offset from the beginning of the cylinder group. +The offset for each successive cylinder group is calculated to be +about one track further from the beginning of the cylinder group +than the preceding cylinder group. +In this way the redundant +information spirals down into the pack so that any single track, cylinder, +or platter can be lost without losing all copies of the super-block.
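The staggered placement can be illustrated with a simplified model in which each cylinder group's bookkeeping starts one more track into the group than the previous group's. Since the tracks of a cylinder lie on different surfaces, successive copies land on different platters. The Python sketch below assumes a hypothetical geometry and assumes that the offset wraps after one full cylinder. The function cg_info_position is an illustration only, not the formula the kernel actually uses.

```python
def cg_info_position(cg, sectors_per_track, tracks_per_cylinder):
    """Return (sector offset from the group's start, surface) for the
    bookkeeping information of cylinder group `cg`.

    Each group's information starts one track further in than the
    previous group's; wrapping after a full cylinder is an assumption.
    """
    track = cg % tracks_per_cylinder  # track within the cylinder = surface/head
    return track * sectors_per_track, track


# Hypothetical geometry: 4 tracks per cylinder, 32 sectors per track.
positions = [cg_info_position(cg, 32, 4) for cg in range(8)]
print(positions[:4])                                  # [(0, 0), (32, 1), (64, 2), (96, 3)]
print(sorted({surface for _, surface in positions}))  # [0, 1, 2, 3]
```

Because the copies are spread over every surface, and over different cylinders as well, losing any one track, cylinder, or platter leaves other copies of the super-block intact.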
+Except for the first cylinder group, +the space between the beginning of the cylinder group +and the beginning of the cylinder group information +is used for data blocks.\(dg +.FS +\(dg While it appears that the first cylinder group could be laid +out with its super-block at the ``known'' location, +this would not work for file systems +with block sizes of 16 kilobytes or greater. +This is because of a requirement that the first 8 kilobytes of the disk +be reserved for a bootstrap program and a separate requirement that +the cylinder group information begin on a file system block boundary. +To start the cylinder group on a file system block boundary, +file systems with block sizes larger than 8 kilobytes +would have to leave an empty space between the end of +the boot block and the beginning of the cylinder group. +Without knowing the size of the file system blocks, +the system would not know what roundup function to use +to find the beginning of the first cylinder group. +.FE +.NH 2 +Optimizing storage utilization +.PP +Data is laid out so that larger blocks can be transferred +in a single disk transaction, greatly increasing file system throughput. +As an example, consider a file in the new file system +composed of 4096 byte data blocks. +In the old file system this file would be composed of 1024 byte blocks. +By increasing the block size, disk accesses in the new file +system may transfer up to four times as much information per +disk transaction. +In large files, several +4096 byte blocks may be allocated from the same cylinder so that +even larger data transfers are possible before requiring a seek. +.PP +The main problem with +larger blocks is that most UNIX +file systems are composed of many small files. +A uniformly large block size wastes space. +Table 1 shows the effect of file system +block size on the amount of wasted space in the file system.
+The files measured to obtain these figures reside on +one of our time sharing +systems that has roughly 1.2 gigabytes of on-line storage. +The measurements are based on the active user file systems containing +about 920 megabytes of formatted space. +.KF +.DS B +.TS +box; +l|l|l +a|n|l. +Space used % waste Organization +_ +775.2 Mb 0.0 Data only, no separation between files +807.8 Mb 4.2 Data only, each file starts on 512 byte boundary +828.7 Mb 6.9 Data + inodes, 512 byte block UNIX file system +866.5 Mb 11.8 Data + inodes, 1024 byte block UNIX file system +948.5 Mb 22.4 Data + inodes, 2048 byte block UNIX file system +1128.3 Mb 45.6 Data + inodes, 4096 byte block UNIX file system +.TE +Table 1 \- Amount of wasted space as a function of block size. +.DE +.KE +The space wasted is calculated to be the percentage of space +on the disk not containing user data. +As the block size on the disk +increases, the waste rises quickly, to an intolerable +45.6% waste with 4096 byte file system blocks. +.PP +To be able to use large blocks without undue waste, +small files must be stored in a more efficient way. +The new file system accomplishes this goal by allowing the division +of a single file system block into one or more +.I "fragments". +The file system fragment size is specified +at the time that the file system is created; +each file system block can optionally be broken into +2, 4, or 8 fragments, each of which is addressable. +The lower bound on the size of these fragments is constrained +by the disk sector size, +typically 512 bytes. +The block map associated with each cylinder group +records the space available in a cylinder group +at the fragment level; +to determine if a block is available, aligned fragments are examined. +Figure 1 shows a piece of a map from a 4096/1024 file system. +.KF +.DS B +.TS +box; +l|c c c c. 
+Bits in map XXXX XXOO OOXX OOOO +Fragment numbers 0-3 4-7 8-11 12-15 +Block numbers 0 1 2 3 +.TE +Figure 1 \- Example layout of blocks and fragments in a 4096/1024 file system. +.DE +.KE +Each bit in the map records the status of a fragment; +an ``X'' shows that the fragment is in use, +while an ``O'' shows that the fragment is available for allocation. +In this example, +fragments 0\-5, 10, and 11 are in use, +while fragments 6\-9 and 12\-15 are free. +Fragments of adjoining blocks cannot be used as a full block, +even if they are large enough. +In this example, +fragments 6\-9 cannot be allocated as a full block; +only fragments 12\-15 can be coalesced into a full block. +.PP +On a file system with a block size of 4096 bytes +and a fragment size of 1024 bytes, +a file is represented by zero or more 4096 byte blocks of data, +and possibly a single fragmented block. +If a file system block must be fragmented to obtain +space for a small amount of data, +the remaining fragments of the block are made +available for allocation to other files. +As an example, consider an 11000 byte file stored on +a 4096/1024 byte file system. +This file would use two full size blocks and one +three fragment portion of another block. +If no block with three aligned fragments is +available at the time the file is created, +a full size block is split yielding the necessary +fragments and a single unused fragment. +This remaining fragment can be allocated to another file as needed. +.PP +Space is allocated to a file when a program does a \fIwrite\fP +system call. +Each time data is written to a file, the system checks to see if +the size of the file has increased*. +.FS +* A program may be overwriting data in the middle of an existing file +in which case space would already have been allocated. +.FE +If the file needs to be expanded to hold the new data, +one of three conditions exists: +.IP 1) +There is enough space left in an already allocated +block or fragment to hold the new data.
+The new data is written into the available space. +.IP 2) +The file contains no fragmented blocks (and the last +block in the file +contains insufficient space to hold the new data). +If space exists in a block already allocated, +the space is filled with new data. +If the remainder of the new data contains more than +a full block of data, a full block is allocated and +the first full block of new data is written there. +This process is repeated until less than a full block +of new data remains. +If the remaining new data to be written will +fit in less than a full block, +a block with the necessary fragments is located; +otherwise a full block is located. +The remaining new data is written into the located space. +.IP 3) +The file contains one or more fragments (and the +fragments contain insufficient space to hold the new data). +If the size of the new data plus the size of the data +already in the fragments exceeds the size of a full block, +a new block is allocated. +The contents of the fragments are copied +to the beginning of the block +and the remainder of the block is filled with new data. +The process then continues as in (2) above. +Otherwise, if the new data to be written will +fit in less than a full block, +a block with the necessary fragments is located; +otherwise a full block is located. +The contents of the existing fragments +appended with the new data +are written into the allocated space. +.PP +The problem with expanding a file one fragment at a +time is that data may be copied many times as a +fragmented block expands to a full block. +Fragment reallocation can be minimized +if the user program writes a full block at a time, +except for a partial block at the end of the file. +Since file systems with different block sizes may reside on +the same system, +the file system interface has been extended to provide +application programs the optimal size for a read or write.
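On present-day POSIX systems this interface survives as the st_blksize member of struct stat. The sketch below illustrates the idea and is not the original library code. It asks stat() for the preferred transfer size and rounds a buffer length up to a whole number of such units, so that each write covers complete blocks and avoids repeated fragment copying. The helper names optimal_io_size and round_to_blksize are hypothetical.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Return the preferred I/O size (st_blksize) of the object at path,
 * or -1 if stat() fails. */
long optimal_io_size(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return (long)sb.st_blksize;
}

/* Round len up to a whole multiple of blksize so that writes of this
 * length fill complete blocks; a non-positive blksize leaves len as is. */
size_t round_to_blksize(size_t len, long blksize)
{
    size_t b;

    if (blksize <= 0)
        return len;
    b = (size_t)blksize;
    return ((len + b - 1) / b) * b;
}
```

A copying program might call optimal_io_size() on its destination and size its buffer with round_to_blksize(), so that only the final write of the file can leave a partial block.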
+For files the optimal size is the block size of the file system +on which the file is being accessed. +For other objects, such as pipes and sockets, +the optimal size is the underlying buffer size. +This feature is used by the Standard +Input/Output Library, +a package used by most user programs. +This feature is also used by +certain system utilities such as archivers and loaders +that do their own input and output management +and need the highest possible file system bandwidth. +.PP +The amount of wasted space in the 4096/1024 byte new file system +organization is empirically observed to be about the same as in the +1024 byte old file system organization. +A file system with 4096 byte blocks and 512 byte fragments +has about the same amount of wasted space as the 512 byte +block UNIX file system. +The new file system uses less space +than the 512 byte or 1024 byte +file systems for indexing information for +large files and the same amount of space +for small files. +These savings are offset by the need to use +more space for keeping track of available free blocks. +The net result is about the same disk utilization +when a new file system's fragment size +equals an old file system's block size. +.PP +In order for the layout policies to be effective, +a file system cannot be kept completely full. +For each file system there is a parameter, termed +the free space reserve, that +gives the minimum acceptable percentage of file system +blocks that should be free. +If the number of free blocks drops below this level +only the system administrator can continue to allocate blocks. +The value of this parameter may be changed at any time, +even when the file system is mounted and active. +The transfer rates that appear in section 4 were measured on file +systems kept less than 90% full (a reserve of 10%). 
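The free space reserve rule can be expressed as a small predicate. This is a minimal sketch. The struct fs_counts type and the integer percentage are hypothetical, and the real kernel bookkeeping (for example, counting free fragments as well as whole blocks) is not modeled.

```c
#include <stdbool.h>

/* Hypothetical summary of a file system's data block counts. */
struct fs_counts {
    long total_blocks;   /* data blocks in the file system */
    long free_blocks;    /* currently unallocated data blocks */
    int  minfree_pct;    /* free space reserve, in percent */
};

/* Number of blocks held back by the free space reserve. */
long reserved_blocks(const struct fs_counts *fs)
{
    return fs->total_blocks * fs->minfree_pct / 100;
}

/* An ordinary user may allocate one more block only if doing so keeps
 * free space at or above the reserve; the administrator may continue
 * until no free blocks remain. */
bool may_allocate(const struct fs_counts *fs, bool is_admin)
{
    if (is_admin)
        return fs->free_blocks > 0;
    return fs->free_blocks > reserved_blocks(fs);
}
```

Because the reserve is a single stored number consulted at allocation time, it can be changed on a mounted file system without touching any on-disk layout.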
+If the number of free blocks falls to zero, +the file system throughput tends to be cut in half, +because of the inability of the file system to localize +blocks in a file. +If a file system's performance degrades because +of overfilling, it may be restored by removing +files until the amount of free space once again +reaches the minimum acceptable level. +Access rates for files created during periods of little +free space may be restored by moving their data once enough +space is available. +The free space reserve must be added to the +percentage of waste when comparing the organizations given +in Table 1. +Thus, the percentage of waste in +an old 1024 byte UNIX file system is roughly +comparable to a new 4096/512 byte file system +with the free space reserve set at 5%. +(Compare 11.8% wasted with the old file system +to 6.9% waste + 5% reserved space in the +new file system.) +.NH 2 +File system parameterization +.PP +Except for the initial creation of the free list, +the old file system ignores the parameters of the underlying hardware. +It has no information about either the physical characteristics +of the mass storage device, +or the hardware that interacts with it. +A goal of the new file system is to parameterize the +processor capabilities and +mass storage characteristics +so that blocks can be allocated in an +optimum configuration-dependent way. +Parameters used include the speed of the processor, +the hardware support for mass storage transfers, +and the characteristics of the mass storage devices. +Disk technology is constantly improving and +a given installation can have several different disk technologies +running on a single processor. +Each file system is parameterized so that it can be +adapted to the characteristics of the disk on which +it is placed. +.PP +For mass storage devices such as disks, +the new file system tries to allocate new blocks +on the same cylinder as the previous block in the same file. 
+Optimally, these new blocks will also be +rotationally well positioned. +The distance between ``rotationally optimal'' blocks varies greatly; +it can be a consecutive block +or a rotationally delayed block +depending on system characteristics. +On a processor with an input/output channel that does not require +any processor intervention between mass storage transfer requests, +two consecutive disk blocks can often be accessed +without suffering lost time because of an intervening disk revolution. +For processors without input/output channels, +the main processor must field an interrupt and +prepare for a new disk transfer. +The expected time to service this interrupt and +schedule a new disk transfer depends on the +speed of the main processor. +.PP +The physical characteristics of each disk include +the number of blocks per track and the rate at which +the disk spins. +The allocation routines use this information to calculate +the number of milliseconds required to skip over a block. +The characteristics of the processor include +the expected time to service an interrupt and schedule a +new disk transfer. +Given a block allocated to a file, +the allocation routines calculate the number of blocks to +skip over so that the next block in the file will +come into position under the disk head in the expected +amount of time that it takes to start a new +disk transfer operation. +For programs that sequentially access large amounts of data, +this strategy minimizes the amount of time spent waiting for +the disk to position itself. +.PP +To ease the calculation of finding rotationally optimal blocks, +the cylinder group summary information includes +a count of the available blocks in a cylinder +group at different rotational positions. +Eight rotational positions are distinguished, +so the resolution of the +summary information is 2 milliseconds for a typical 3600 +revolution per minute drive. 
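The arithmetic behind these figures can be sketched as follows. The function names and the assumption of a fixed number of equal-sized blocks per track are illustrative and are not taken from the original implementation. The parameter delay_ms stands for the processor's expected time to service an interrupt and start the next transfer.

```c
/* Milliseconds for one revolution of a disk spinning at rpm. */
double ms_per_revolution(int rpm)
{
    return 60000.0 / rpm;
}

/* Resolution of the rotational summary counts: one revolution
 * divided into nrpos distinguished positions. */
double ms_per_rotational_position(int rpm, int nrpos)
{
    return ms_per_revolution(rpm) / nrpos;
}

/* Blocks to skip after a transfer so that the next block of the file
 * reaches the head no sooner than delay_ms later (rounded up). */
int blocks_to_skip(int rpm, int blocks_per_track, double delay_ms)
{
    double ms_per_block = ms_per_revolution(rpm) / blocks_per_track;
    int n = (int)(delay_ms / ms_per_block);

    if (n * ms_per_block < delay_ms)
        n++;
    return n;
}
```

For a 3600 revolution per minute drive, ms_per_rotational_position(3600, 8) is about 2.08 ms, which matches the 2 millisecond resolution quoted above. With four blocks per track (about 4.17 ms each) and a 4 ms scheduling delay, blocks_to_skip returns 1, so every other block on the track is used.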
+The super-block contains a vector of lists called +.I "rotational layout tables". +The vector is indexed by rotational position. +Each component of the vector +lists the index into the block map for every data block contained +in its rotational position. +When looking for an allocatable block, +the system first looks through the summary counts for a rotational +position with a non-zero block count. +It then uses the index of the rotational position to find the appropriate +list to use to index through +only the relevant parts of the block map to find a free block. +.PP +The parameter that defines the +minimum number of milliseconds between the completion of a data +transfer and the initiation of +another data transfer on the same cylinder +can be changed at any time, +even when the file system is mounted and active. +If a file system is parameterized to lay out blocks with +a rotational separation of 2 milliseconds, +and the disk pack is then moved to a system that has a +processor requiring 4 milliseconds to schedule a disk operation, +the throughput will drop precipitously because of lost disk revolutions +on nearly every block. +If the eventual target machine is known, +the file system can be parameterized for it +even though it is initially created on a different processor. +Even if the move is not known in advance, +the rotational layout delay can be reconfigured after the disk is moved +so that all further allocation is done based on the +characteristics of the new host. +.NH 2 +Layout policies +.PP +The file system layout policies are divided into two distinct parts. +At the top level are global policies that use file system +wide summary information to make decisions regarding +the placement of new inodes and data blocks. +These routines are responsible for deciding the +placement of new directories and files. 
+They also calculate rotationally optimal block layouts, +and decide when to force a long seek to a new cylinder group +because there are insufficient blocks left +in the current cylinder group to do reasonable layouts. +Below the global policy routines are +the local allocation routines that use a locally optimal scheme to +lay out data blocks. +.PP +Two methods for improving file system performance are to increase +the locality of reference to minimize seek latency +as described by [Trivedi80], and +to improve the layout of data to make larger transfers possible +as described by [Nevalainen77]. +The global layout policies try to improve performance +by clustering related information. +They cannot attempt to localize all data references, +but must also try to spread unrelated data +among different cylinder groups. +If too much localization is attempted, +the local cylinder group may run out of space +forcing the data to be scattered to non-local cylinder groups. +Taken to an extreme, +total localization can result in a single huge cluster of data +resembling the old file system. +The global policies try to balance the two conflicting +goals of localizing data that is concurrently accessed +while spreading out unrelated data. +.PP +One allocatable resource is inodes. +Inodes are used to describe both files and directories. +Inodes of files in the same directory are frequently accessed together. +For example, the ``list directory'' command often accesses +the inode for each file in a directory. +The layout policy tries to place all the inodes of +files in a directory in the same cylinder group. +To ensure that files are distributed throughout the disk, +a different policy is used for directory allocation. +A new directory is placed in a cylinder group that has a greater +than average number of free inodes, +and the smallest number of directories already in it. +The intent of this policy is to allow the inode clustering policy +to succeed most of the time. 
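The directory placement rule can be sketched over a hypothetical array of per-group counts. The sketch reads "greater than average" as "at least average". With that reading, some group always qualifies, because at least one group must meet the mean. The original code may break ties or handle this edge differently.

```c
/* Hypothetical per-cylinder-group summary counts. */
struct cg_summary {
    int nifree;   /* free inodes in the group */
    int ndir;     /* directories already in the group */
};

/* Choose a cylinder group for a new directory: among groups whose
 * free-inode count is at least the average, return the one holding
 * the fewest directories (lowest index wins ties).
 * Returns -1 if ncg <= 0. */
int pick_dir_cg(const struct cg_summary *cg, int ncg)
{
    long total = 0;
    int best = -1;

    if (ncg <= 0)
        return -1;
    for (int i = 0; i < ncg; i++)
        total += cg[i].nifree;

    for (int i = 0; i < ncg; i++) {
        /* nifree >= total / ncg, rearranged to avoid integer division */
        if ((long)cg[i].nifree * ncg < total)
            continue;
        if (best == -1 || cg[i].ndir < cg[best].ndir)
            best = i;
    }
    return best;
}
```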
+The allocation of inodes within a cylinder group is done using a +next free strategy. +Although this allocates the inodes randomly within a cylinder group, +all the inodes for a particular cylinder group can be read with +8 to 16 disk transfers. +(At most 16 disk transfers are required because a cylinder +group may have no more than 2048 inodes.) +This puts a small and constant upper bound on the number of +disk transfers required to access the inodes +for all the files in a directory. +In contrast, the old file system typically requires +one disk transfer to fetch the inode for each file in a directory. +.PP +The other major resource is data blocks. +Since data blocks for a file are typically accessed together, +the policy routines try to place all data +blocks for a file in the same cylinder group, +preferably at rotationally optimal positions in the same cylinder. +The problem with allocating all the data blocks +in the same cylinder group is that large files will +quickly use up available space in the cylinder group, +forcing a spill over to other areas. +Further, using all the space in a cylinder group +causes future allocations for any file in the cylinder group +to also spill to other areas. +Ideally none of the cylinder groups should ever become completely full. +The heuristic solution chosen is to +redirect block allocation +to a different cylinder group +when a file exceeds 48 kilobytes, +and at every megabyte thereafter.* +.FS +* The first spill over point at 48 kilobytes is the point +at which a file on a 4096 byte block file system first +requires a single indirect block. This appears to be +a natural first point at which to redirect block allocation. +The other spillover points are chosen with the intent of +forcing block allocation to be redirected when a +file has used about 25% of the data blocks in a cylinder group. 
+In observing the new file system in day-to-day use, the heuristics appear +to work well in minimizing the number of completely filled +cylinder groups. +.FE +The newly chosen cylinder group is selected from those cylinder +groups that have a greater than average number of free blocks left. +Although big files tend to be spread out over the disk, +a megabyte of data is typically accessible before +a long seek must be performed, +and the cost of one long seek per megabyte is small. +.PP +The global policy routines call local allocation routines with +requests for specific blocks. +The local allocation routines will +always allocate the requested block +if it is free; otherwise they +allocate a free block of the requested size that is +rotationally closest to the requested block. +If the global layout policies had complete information, +they could always request unused blocks and +the allocation routines would be reduced to simple bookkeeping. +However, maintaining complete information is costly; +thus the implementation of the global layout policy +uses heuristics that employ only partial information. +.PP +If a requested block is not available, the local allocator uses +a four level allocation strategy: +.IP 1) +Use the next available block rotationally closest +to the requested block on the same cylinder. It is assumed +here that head switching time is zero. On disk +controllers where this is not the case, it may be possible +to incorporate the time required to switch between disk platters +when constructing the rotational layout tables. This, however, +has not yet been tried. +.IP 2) +If there are no blocks available on the same cylinder, +use a block within the same cylinder group. +.IP 3) +If that cylinder group is entirely full, +quadratically hash the cylinder group number to choose +another cylinder group to look for a free block. +.IP 4) +Finally, if the hash fails, apply an exhaustive search +to all cylinder groups.
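The cylinder-group-level steps 3 and 4 of this strategy can be sketched as follows. The sketch first tries the preferred group, then falls back to quadratic probing, then scans every group. Steps 1 and 2 happen inside a single group and are not modeled. The nbfree array of per-group free block counts is a hypothetical stand-in for the real summary information. The probe sequence pref + i*i is the textbook form of quadratic hashing, and the original implementation may use a different sequence.

```c
/*
 * Pick a cylinder group that still has a free block.  nbfree[] holds
 * hypothetical per-group free block counts; pref must be in [0, ncg).
 * Returns -1 if every group is full or the arguments are invalid.
 */
int find_cg(int pref, const int *nbfree, int ncg)
{
    if (ncg <= 0 || pref < 0 || pref >= ncg)
        return -1;

    /* The group requested by the global policy. */
    if (nbfree[pref] > 0)
        return pref;

    /*
     * Step 3: quadratic probing, pref + i*i (mod ncg).  Probes beyond
     * ncg/2 only revisit earlier offsets, because (ncg - i)^2 and i^2
     * are congruent mod ncg.
     */
    for (long i = 1; i <= ncg / 2; i++) {
        int cg = (int)((pref + i * i) % ncg);
        if (nbfree[cg] > 0)
            return cg;
    }

    /* Step 4: exhaustive search of every group, starting at pref. */
    for (int i = 0; i < ncg; i++) {
        int cg = (pref + i) % ncg;
        if (nbfree[cg] > 0)
            return cg;
    }
    return -1;
}
```

The quadratic probes touch only a few groups before the linear scan takes over. This matches the intent of using the hash as a fast path on nearly full file systems and keeping the exhaustive search as a last resort.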
+.PP +Quadratic hash is used because of its speed in finding +unused slots in nearly full hash tables [Knuth75]. +File systems that are parameterized to maintain at least +10% free space rarely use this strategy. +File systems that are run without maintaining any free +space typically have so few free blocks that almost any +allocation is random; +the most important characteristic of +the strategy used under such conditions is that the strategy be fast. +.ds RH Performance +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/4.t b/share/doc/smm/05.fastfs/4.t new file mode 100644 index 000000000000..af94c7f51975 --- /dev/null +++ b/share/doc/smm/05.fastfs/4.t @@ -0,0 +1,246 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH Performance +.NH +Performance +.PP +Ultimately, the proof of the effectiveness of the +algorithms described in the previous section +is the long term performance of the new file system. +.PP +Our empirical studies have shown that the inode layout policy has +been effective. +When running the ``list directory'' command on a large directory +that itself contains many directories (to force the system +to access inodes in multiple cylinder groups), +the number of disk accesses for inodes is cut by a factor of two. +The improvements are even more dramatic for large directories +containing only files, +disk accesses for inodes being cut by a factor of eight. +This is most encouraging for programs such as spooling daemons that +access many small files, +since these programs tend to flood the +disk request queue on the old file system. +.PP +Table 2 summarizes the measured throughput of the new file system. +Several comments need to be made about the conditions under which these +tests were run. +The test programs measure the rate at which user programs can transfer +data to or from a file without performing any processing on it. +These programs must read and write enough data to +insure that buffering in the +operating system does not affect the results. 
+They are also run at least three times in succession; +the first to get the system into a known state +and the second two to insure that the +experiment has stabilized and is repeatable. +The tests used and their results are +discussed in detail in [Kridle83]\(dg. +.FS +\(dg A UNIX command that is similar to the reading test that we used is +``cp file /dev/null'', where ``file'' is eight megabytes long. +.FE +The systems were running multi-user but were otherwise quiescent. +There was no contention for either the CPU or the disk arm. +The only difference between the UNIBUS and MASSBUS tests +was the controller. +All tests used an AMPEX Capricorn 330 megabyte Winchester disk. +As Table 2 shows, all file system test runs were on a VAX 11/750. +All file systems had been in production use for at least +a month before being measured. +The same number of system calls were performed in all tests; +the basic system call overhead was a negligible portion of +the total running time of the tests. +.KF +.DS B +.TS +box; +c c|c s s +c c|c c c. +Type of Processor and Read +File System Bus Measured Speed Bandwidth % CPU +_ +old 1024 750/UNIBUS 29 Kbytes/sec 29/983 3% 11% +new 4096/1024 750/UNIBUS 221 Kbytes/sec 221/983 22% 43% +new 8192/1024 750/UNIBUS 233 Kbytes/sec 233/983 24% 29% +new 4096/1024 750/MASSBUS 466 Kbytes/sec 466/983 47% 73% +new 8192/1024 750/MASSBUS 466 Kbytes/sec 466/983 47% 54% +.TE +.ce 1 +Table 2a \- Reading rates of the old and new UNIX file systems. +.TS +box; +c c|c s s +c c|c c c. +Type of Processor and Write +File System Bus Measured Speed Bandwidth % CPU +_ +old 1024 750/UNIBUS 48 Kbytes/sec 48/983 5% 29% +new 4096/1024 750/UNIBUS 142 Kbytes/sec 142/983 14% 43% +new 8192/1024 750/UNIBUS 215 Kbytes/sec 215/983 22% 46% +new 4096/1024 750/MASSBUS 323 Kbytes/sec 323/983 33% 94% +new 8192/1024 750/MASSBUS 466 Kbytes/sec 466/983 47% 95% +.TE +.ce 1 +Table 2b \- Writing rates of the old and new UNIX file systems. 
+.DE +.KE +.PP +Unlike the old file system, +the transfer rates for the new file system do not +appear to change over time. +The throughput rate is tied much more strongly to the +amount of free space that is maintained. +The measurements in Table 2 were based on a file system +with a 10% free space reserve. +Synthetic workloads suggest that throughput deteriorates +to about half the rates given in Table 2 when the file +systems are full. +.PP +The percentage of bandwidth given in Table 2 is a measure +of the effective utilization of the disk by the file system. +An upper bound on the transfer rate from the disk is calculated +by multiplying the number of bytes on a track by the number +of revolutions of the disk per second. +The bandwidth is the data rate +the file system is able to achieve, expressed as a percentage of this rate. +Using this metric, the old file system is only +able to use about 3\-5% of the disk bandwidth, +while the new file system uses up to 47% +of the bandwidth. +.PP +Both reads and writes are faster in the new system than in the old system. +The biggest factor in this speedup is the larger +block size used by the new file system. +The overhead of allocating blocks in the new system is greater +than the overhead of allocating blocks in the old system; +however, fewer blocks need to be allocated in the new system +because they are bigger. +The net effect is that the cost per byte allocated is about +the same for both systems. +.PP +In the new file system, the reading rate is always at least +as fast as the writing rate. +This is to be expected since the kernel must do more work when +allocating blocks than when simply reading them. +Note that the write rates are about the same +as the read rates in the 8192 byte block file system; +the write rates are slower than the read rates in the 4096 byte block +file system.
+The slower write rates occur because +the kernel has to do twice as many disk allocations per second, +making the processor unable to keep up with the disk transfer rate. +.PP +In contrast the old file system is about 50% +faster at writing files than reading them. +This is because the write system call is asynchronous and +the kernel can generate disk transfer +requests much faster than they can be serviced, +hence disk transfers queue up in the disk buffer cache. +Because the disk buffer cache is sorted by minimum seek distance, +the average seek between the scheduled disk writes is much +less than it would be if the data blocks were written out +in the random disk order in which they are generated. +However when the file is read, +the read system call is processed synchronously so +the disk blocks must be retrieved from the disk in the +non-optimal seek order in which they are requested. +This forces the disk scheduler to do long +seeks resulting in a lower throughput rate. +.PP +In the new system the blocks of a file are more optimally +ordered on the disk. +Even though reads are still synchronous, +the requests are presented to the disk in a much better order. +Even though the writes are still asynchronous, +they are already presented to the disk in minimum seek +order so there is no gain to be had by reordering them. +Hence the disk seek latencies that limited the old file system +have little effect in the new file system. +The cost of allocation is the factor in the new system that +causes writes to be slower than reads. +.PP +The performance of the new file system is currently +limited by memory to memory copy operations +required to move data from disk buffers in the +system's address space to data buffers in the user's +address space. These copy operations account for +about 40% of the time spent performing an input/output operation. 
+If the buffers in both address spaces were properly aligned, +this transfer could be performed without copying by +using the VAX virtual memory management hardware. +This would be especially desirable when transferring +large amounts of data. +We did not implement this because it would change the +user interface to the file system in two major ways: +user programs would be required to allocate buffers on page boundaries, +and data would disappear from buffers after being written. +.PP +Greater disk throughput could be achieved by rewriting the disk drivers +to chain together kernel buffers. +This would allow contiguous disk blocks to be read +in a single disk transaction. +Many disks used with UNIX systems contain either +32 or 48 sectors of 512 bytes per track. +Each track holds exactly two or three 8192 byte file system blocks, +or four or six 4096 byte file system blocks. +The inability to use contiguous disk blocks +effectively limits the performance +on these disks to less than 50% of the available bandwidth. +If the next block for a file cannot be laid out contiguously, +then the minimum spacing to the next allocatable +block on any platter is between a sixth and a half of a revolution. +Consequently, the best possible layout without +contiguous blocks uses only half of the bandwidth of any given track. +If each track contains an odd number of sectors, +then it is possible to resolve the rotational delay to any number of sectors +by finding a block that begins at the desired +rotational position on another track. +Block chaining has not been implemented because it +would require rewriting all the disk drivers in the system, +and the current throughput rates are already limited by the +speed of the available processors. +.PP +Currently only one block is allocated to a file at a time.
+A technique used by the DEMOS file system +when it finds that a file is growing rapidly, +is to preallocate several blocks at once, +releasing them when the file is closed if they remain unused. +By batching up allocations, the system can reduce the +overhead of allocating at each write, +and it can cut down on the number of disk writes needed to +keep the block pointers on the disk +synchronized with the block allocation [Powell79]. +This technique was not included because block allocation +currently accounts for less than 10% of the time spent in +a write system call and, once again, the +current throughput rates are already limited by the speed +of the available processors. +.ds RH Functional enhancements +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/5.t b/share/doc/smm/05.fastfs/5.t new file mode 100644 index 000000000000..8a3f4bce812d --- /dev/null +++ b/share/doc/smm/05.fastfs/5.t @@ -0,0 +1,287 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ds RH Functional enhancements +.NH +File system functional enhancements +.PP +The performance enhancements to the +UNIX file system did not require +any changes to the semantics or +data structures visible to application programs. +However, several changes had been generally desired for some +time but had not been introduced because they would require users to +dump and restore all their file systems. +Since the new file system already +required all existing file systems to +be dumped and restored, +these functional enhancements were introduced at this time. +.NH 2 +Long file names +.PP +File names can now be of nearly arbitrary length. +Only programs that read directories are affected by this change. +To promote portability to UNIX systems that +are not running the new file system, a set of directory +access routines have been introduced to provide a consistent +interface to directories on both old and new systems. +.PP +Directories are allocated in 512 byte units called chunks. +This size is chosen so that each allocation can be transferred +to disk in a single operation. +Chunks are broken up into variable length records termed +directory entries. 
A directory entry +contains the information necessary to map the name of a +file to its associated inode. +No directory entry is allowed to span multiple chunks. +The first three fields of a directory entry are fixed length +and contain: an inode number, the size of the entry, and the length +of the file name contained in the entry. +The remainder of an entry is variable length and contains +a null terminated file name, padded to a 4 byte boundary. +The maximum length of a file name in a directory is +currently 255 characters. +.PP +Available space in a directory is recorded by having +one or more entries accumulate the free space in their +entry size fields. This results in directory entries +that are larger than required to hold the +entry name plus fixed length fields. Space allocated +to a directory should always be completely accounted for +by totaling up the sizes of its entries. +When an entry is deleted from a directory, +its space is returned to a previous entry +in the same directory chunk by increasing the size of the +previous entry by the size of the deleted entry. +If the first entry of a directory chunk is free, then +the entry's inode number is set to zero to indicate +that it is unallocated. +.NH 2 +File locking +.PP +The old file system had no provision for locking files. +Processes that needed to synchronize the updates of a +file had to use a separate ``lock'' file. +A process would try to create a ``lock'' file. +If the creation succeeded, then the process +could proceed with its update; +if the creation failed, then the process would wait and try again. +This mechanism had three drawbacks. +Processes consumed CPU time by looping over attempts to create locks. +Locks left lying around because of system crashes had +to be manually removed (normally in a system startup command script). +Finally, processes running as system administrator +are always permitted to create files, +so were forced to use a different mechanism. 
+While it is possible to get around all these problems, +the solutions are not straightforward, +so a mechanism for locking files has been added. +.PP +The most general schemes allow multiple processes +to concurrently update a file. +Several of these techniques are discussed in [Peterson83]. +A simpler technique is to serialize access to a file with locks. +To attain reasonable efficiency, +certain applications require the ability to lock pieces of a file. +Locking down to the byte level has been implemented in the +Onyx file system [Bass81]. +However, for the standard system applications, +a mechanism that locks at the granularity of a file is sufficient. +.PP +Locking schemes fall into two classes, +those using hard locks and those using advisory locks. +The primary difference between advisory locks and hard locks is the +extent of enforcement. +A hard lock is always enforced when a program tries to +access a file; +an advisory lock is only applied when it is requested by a program. +Thus advisory locks are only effective when all programs accessing +a file use the locking scheme. +With hard locks there must be some override +policy implemented in the kernel. +With advisory locks the policy is left to the user programs. +In the UNIX system, programs with system administrator +privilege are allowed to override any protection scheme. +Because many of the programs that need to use locks must +also run as the system administrator, +we chose to implement advisory locks rather than +create an additional protection scheme that was inconsistent +with the UNIX philosophy or could +not be used by system administration programs. +.PP +The file locking facilities allow cooperating programs to apply +advisory +.I shared +or +.I exclusive +locks on files. +Only one process may have an exclusive +lock on a file while multiple shared locks may be present. +Shared and exclusive locks cannot both be present on +a file at the same time.
+If any lock is requested when +another process holds an exclusive lock, +or an exclusive lock is requested when another process holds any lock, +the lock request will block until the lock can be obtained. +Because shared and exclusive locks are advisory only, +even if a process has obtained a lock on a file, +another process may access the file. +.PP +Locks are applied or removed only on open files. +This means that locks can be manipulated without +needing to close and reopen a file. +This is useful, for example, when a process wishes +to apply a shared lock, read some information +and determine whether an update is required, then +apply an exclusive lock and update the file. +.PP +A request for a lock will cause a process to block if the lock +cannot be immediately obtained. +In certain instances this is unsatisfactory. +For example, a process that +wants only to check if a lock is present would require a separate +mechanism to find out this information. +Consequently, a process may specify that its locking +request should return with an error if a lock cannot be immediately +obtained. +Being able to conditionally request a lock +is useful for ``daemon'' processes +that wish to service a spooling area. +If the first instance of the +daemon locks the directory where spooling takes place, +later daemon processes can +easily check to see if an active daemon exists. +Since locks exist only while the locking processes exist, +locks can never be left active after +the processes exit or the system crashes. +.PP +Almost no deadlock detection is attempted. +The only deadlock detection done by the system is that the file +to which a lock is applied must not already have a +lock of the same type (i.e. the second of two successive calls +to apply a lock of the same type will fail). +.NH 2 +Symbolic links +.PP +The traditional UNIX file system allows multiple +directory entries in the same file system +to reference a single file.
Each directory entry +``links'' a file's name to an inode and its contents. +The link concept is fundamental; +inodes do not reside in directories, but exist separately and +are referenced by links. +When all the links to an inode are removed, +the inode is deallocated. +This style of referencing an inode does +not allow references across physical file +systems, nor does it support inter-machine linkage. +To avoid these limitations +.I "symbolic links" +similar to the scheme used by Multics [Feiertag71] have been added. +.PP +A symbolic link is implemented as a file that contains a pathname. +When the system encounters a symbolic link while +interpreting a component of a pathname, +the contents of the symbolic link is prepended to the rest +of the pathname, and this name is interpreted to yield the +resulting pathname. +In UNIX, pathnames are specified relative to the root +of the file system hierarchy, or relative to a process's +current working directory. Pathnames specified relative +to the root are called absolute pathnames. Pathnames +specified relative to the current working directory are +termed relative pathnames. +If a symbolic link contains an absolute pathname, +the absolute pathname is used, +otherwise the contents of the symbolic link is evaluated +relative to the location of the link in the file hierarchy. +.PP +Normally programs do not want to be aware that there is a +symbolic link in a pathname that they are using. +However certain system utilities +must be able to detect and manipulate symbolic links. +Three new system calls provide the ability to detect, read, and write +symbolic links; seven system utilities required changes +to use these calls. +.PP +In future Berkeley software distributions +it may be possible to reference file systems located on +remote machines using pathnames. When this occurs, +it will be possible to create symbolic links that span machines. 
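The paragraph above does not name the three new calls; in 4.4BSD and POSIX they are presumably lstat() (detect), readlink() (read), and symlink() (write). The following is a minimal C sketch, assuming a POSIX system, of how a utility might detect a symbolic link and read the pathname stored in it without following the link. The helper name read_link_target is hypothetical and is not part of any system interface.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: POSIX.1-2008 headers */

#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a system interface): if `path` names a
 * symbolic link, copy the pathname stored in the link (not the file it
 * refers to) into `buf` as a NUL-terminated string and return its length.
 * Returns -1 with errno set to EINVAL if `path` is not a symbolic link,
 * or ENAMETOOLONG if `buf` cannot hold the contents plus a NUL byte.
 */
ssize_t read_link_target(const char *path, char *buf, size_t bufsz)
{
    struct stat sb;
    ssize_t n;

    if (bufsz == 0) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* Detect: lstat() describes the link itself; stat() would follow it. */
    if (lstat(path, &sb) == -1)
        return -1;
    if (!S_ISLNK(sb.st_mode)) {
        errno = EINVAL;
        return -1;
    }
    /* Read: readlink() returns at most bufsz bytes and never adds a NUL. */
    n = readlink(path, buf, bufsz);
    if (n == -1)
        return -1;
    if ((size_t)n >= bufsz) { /* no room left for the terminating NUL */
        errno = ENAMETOOLONG;
        return -1;
    }
    buf[n] = '\0';
    return n;
}
```

readlink() returns the stored pathname exactly as it was written. A relative target therefore comes back unresolved, and interpreting it relative to the link's own directory, as described above, is left to the caller. The lstat() check duplicates readlink()'s own EINVAL error, but it makes the "detect" step explicit.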
+.NH 2 +Rename +.PP +Programs that create a new version of an existing +file typically create the +new version as a temporary file and then rename the temporary file +with the name of the target file. +In the old UNIX file system renaming required three calls to the system. +If a program were interrupted or the system crashed between these calls, +the target file could be left with only its temporary name. +To eliminate this possibility, the \fIrename\fP system call +has been added. The rename call does the rename operation +in a fashion that guarantees the existence of the target name. +.PP +Rename works on both data files and directories. +When renaming directories, +the system must do special validation checks to ensure +that the directory tree structure is not corrupted by the creation +of loops or inaccessible directories. +Such corruption would occur if a parent directory were moved +into one of its descendants. +The validation check requires tracing the ancestors of the target +directory to ensure that they do not include the directory being moved. +.NH 2 +Quotas +.PP +The UNIX system has traditionally attempted to share all available +resources to the greatest extent possible. +Thus any single user can allocate all the available space +in the file system. +In certain environments this is unacceptable. +Consequently, a quota mechanism has been added for restricting the +amount of file system resources that a user can obtain. +The quota mechanism sets limits on both the number of inodes +and the number of disk blocks that a user may allocate. +A separate quota can be set for each user on each file system. +Resources are given both a hard and a soft limit. +When a program exceeds a soft limit, +a warning is printed on the user's terminal; +the offending program is not terminated +unless it exceeds its hard limit. +The idea is that users should stay below their soft limit between +login sessions, +but they may use more resources while they are actively working.
+To encourage this behavior, +users are warned when logging in if they are over +any of their soft limits. +If users fail to correct the problem for too many login sessions, +they are eventually reprimanded by having their soft limit +enforced as their hard limit. +.ds RH Acknowledgements +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/6.t b/share/doc/smm/05.fastfs/6.t new file mode 100644 index 000000000000..afda726d035b --- /dev/null +++ b/share/doc/smm/05.fastfs/6.t @@ -0,0 +1,153 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
 IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.ds RH Acknowledgements +.SH +\s+2Acknowledgements\s0 +.PP +We thank Robert Elz for his ongoing interest in the new file system, +and for adding disk quotas in a rational and efficient manner. +We also acknowledge Dennis Ritchie for his suggestions +on the appropriate modifications to the user interface. +We appreciate Michael Powell's explanations on how +the DEMOS file system worked; +many of his ideas were used in this implementation. +Special commendation goes to Peter Kessler and Robert Henry for acting +like real users during the early debugging stage when file systems were +less stable than they should have been. +The criticisms and suggestions by the reviewers contributed significantly +to the coherence of the paper. +Finally, we thank our sponsors, +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +ARPA Order No. 4031 monitored by Naval Electronic Systems Command under +Contract No. N00039-82-C-0235. +.ds RH References +.nr H2 1 +.sp 2 +.SH +\s+2References\s0 +.LP +.IP [Almes78] 20 +Almes, G., and Robertson, G., +"An Extensible File System for Hydra", +Proceedings of the Third International Conference on Software Engineering, +IEEE, May 1978. +.IP [Bass81] 20 +Bass, J., +"Implementation Description for File Locking", +Onyx Systems Inc, 73 E. Trimble Rd, San Jose, CA 95131, +Jan 1981. +.IP [Feiertag71] 20 +Feiertag, R. J. and Organick, E.
 I., +"The Multics Input-Output System", +Proceedings of the Third Symposium on Operating Systems Principles, +ACM, Oct 1971. pp 35-41 +.IP [Ferrin82a] 20 +Ferrin, T.E., +"Performance and Robustness Improvements in Version 7 UNIX", +Computer Graphics Laboratory Technical Report 2, +School of Pharmacy, University of California, +San Francisco, January 1982. +Presented at the 1982 Winter Usenix Conference, Santa Monica, California. +.IP [Ferrin82b] 20 +Ferrin, T.E., +"Performance Issues of VMUNIX Revisited", +;login: (The Usenix Association Newsletter), Vol 7, #5, November 1982. pp 3-6 +.IP [Kridle83] 20 +Kridle, R., and McKusick, M., +"Performance Effects of Disk Subsystem Choices for +VAX Systems Running 4.2BSD UNIX", +Computer Systems Research Group, Dept of EECS, Berkeley, CA 94720, +Technical Report #8. +.IP [Kowalski78] 20 +Kowalski, T., +"FSCK - The UNIX System Check Program", +Bell Laboratories, Murray Hill, NJ 07974. March 1978. +.IP [Knuth75] 20 +Knuth, D., +"The Art of Computer Programming", +Volume 3 - Sorting and Searching, +Addison-Wesley Publishing Company Inc, Reading, Mass, 1975. pp 506-549 +.IP [Maruyama76] 20 +Maruyama, K., and Smith, S., +"Optimal Reorganization of Distributed Space Disk Files", +CACM, 19, 11. Nov 1976. pp 634-642 +.IP [Nevalainen77] 20 +Nevalainen, O., and Vesterinen, M., +"Determining Blocking Factors for Sequential Files by Heuristic Methods", +The Computer Journal, 20, 3. Aug 1977. pp 245-247 +.IP [Pechura83] 20 +Pechura, M., and Schoeffler, J., +"Estimating File Access Time of Floppy Disks", +CACM, 26, 10. Oct 1983. pp 754-763 +.IP [Peterson83] 20 +Peterson, G., +"Concurrent Reading While Writing", +ACM Transactions on Programming Languages and Systems, +ACM, 5, 1. Jan 1983. pp 46-55 +.IP [Powell79] 20 +Powell, M., +"The DEMOS File System", +Proceedings of the Sixth Symposium on Operating Systems Principles, +ACM, Nov 1977. pp 33-42 +.IP [Ritchie74] 20 +Ritchie, D. M. and Thompson, K., +"The UNIX Time-Sharing System", +CACM 17, 7. July 1974.
 pp 365-375 +.IP [Smith81a] 20 +Smith, A., +"Input/Output Optimization and Disk Architectures: A Survey", +Performance and Evaluation 1. Jan 1981. pp 104-117 +.IP [Smith81b] 20 +Smith, A., +"Bibliography on File and I/O System Optimization and Related Topics", +Operating Systems Review, 15, 4. Oct 1981. pp 39-54 +.IP [Symbolics81] 20 +"Symbolics File System", +Symbolics Inc, 9600 DeSoto Ave, Chatsworth, CA 91311, +Aug 1981. +.IP [Thompson78] 20 +Thompson, K., +"UNIX Implementation", +Bell System Technical Journal, 57, 6, part 2. pp 1931-1946 +July-August 1978. +.IP [Thompson80] 20 +Thompson, M., +"Spice File System", +Carnegie-Mellon University, +Department of Computer Science, Pittsburgh, PA 15213 +#CMU-CS-80, Sept 1980. +.IP [Trivedi80] 20 +Trivedi, K., +"Optimal Selection of CPU Speed, Device Capabilities, and File Assignments", +Journal of the ACM, 27, 3. July 1980. pp 457-473 +.IP [White80] 20 +White, R. M., +"Disk Storage Technology", +Scientific American, 243(2), August 1980. diff --git a/share/doc/smm/05.fastfs/Makefile b/share/doc/smm/05.fastfs/Makefile new file mode 100644 index 000000000000..fa7cf0f830df --- /dev/null +++ b/share/doc/smm/05.fastfs/Makefile @@ -0,0 +1,7 @@ +VOLUME= smm/05.fastfs +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t +MACROS= -ms +USE_TBL= +USE_EQN= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/06.nfs/0.t b/share/doc/smm/06.nfs/0.t new file mode 100644 index 000000000000..8e869de223a8 --- /dev/null +++ b/share/doc/smm/06.nfs/0.t @@ -0,0 +1,69 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1.
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.(l C +.sz 14 +.b "The 4.4BSD NFS Implementation" +.sp +.sz 10 +Rick Macklem +.i "University of Guelph" +.)l +.sp 2 +.ce 1 +.sz 12 +.b "ABSTRACT" +.eh 'SMM:06-%''The 4.4BSD NFS Implementation' +.oh 'The 4.4BSD NFS Implementation''SMM:06-%' +.pp +The 4.4BSD implementation of the Network File System (NFS)\** is +intended to interoperate with +.(f +\**Network File System (NFS) is believed to be a registered trademark of +Sun Microsystems Inc. +.)f +other NFS Version 2 Protocol (RFC1094) implementations but also +allows use of an alternate protocol that is hoped to provide better +performance in certain environments. 
+This paper will informally discuss these various protocol features and +their use. +There is a brief overview of the implementation followed +by several sections on various problem areas related to NFS +and some hints on how to deal with them. +.pp +Not Quite NFS (NQNFS) is an NFS like protocol designed to maintain full cache +consistency between clients in a crash tolerant manner. It is an adaptation +of the NFS protocol such that the server supports both NFS +and NQNFS clients while maintaining full consistency between the server and +NQNFS clients. +It borrows heavily from work done on Spritely-NFS [Srinivasan89], but uses +Leases [Gray89] to avoid the need to recover server state information +after a crash. +.sp diff --git a/share/doc/smm/06.nfs/1.t b/share/doc/smm/06.nfs/1.t new file mode 100644 index 000000000000..b1b07cdb2e6b --- /dev/null +++ b/share/doc/smm/06.nfs/1.t @@ -0,0 +1,547 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.sh 1 "NFS Implementation" +.pp +The 4.4BSD implementation of NFS and the alternate protocol nicknamed +Not Quite NFS (NQNFS) are kernel resident, but make use of a few system +daemons. +The kernel implementation does not use an RPC library, handling the RPC +request and reply messages directly in \fImbuf\fR data areas. NFS +interfaces to the network using +sockets via the kernel interface available in +\fIsys/kern/uipc_syscalls.c\fR as \fIsosend(), soreceive(),\fR... +There are connection management routines to support sockets for connection-oriented +protocols, and timeout/retransmit support for datagram sockets on +the client side. +For connection-oriented transport protocols, +such as TCP/IP, there is one connection +for each client-to-server mount point that is maintained until an umount. +If the connection breaks, the client will attempt a reconnect with a new +socket. +The client side can operate without any daemons running, but performance +will be improved by running nfsiod daemons that perform read-aheads +and write-behinds. +For the server side to function, the daemons portmap, mountd and +nfsd must be running.
+The mountd daemon performs two important functions. +.ip 1) +Upon startup and after a hangup signal, mountd reads the exports +file and pushes the export information for each local file system down +into the kernel via the mount system call. +.ip 2) +Mountd handles remote mount protocol (RFC1094, Appendix A) requests. +.lp +The nfsd master daemon forks off children that enter the kernel +via the nfssvc system call. The children normally remain kernel +resident, providing a process context for the NFS RPC servers. +Meanwhile, the master nfsd waits to accept new connections from clients +using connection-oriented transport protocols and passes the new sockets down +into the kernel. +The client-side mount_nfs, along with portmap and +mountd, are the only parts of the NFS subsystem that make any +use of the Sun RPC library. +.sh 1 "Mount Problems" +.pp +There are several problems that can be encountered at the time of an NFS +mount, ranging from an unresponsive NFS server (crashed, network partitioned +from client, etc.) to various interoperability problems between different +NFS implementations. +.pp +On the server side, +if the 4.4BSD NFS server will be handling any PC clients, mountd will +require the \fB-n\fR option to enable non-root mount request servicing. +Running a pcnfsd\** daemon will also be necessary. +.(f +\** Pcnfsd is available in source form from Sun Microsystems and many +anonymous ftp sites. +.)f +The server side requires that the daemons +mountd and nfsd be running and that +they be registered with portmap properly. +If problems are encountered, +the safest fix is to kill all the daemons and then restart them in +the order portmap, mountd and nfsd. +Other server-side problems are normally caused by problems with the format +of the exports file, which is covered under +Security and in the exports man page. +.pp +On the client side, there are several mount options useful for dealing +with server problems.
+In cases where a file system is not critical for system operation, the +\fB-b\fR +mount option may be specified so that mount_nfs will go into the +background for a mount attempt on an unresponsive server. +This is useful for mounts specified in +\fIfstab(5)\fR, +so that the system does not hang at boot time while doing +\fBmount -a\fR +because a file server is unresponsive. +On the other hand, if the file system is critical to system operation, this +option should not be used, so that the client will wait for the server to +come up before completing bootstrapping. +There are also three mount options to help deal with interoperability issues +with various non-BSD NFS servers. The +\fB-P\fR +option specifies that the NFS +client use a reserved IP port number to satisfy some servers' security +requirements.\** +.(f +\**Any security benefit of this is highly questionable, and as +such the BSD server does not require a client to use a reserved port number. +.)f +The +\fB-c\fR +option stops the NFS client from doing a \fIconnect\fR on the UDP +socket, so that the mount works with servers that send NFS replies from +port numbers other than the standard 2049.\** +.(f +\**The Encore Multimax is known +to require this. +.)f +Finally, the +\fB-g=\fInum\fR +option sets the maximum size of the group list in the credentials passed +to an NFS server in every RPC request. Although RFC1057 specifies a maximum +size of 16 for the group list, some servers can't handle that many. +If a user, particularly root doing a mount, +keeps getting access denied from a file server, try temporarily +reducing the number of groups that user is in to fewer than 5 +by editing /etc/group. If the user can then access the file system, slowly +increase the number of groups for that user until the limit is found and +then peg the limit there with the +\fB-g=\fInum\fR +option.
+This implies that the server will only see the first \fInum\fR +groups that the user is in, which can cause some accessibility problems. +.pp +For sites that have many NFS servers, amd [Pendry93] +is a useful administration tool. +It also reduces the number of actual NFS mount points, alleviating problems +with commands such as df(1) that hang when any of the NFS servers is +unreachable. +.sh 1 "Dealing with Hung Servers" +.pp +There are several mount options available to help a client deal with +being hung while waiting for a response from a crashed or unreachable\** server. +.(f +\**For example, due to network partitioning. +.)f +By default, a hard mount will continue to try to contact the server +``forever'' to complete the system call. This type of mount is appropriate +when processes on the client that access files in the file system do not +tolerate file I/O system calls that return -1 with \fIerrno == EINTR\fR +and/or access to the file system is critical for normal system operation. +.lp +There are two other alternatives: +.ip 1) +A soft mount (\fB-s\fR option) retries an RPC \fIn\fR +times and then the corresponding +system call returns -1 with errno set to EINTR. +For TCP transport, the actual RPC request is not retransmitted, but the +timeout intervals spent waiting for a reply from the server are counted +in the same manner as for UDP. +The problem with this type of mount is that most applications do not +expect an EINTR error return from file I/O system calls (since it never +occurs for a local file system) and get confused by the error return +from the I/O system call. +The option +\fB-x=\fInum\fR +sets the RPC retry limit; if it is set too low, the error returns +will start occurring whenever the NFS server is slow due to heavy load. +Alternatively, a large retry limit can leave a process hung for a long +time when the server has crashed or the network is partitioned.
+.ip 2)
+An interruptible mount (\fB-i\fR option) checks to see if a termination signal
+is pending for the process while waiting for a server response and, if one is,
+the I/O system call fails with EINTR. Normally this results in the process
+being terminated by the signal when returning from the system call.
+This feature allows you to ``^C'' out of processes that are hung
+due to unresponsive servers.
+The problem with this approach is that signals that are caught by
+a process are not recognized as termination signals
+and the process will remain hung.\**
+.(f
+\**Unfortunately, there are also some resource allocation situations in the
+BSD kernel where the termination signal will be ignored and the process
+will not terminate.
+.)f
+.sh 1 "RPC Transport Issues"
+.pp
+The NFS Version 2 protocol runs over UDP/IP transport by
+sending each Sun Remote Procedure Call (RFC1057)
+request/reply message in a single UDP
+datagram. Since UDP does not guarantee datagram delivery, the
+Remote Procedure Call (RPC) layer
+times out and retransmits an RPC request if
+no RPC reply has been received. Since this round trip timeout (RTO) value
+is for the entire RPC operation, including RPC message transmission to the
+server, queuing at the server for an nfsd, performing the RPC and
+sending the RPC reply message back to the client, it can be highly variable
+for even a moderately loaded NFS server.
+As a result, the RTO interval must be a conservative (large) estimate, in
+order to avoid extraneous RPC request retransmits.\**
+.(f
+\**At best, an extraneous RPC request retransmit increases
+the load on the server and at worst can result in damaged files
+on the server when non-idempotent RPCs are redone [Juszczak89].
+.)f
+Also, with an 8Kbyte read/write data size
+(the default), the read/write reply/request will be an 8+Kbyte UDP datagram
+that must normally be fragmented at the IP layer for transmission.\**
+.(f
+\**6 IP fragments for an Ethernet,
+which has a maximum transmission unit of 1500 bytes.
+.)f
+For IP fragments to be successfully reassembled into
+the IP datagram at the receive end, all
+fragments must be received within a fairly short ``time to live''.
+If one fragment is lost/damaged in transit,
+the entire RPC must be retransmitted and redone.
+This problem can be exacerbated by a network interface on the receiver that
+cannot handle the reception of back to back network packets [Kent87a].
+.pp
+There are several tuning mount
+options on the client side that can prove useful when trying to
+alleviate performance problems related to UDP RPC transport.
+The options
+\fB-r=\fInum\fR
+and
+\fB-w=\fInum\fR
+specify the maximum read or write data size respectively.
+The size \fInum\fR
+should be a power of 2 (4K, 2K, 1K) and adjusted downward from the
+maximum of 8Kbytes
+whenever IP fragmentation is causing problems. The best indicator of
+IP fragmentation problems is a significant number of
+\fIfragments dropped after timeout\fR
+reported by the \fIip:\fR section of a \fBnetstat -s\fR
+command on either the client or server.
+Of course, if the fragments are being dropped at the server, it can be
+fun figuring out which client(s) are involved.
+The most likely candidates are clients that are not
+on the same local area network as the
+server or have network interfaces that do not receive several
+back to back network packets properly.
+.pp
+By default, the 4.4BSD NFS client dynamically estimates the retransmit
+timeout interval for the RPC and this appears to work reasonably well for
+many environments.
However, the
+\fB-d\fR
+flag can be specified to turn off
+the dynamic estimation of the retransmit timeout, so that the client will
+use a static initial timeout interval.\**
+.(f
+\**After the first retransmit timeout, the initial interval is backed off
+exponentially.
+.)f
+The
+\fB-t=\fInum\fR
+option can be used with
+\fB-d\fR
+to set the initial timeout interval to a value other than the default of 2 seconds.
+The best indicator that dynamic estimation should be turned off would
+be a significant number\** in the \fIX Replies\fR field and a
+.(f
+\**Even 0.1% of the total RPCs is probably significant.
+.)f
+large number in the \fIRetries\fR field
+in the \fIRpc Info:\fR section as reported
+by the \fBnfsstat\fR command.
+On the server, \fBnfsstat\fR would report significant numbers of \fIInprog\fR
+recent request cache hits in the \fIServer Cache Stats:\fR section.
+.pp
+The tradeoff is that a smaller timeout interval results in a better
+average RPC response time, but increases the risk of extraneous retries
+that in turn increase server load and the possibility of damaged files
+on the server. It is probably best to err on the safe side and use a large
+(2 seconds or more) fixed timeout if the dynamic retransmit timeout estimation
+seems to be causing problems.
+.pp
+An alternative to all this fiddling is to run NFS over TCP transport instead
+of UDP.
+Since the 4.4BSD TCP implementation provides reliable
+delivery with congestion control, it avoids all of the above problems.
+It also permits the use of read and write data sizes greater than the 8Kbyte
+limit for UDP transport.\**
+.(f
+\**Read/write data sizes greater than 8Kbytes will not normally improve
+performance unless the kernel constant MAXBSIZE is increased and the
+file system on the server has a block size greater than 8Kbytes.
+.)f
+NFS over TCP usually delivers performance comparable to, or significantly
+better than, NFS over UDP
+unless the client or server processor runs at less than 5-10 MIPS. For a
+slow processor, the extra CPU overhead of using TCP transport will become
+significant and TCP transport may only be useful when the client
+to server interconnect traverses congested gateways.
+The main problem with using TCP transport is that it is only supported
+between BSD clients and servers.\**
+.(f
+\**There are rumors of commercial NFS over TCP implementations on the horizon
+and these may well be worth exploring.
+.)f
+.sh 1 "Other Tuning Tricks"
+.pp
+Another mount option that may improve performance over
+certain network interconnects is \fB-a=\fInum\fR,
+which sets the number of blocks that the system will
+attempt to read-ahead during sequential reading of a file. The default value
+of 1 seems to be appropriate for most situations, but a larger value might
+achieve better performance for some environments, such as a mount to a server
+across a ``high bandwidth * round trip delay'' interconnect.
+.pp
+For the adventurous, playing with the size of the buffer cache
+can also improve performance for some environments that use NFS heavily.
+Under some workloads, a buffer cache of 4-6Mbytes can result in significant
+performance improvements over 1-2Mbytes, both in client side system call
+response time and reduced server RPC load.
+The buffer cache size defaults to 10% of physical memory,
+but this can be overridden by specifying the BUFPAGES option
+in the machine's config file.\**
+.(f
+\**BUFPAGES is the number of physical machine pages allocated to the buffer cache,
+i.e., BUFPAGES * NBPG = buffer cache size in bytes.
+.)f
+When increasing the size of BUFPAGES, it is also advisable to increase the
+number of buffers NBUF by a corresponding amount.
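As a worked example of the footnote's relation BUFPAGES * NBPG = buffer cache size in bytes, the C sketch below converts a target cache size into a BUFPAGES value and computes the 10% default described above. The function names are hypothetical, and because NBPG is machine dependent it is passed in as a parameter rather than assumed.

```c
/*
 * Hypothetical helper: the BUFPAGES value needed for a buffer cache of
 * at least cache_bytes bytes, using BUFPAGES * NBPG = cache size in bytes.
 * nbpg is the machine's page size in bytes (NBPG).
 */
unsigned long
bufpages_for(unsigned long cache_bytes, unsigned long nbpg)
{
	/* Round up so the cache is never smaller than requested. */
	return ((cache_bytes + nbpg - 1) / nbpg);
}

/* The default sizing described in the text: 10% of physical memory. */
unsigned long
default_cache_bytes(unsigned long physmem_bytes)
{
	return (physmem_bytes / 10);
}
```

For example, assuming 4096-byte pages, the 4-6Mbyte cache mentioned above corresponds to a BUFPAGES value between 1024 and 1536.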
+Note that memory allocated to the buffer cache is no longer available
+for paging, so making the buffer cache larger
+will increase the paging rate, with possibly disastrous results.
+.sh 1 "Security Issues"
+.pp
+A machine running an NFS server opens up a great big security hole.
+For ordinary NFS, the server receives client credentials
+in the RPC request as a user id
+and a list of group ids and trusts them to be authentic!
+The only tool available for restricting remote access to
+file systems is the exports(5) file,
+so file systems should be exported with great care.
+The exports file is read by mountd at startup and whenever it receives
+a hangup signal; mountd then pushes as much of the access specifications
+as possible down into the kernel for use by the nfsd(s).
+The catch is that the kernel information is stored on a per
+local file system mount point and client host address basis and cannot refer to
+individual directories within the local server file system.
+It is best to think of the exports file as referring to the various local
+file systems and not just directory paths as mount points.
+A local file system may be exported to a specific host, all hosts that
+match a subnet mask or all other hosts (the world). The latter is very
+dangerous and should only be used for public information. It is also
+strongly recommended that file systems exported to ``the world'' be exported
+read-only.
+For each host or group of hosts, the file system can be exported read-only or
+read/write.
+You can also define one of three client user id to server credential
+mappings to help control access.
+Root (user id == 0) can be mapped to some default credentials while all other
+user ids are accepted as given.
+If the default credentials for user id zero
+are those of root, then there is essentially no remapping.
+Most NFS file systems are exported this way, most commonly mapping
+user id == 0 to the credentials for the user nobody.
+Since the client user id and group id list is used unchanged on the server
+(except for root), this also implies that
+the user id and group id space must be common between the client and server
+(i.e., user id N on the client must refer to the same user on the server).
+Alternatively, all user ids can be mapped to a default set of credentials,
+typically those of the user nobody. This essentially gives world access to all
+users on the corresponding hosts.
+.pp
+As well as the standard NFS Version 2 protocol (RFC1094) implementation, BSD
+systems can use a variant of the protocol called Not Quite NFS (NQNFS) that
+supports a variety of protocol extensions.
+This protocol uses 64-bit file offsets
+and sizes, an \fIaccess rpc\fR, an \fIappend\fR option on the write rpc
+and extended file attributes to support 4.4BSD file system functionality
+more fully.
+It also makes use of a variant of short term
+\fIleases\fR [Gray89] with delayed write client caching,
+in an effort to provide full cache consistency and better performance.
+This protocol is available between 4.4BSD systems only and is used when
+the \fB-q\fR mount option is specified.
+It can be used with any of the aforementioned options for NFS, such as TCP
+transport (\fB-T\fR).
+Although this protocol is experimental, it is recommended over NFS for
+mounts between 4.4BSD systems.\**
+.(f
+\**I would appreciate email from anyone who can provide
+NFS vs. NQNFS performance measurements,
+particularly for fast clients, many clients or over an internetwork
+connection with a large ``bandwidth * RTT'' product.
+.)f
+.sh 1 "Monitoring NFS Activity"
+.pp
+The basic command for monitoring NFS activity on clients and servers is
+nfsstat. It reports cumulative statistics of various NFS activities,
+such as counts of the different RPCs and cache hit rates on the client
+and server.
Of particular interest on the server are the fields in the
+\fIServer Cache Stats:\fR section, which give numbers for RPC retries received
+in the first three fields and total RPCs in the fourth. The first three fields
+should remain a very small percentage of the total. If they do not, this
+indicates that one or more clients are retrying too aggressively; the fix
+is to isolate these clients,
+disable the dynamic RTO estimation on them and
+make their initial timeout interval a conservative (i.e., large) value.
+.pp
+On the client side, the fields in the \fIRpc Info:\fR section are of particular
+interest, as they give an overall picture of NFS activity.
+The \fITimedOut\fR field is the number of I/O system calls that returned -1
+for ``soft'' mounts and can be reduced
+by increasing the retry limit or changing
+the mount type to ``intr'' or ``hard''.
+The \fIInvalid\fR field is a count of garbled RPC replies received
+and should remain zero.\**
+.(f
+\**Some NFS implementations run with UDP checksums disabled, so garbage RPC
+messages can be received.
+.)f
+The \fIX Replies\fR field counts the number of repeated RPC replies received
+from the server and is a clear indication of an overly aggressive RTO estimate.
+Unfortunately, a good NFS server implementation will use a ``recent request
+cache'' [Juszczak89] that will suppress the extraneous replies.
+A large value for \fIRetries\fR indicates a problem, but
+it could be any of:
+.ip \(bu
+an overly aggressive RTO estimate
+.ip \(bu
+an overloaded NFS server
+.ip \(bu
+IP fragments being dropped (gateway, client or server)
+.lp
+and requires further investigation.
+The \fIRequests\fR field is the total count of RPCs done on all servers.
+.pp
+The \fBnetstat -s\fR command is useful when investigating RPC transport
+problems.
+The field \fIfragments dropped after timeout\fR in
+the \fIip:\fR section indicates that IP fragments are
+being lost; a significant number of these indicates that the
+use of TCP transport or a smaller read/write data size is in order.
+A significant number of \fIbad checksums\fR reported in the \fIudp:\fR
+section would suggest network problems of a more generic sort
+(cabling, transceiver or network hardware interface problems or similar).
+.pp
+There is an RPC activity logging facility for both the client and
+server side in the kernel.
+When logging is enabled by setting the kernel variable nfsrtton to
+one, the logs in the kernel structures nfsrtt (for the client side)
+and nfsdrt (for the server side) are updated upon the completion
+of each RPC in a circular manner.
+The pos element of the structure is the index of the next element
+of the log array to be updated.
+In other words, the elements of the log array from \fIlog\fR[pos] to the
+end of the array, followed by \fIlog\fR[0] to \fIlog\fR[pos - 1],
+are in chronological order.
+The include file <sys/nfsrtt.h> should be consulted for details on the
+fields in the two log structures.\**
+.(f
+\**Unfortunately, a monitoring tool that uses these logs is still in the
+planning (dreaming) stage.
+.)f
+.sh 1 "Diskless Client Support"
+.pp
+The NFS client does include kernel support for diskless/dataless operation
+where the root file system and optionally the swap area are NFS mounted
+from a remote server.
+A diskless/dataless client is configured using a version of the
+``swapkernel.c'' file as provided in the directory \fIcontrib/diskless.nfs\fR.
+If the swap device is NODEV, this specifies an NFS mounted swap area, which
+should be configured to the same size as set up by diskless_setup when run
+on the server.
+This file must be put in the \fIsys/compile/<machine_name>\fR kernel build
+directory after the config command has been run, since config does
+not know about specifying NFS root and swap areas.
+The kernel variable mountroot must be set to nfs_mountroot instead of
+ffs_mountroot and the kernel structure nfs_diskless must be filled in
+properly.
+There are some primitive system administration tools in the
+\fIcontrib/diskless.nfs\fR directory to assist in filling in
+the nfs_diskless structure and in setting up an NFS server for
+diskless/dataless clients.
+The tools were designed to provide a bare bones capability, to allow maximum
+flexibility when setting up different servers.
+.lp
+The tools are as follows:
+.ip \(bu
+diskless_offset.c - This little program reads a ``kernel'' object file and
+writes the file byte offset of the nfs_diskless structure in it to
+standard output. It was kept separate because it sometimes has to
+be compiled/linked in funny ways depending on the client architecture.
+(See the comment at the beginning of it.)
+.ip \(bu
+diskless_setup.c - This program is run on the server and sets up files for a
+given client. It mostly just fills in an nfs_diskless structure and
+writes it out to either the ``kernel'' file or a separate file called
+/var/diskless/setup.<official-hostname>.
+.ip \(bu
+diskless_boot.c - There are two functions in here that may be used
+by a bootstrap server such as tftpd to permit sharing of the ``kernel''
+object file for similar clients. This saves disk space on the bootstrap
+server and simplifies organization, but these functions are not critical
+for correct operation.
+They read the ``kernel''
+file, but optionally fill in the nfs_diskless structure from a
+separate ``setup.<official-hostname>'' file so that there is only
+one copy of ``kernel'' for all similar (same arch etc.) clients.
+These functions use a text file called
+/var/diskless/boot.<official-hostname> to control the netboot.
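The boot file read by these functions has a simple format: the pathname of the "kernel" file on the first line, then the pathname of the nfs_diskless structure file and its byte offset on the second. A minimal C sketch of parsing it follows. It is not the code in diskless_boot.c; struct bootinfo, parse_bootinfo() and the 256-byte pathname limit are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of a /var/diskless/boot.<hostname> file. */
struct bootinfo {
	char	kernel[256];	/* line 1: pathname of the "kernel" file */
	char	setup[256];	/* line 2: pathname of the nfs_diskless file */
	long	offset;		/* line 2: byte offset of the structure */
};

/*
 * Parse the boot file contents held in buf.
 * Returns 0 on success, -1 if either line is missing or malformed.
 */
int
parse_bootinfo(const char *buf, struct bootinfo *bi)
{
	const char *nl;
	int n;

	/* Line 1 must be terminated by a newline. */
	if ((nl = strchr(buf, '\n')) == NULL)
		return (-1);
	/* The kernel pathname must lie entirely on line 1. */
	if (sscanf(buf, " %255s%n", bi->kernel, &n) != 1 || buf + n > nl)
		return (-1);
	/* Line 2: structure file pathname followed by its byte offset. */
	if (sscanf(nl + 1, "%255s %ld", bi->setup, &bi->offset) != 2)
		return (-1);
	return (0);
}
```

Applied to the two-line example in the setup steps, this yields the kernel pathname /var/diskless/kernel.pmax and an offset of 642308.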
+.lp
+The basic setup steps are:
+.ip \(bu
+Make a ``kernel'' for the client(s) with mountroot() == nfs_mountroot()
+and swdevt[0].sw_dev == NODEV if it is to do NFS swapping as well
+(see the swapkernel.c file described above).
+.ip \(bu
+Run diskless_offset on the kernel file to find out the byte offset
+of the nfs_diskless structure.
+.ip \(bu
+Run diskless_setup on the server to set up the server and fill in the
+nfs_diskless structure for that client.
+The nfs_diskless structure can either be written into the
+kernel file (the -x option) or
+saved in /var/diskless/setup.<official-hostname>.
+.ip \(bu
+Set up the bootstrap server. If the nfs_diskless structure was written into
+the ``kernel'' file, any vanilla bootstrap protocol such as bootp/tftp can
+be used. If the bootstrap server has been modified to use the functions in
+diskless_boot.c, then a
+file called /var/diskless/boot.<official-hostname>
+must be created.
+It is simply a two-line text file, where the first line is the pathname
+of the correct ``kernel'' file and the second line has the pathname of
+the nfs_diskless structure file and its byte offset in it.
+For example:
+.br
+ /var/diskless/kernel.pmax
+.br
+ /var/diskless/setup.rickers.cis.uoguelph.ca 642308
+.br
+.ip \(bu
+Create a /var subtree for each client in an appropriate place on the server,
+such as /var/diskless/var/<client-hostname>/...
+By using the <client-hostname> to differentiate /var for each host,
+/etc/rc can be modified to mount the correct /var from the server.
diff --git a/share/doc/smm/06.nfs/2.t b/share/doc/smm/06.nfs/2.t
new file mode 100644
index 000000000000..11be009a8d6f
--- /dev/null
+++ b/share/doc/smm/06.nfs/2.t
@@ -0,0 +1,524 @@
+.\" Copyright (c) 1993
+.\" The Regents of the University of California. All rights reserved.
+.\"
+.\" This document is derived from software contributed to Berkeley by
+.\" Rick Macklem at The University of Guelph.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3. Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.sh 1 "Not Quite NFS, Crash Tolerant Cache Consistency for NFS"
+.pp
+Not Quite NFS (NQNFS) is an NFS-like protocol designed to maintain full cache
+consistency between clients in a crash tolerant manner.
+It is an adaptation of the NFS protocol such that the server supports both NFS
+and NQNFS clients while maintaining full consistency between the server and
+NQNFS clients.
+This section borrows heavily from work done on Spritely-NFS [Srinivasan89],
+but uses Leases [Gray89] to avoid the need to recover server state information
+after a crash.
+The reader is strongly encouraged to read these references before
+trying to grasp the material presented here.
+.sh 2 "Overview"
+.pp
+The protocol maintains cache consistency by using a protocol somewhat
+like that of Sprite [Nelson88],
+but is based on short term leases\** instead of hard state information
+about open files.
+.(f
+\** A lease is a ticket permitting an activity that is
+valid until some expiry time.
+.)f
+The basic principle is that the protocol will disable client caching of a
+file whenever that file is write shared\**.
+.(f
+\** Write sharing occurs when at least one client is modifying a file while
+other client(s) are reading the file.
+.)f
+Whenever a client wishes to cache data for a file it must hold a valid lease.
+There are three types of leases: read caching, write caching and non-caching.
+The latter type requires that all file operations be done synchronously with
+the server via RPCs.
+A read caching lease allows for client data caching, but no file modifications
+may be done.
+A write caching lease allows for client caching of writes,
+but requires that all writes be pushed to the server when the lease expires.
+If a client has dirty buffers\**
+.(f
+\** Cached write data that has not yet been pushed (written) to the server.
+.)f
+when a write caching lease has almost expired, it will attempt to
+extend the lease but is required to push the dirty buffers if extension fails.
+A client gets leases by either doing a \fBGetLease RPC\fR or by piggybacking
+a \fBGetLease Request\fR onto another RPC. Piggybacking is supported for the
+frequent RPCs Getattr, Setattr, Lookup, Readlink, Read, Write and Readdir
+in an effort to minimize the number of \fBGetLease RPCs\fR required.
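A client's lease bookkeeping might look like the C sketch below. It is an illustration under assumptions, not the 4.4BSD client code: struct client_lease, its fields and the renewal margin are hypothetical, the enum values are those of the cachetype type in the protocol details, and expiry is measured from when the lease request was sent, following the conservative rule given for the Get Lease RPC.

```c
#include <time.h>

/* Lease types, matching the cachetype enum in the protocol details. */
enum cachetype { NQLNONE = 0, NQLREAD = 1, NQLWRITE = 2 };

/* Hypothetical client-side record of the lease held on one file. */
struct client_lease {
	enum cachetype	type;		/* read caching or write caching */
	int		cachable;	/* "cachable" flag returned by server */
	time_t		sent;		/* when the lease request was sent */
	unsigned	duration;	/* lease term granted, in seconds */
};

/* A lease is valid for duration seconds from when the request was sent. */
int
lease_valid(const struct client_lease *l, time_t now)
{
	return (now < l->sent + (time_t)l->duration);
}

/* Caching is allowed only under a valid lease that permits it. */
int
may_cache(const struct client_lease *l, time_t now)
{
	return (l->type != NQLNONE && l->cachable && lease_valid(l, now));
}

/*
 * A valid write caching lease within margin seconds of expiry has
 * "almost expired": the client should try to extend it and push its
 * dirty buffers if the extension fails.
 */
int
needs_renewal(const struct client_lease *l, time_t now, unsigned margin)
{
	return (l->type == NQLWRITE && lease_valid(l, now) &&
	    now + (time_t)margin >= l->sent + (time_t)l->duration);
}
```

A lease granted with cachable false behaves as the non-caching type: may_cache() returns false, so every operation goes to the server.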
+All leases are at the granularity of a file, since all NFS RPCs operate on
+individual files and NFS has no intrinsic notion of a file hierarchy.
+Directories, symbolic links and file attributes may be read cached but
+are not write cached.
+The exception here is the attribute file_size, which is updated during cached
+writing on the client to reflect a growing file.
+.pp
+It is the server's responsibility to ensure that consistency is maintained
+among the NQNFS clients by disabling client caching whenever a server file
+operation would cause inconsistencies.
+Inconsistencies can arise whenever a client has
+a write caching lease and any other client,
+or a local operation on the server,
+tries to access the file, or when
+a modify operation is attempted on a file being read cached by client(s).
+At this time, the server sends an \fBeviction notice\fR to all clients holding
+the lease and then waits for lease termination.
+Lease termination occurs when a \fBvacated the premises\fR message has been
+received from all the clients that have signed the lease or when the lease
+expires via timeout.
+The message pair \fBeviction notice\fR and \fBvacated the premises\fR roughly
+corresponds to a Sprite server\(->client callback, but is not implemented as an
+actual RPC, to avoid the server waiting indefinitely for a reply from a dead
+client.
+.pp
+Server consistency checking can be viewed as issuing intrinsic leases for a
+file operation for the duration of the operation only. For example, the
+\fBCreate RPC\fR will get an intrinsic write lease on the directory in which
+the file is being created, disabling client read caches for that directory.
+.pp
+By relegating this responsibility to the server, consistency between the
+server and NQNFS clients is maintained when NFS clients are modifying the
+file system as well.\**
+.(f
+\** The NFS clients will continue to be \fIapproximately\fR consistent with
+the server.
+.)f
+.pp
+Leases are issued as time intervals rather than absolute expiry times, to avoid
+requiring time of day clock synchronization. There are three important time
+constants known to
+the server. The \fBmaximum_lease_term\fR sets an upper bound on lease duration.
+The \fBclock_skew\fR is added to all lease terms on the server to correct for
+differing clock speeds between the client and server, and \fBwrite_slack\fR is
+the number of seconds the server is willing to wait for a client with
+an expired write caching lease to push dirty writes.
+.pp
+The server maintains a \fBmodify_revision\fR number for each file. It is
+defined as an unsigned quadword integer that is never zero and that must
+increase whenever the corresponding file is modified on the server.
+It is used
+by the client to determine whether or not cached data for the file is
+stale.
+Generating this value is easier said than done. The current implementation
+uses the following technique, which is believed to be adequate.
+The high order longword is stored in the ufs inode and is initialized to one
+when an inode is first allocated.
+The low order longword is stored in main memory only and is initialized to
+zero when an inode is read in from disk.
+When the file is modified for the first time within a given second of
+wall clock time, the high order longword is incremented by one and
+the low order longword reset to zero.
+For subsequent modifications within the same second of wall clock
+time, the low order longword is incremented. If the low order longword wraps
+around to zero, the high order longword is incremented again.
+Since the high order longword only increments once per second and the inode
+is pushed to disk frequently during file modification, this implies
+0 \(<= Current\(miDisk \(<= 5, where Current and Disk are the in-memory
+and on-disk values of the high order longword.
+When the inode is read in from disk, 10
+is added to the high order longword, which ensures that the quadword
+is greater than any value it could have had before a crash.
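The steps above for generating modify_revision can be sketched in C as follows. This is a hypothetical illustration, not the ufs code: struct modrev and its functions are invented names, and recording the second of the last modification is one assumed way to detect the first modification within a given second.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical per-file state for generating modify_revision. */
struct modrev {
	uint32_t	hi;		/* high order longword, kept in the inode */
	uint32_t	lo;		/* low order longword, main memory only */
	time_t		last_sec;	/* second of the last modification */
};

/* Inode first allocated: the high word starts at one, so rev is never 0. */
void
modrev_alloc(struct modrev *m)
{
	m->hi = 1;
	m->lo = 0;
	m->last_sec = (time_t)-1;	/* no modification seen yet */
}

/* Inode read in from disk: add 10 to exceed any pre-crash value. */
void
modrev_read(struct modrev *m, uint32_t disk_hi)
{
	m->hi = disk_hi + 10;
	m->lo = 0;
	m->last_sec = (time_t)-1;
}

/* Record one modification of the file at wall clock time now. */
void
modrev_modify(struct modrev *m, time_t now)
{
	if (now != m->last_sec) {
		/* First modification within this second. */
		m->hi++;
		m->lo = 0;
		m->last_sec = now;
	} else if (++m->lo == 0) {
		/* The low order longword wrapped around to zero. */
		m->hi++;
	}
}

/* The quadword value handed to clients. */
uint64_t
modrev_value(const struct modrev *m)
{
	return (((uint64_t)m->hi << 32) | m->lo);
}
```

Every call to modrev_modify() strictly increases modrev_value(), and because modrev_read() adds 10 while the in-memory high word can lead the disk copy by at most 5, a value read back after a crash exceeds any value issued before it.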
+Adding 10 on each read from disk introduces apparent modifications every time
+the inode falls out of the LRU inode cache and is read back in, but this
+should only reduce the client caching performance
+by a (hopefully) small margin.
+.sh 2 "Crash Recovery and other Failure Scenarios"
+.pp
+The server must maintain the state of all the current leases held by clients.
+The nice thing about short term leases is that maximum_lease_term seconds
+after the server stops issuing leases, there are no current leases left.
+As such, server crash recovery does not require any state recovery. After
+rebooting, the server refuses to service any RPCs except for writes until
+write_slack seconds after the last lease would have expired\**.
+.(f
+\** The last lease expiry time may be safely estimated as
+"boottime+maximum_lease_term+clock_skew" for machines that cannot store
+it in nonvolatile RAM.
+.)f
+By then, the server has no outstanding leases whose state must be
+recovered, and the clients have had at least write_slack seconds to push dirty
+writes to the server and bring it up to date. After this, the
+server simply services requests in a manner similar to NFS.
+In an effort to minimize the effect of "recovery storms" [Baker91],
+the server replies \fBtry_again_later\fR to the RPCs it is not
+yet ready to service.
+.pp
+After a client crashes, the server may have to wait for a lease to time out
+before servicing a request if write sharing of a file with a cachable lease
+on the client is about to occur.
+The rebooted client simply acquires any leases it now needs. Any
+outstanding leases for that client on the server prior to the crash
+will either be renewed or expire via timeout.
+.pp
+Certain network partitioning failures are more problematic. If a client to
+server network connection is severed just before a write caching lease expires,
+the client cannot push the dirty writes to the server.
After the lease expires
+on the server, the server permits other clients to access the file with the
+potential of getting stale data. Unfortunately I believe this failure scenario
+is intrinsic to any delayed write caching scheme unless the server is required to
+wait \fBforever\fR for a client to regain contact\**.
+.(f
+\** Gray and Cheriton avoid this problem by using a \fBwrite through\fR policy.
+.)f
+Since the write caching lease has expired on the client,
+it will sync up with the
+server as soon as the network connection has been re-established.
+.pp
+There is another failure condition that can occur when the server is congested.
+The worst case scenario would have the client pushing dirty writes to the server
+while a large request queue on the server delays these writes for more than
+\fBwrite_slack\fR seconds. It is hoped that a congestion control scheme using
+the \fBtry_again_later\fR RPC reply after booting combined with
+the following lease termination rule for write caching leases
+can minimize the risk of this occurrence.
+A write caching lease is only terminated on the server when there have
+been no writes to the file and the server has not been overloaded during
+the previous write_slack seconds. Whether the server has been overloaded
+is approximated by a test for sleeping nfsd(s) at the end of the write_slack
+period.
+.sh 2 "Server Disk Full"
+.pp
+There is a serious unresolved problem for delayed write caching with respect to
+server disk space allocation.
+When the disk on the file server is full, delayed write RPCs can fail
+due to "out of space".
+For NFS, this occurrence results in an error return from the close system
+call on the file, since the dirty blocks are pushed on close.
+Processes writing important files can check for this error return
+to ensure that the file was written successfully.
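A minimal POSIX C sketch of that check follows. It calls fsync(2) before close(2) and checks both, so it also covers the NQNFS case described next, where dirty blocks are not pushed on close. The helper name write_important is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write an important file and report any failure,
 * including a delayed "out of space" error from the file server.
 * Returns 0 on success or an errno value on failure.
 */
int
write_important(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	ssize_t n;
	int fd, error = 0;

	if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
		return (errno);
	while (len > 0) {
		if ((n = write(fd, p, len)) < 0) {
			error = errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}
	/* Push dirty blocks to the server now; NQNFS does not do so on close. */
	if (error == 0 && fsync(fd) < 0)
		error = errno;
	/* Still check close: for NFS, a delayed write error is returned here. */
	if (close(fd) < 0 && error == 0)
		error = errno;
	return (error);
}
```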
+For NQNFS, the dirty blocks are not pushed on close and as such the client +may not attempt the write RPC until after the process has done the close +which implies no error return from the close. +For the current prototype, +the only solution is to modify programs writing important +file(s) to call fsync and check for an error return from it instead of close. +.sh 2 "Protocol Details" +.pp +The protocol specification is identical to that of NFS [Sun89] except for +the following changes. +.ip \(bu +RPC Information +.(l + Program Number 300105 + Version Number 1 +.)l +.ip \(bu +Readdir_and_Lookup RPC +.(l + struct readdirlookargs { + fhandle file; + nfscookie cookie; + unsigned count; + unsigned duration; + }; + + struct entry { + unsigned cachable; + unsigned duration; + modifyrev rev; + fhandle entry_fh; + nqnfs_fattr entry_attrib; + unsigned fileid; + filename name; + nfscookie cookie; + entry *nextentry; + }; + + union readdirlookres switch (stat status) { + case NFS_OK: + struct { + entry *entries; + bool eof; + } readdirlookok; + default: + void; + }; + + readdirlookres + NQNFSPROC_READDIRLOOK(readdirlookargs) = 18; +.)l +Reads entries in a directory in a manner analogous to the NFSPROC_READDIR RPC +in NFS, but returns the file handle and attributes of each entry as well. +This allows the attribute and lookup caches to be primed. +.ip \(bu +Get Lease RPC +.(l + struct getleaseargs { + fhandle file; + cachetype readwrite; + unsigned duration; + }; + + union getleaseres switch (stat status) { + case NFS_OK: + bool cachable; + unsigned duration; + modifyrev rev; + nqnfs_fattr attributes; + default: + void; + }; + + getleaseres + NQNFSPROC_GETLEASE(getleaseargs) = 19; +.)l +Gets a lease for "file" valid for "duration" seconds from when the lease +was issued on the server\**. +.(f +\** To be safe, the client may only assume that the lease is valid +for ``duration'' seconds from when the RPC request was sent to the server. 
+.)f
+The lease permits client caching if "cachable" is true.
+The modify revision level and attributes for the file are also returned.
+.ip \(bu
+Eviction Message
+.(l
+ void
+ NQNFSPROC_EVICTED (fhandle) = 21;
+.)l
+This message is sent from the server to the client. When the client receives
+the message, it should flush data associated with the file represented by
+"fhandle" from its caches and then send the \fBVacated Message\fR back to
+the server. Flushing includes pushing any dirty writes via write RPCs.
+.ip \(bu
+Vacated Message
+.(l
+ void
+ NQNFSPROC_VACATED (fhandle) = 20;
+.)l
+This message is sent from the client to the server in response to the
+\fBEviction Message\fR. See above.
+.ip \(bu
+Access RPC
+.(l
+ struct accessargs {
+ fhandle file;
+ bool read_access;
+ bool write_access;
+ bool exec_access;
+ };
+
+ stat
+ NQNFSPROC_ACCESS(accessargs) = 22;
+.)l
+The access RPC does permission checking on the server for the given type
+of access required by the client for the file.
+Use of this RPC avoids accessibility problems caused by client->server uid
+mapping.
+.ip \(bu
+Piggybacked Get Lease Request
+.pp
+The piggybacked get lease request is functionally equivalent to the Get Lease
+RPC except that it is attached to one of the other NQNFS RPC requests as follows.
+A getleaserequest is prepended to all of the request arguments for NQNFS
+and a getleaserequestres is inserted in all NFS result structures just after
+the "stat" field only if "stat == NFS_OK".
+.(l
+ union getleaserequest switch (cachetype type) {
+ case NQLREAD:
+ case NQLWRITE:
+ unsigned duration;
+ default:
+ void;
+ };
+
+ union getleaserequestres switch (cachetype type) {
+ case NQLREAD:
+ case NQLWRITE:
+ bool cachable;
+ unsigned duration;
+ modifyrev rev;
+ default:
+ void;
+ };
+.)l
+The get lease request applies to the file that the attached RPC operates on
+and the file attributes remain in the same location as for the NFS RPC reply
+structure.
+.ip \(bu +Three additional "stat" values +.pp +Three additional values have been added to the enumerated type "stat". +.(l + NQNFS_EXPIRED=500 + NQNFS_TRYLATER=501 + NQNFS_AUTHERR=502 +.)l +The "expired" value indicates that a lease has expired. +The "try later" +value is returned by the server when it wishes the client to retry the +RPC request after a short delay. It is used during crash recovery (Section 2) +and may also be useful for server congestion control. +The "authentication error" value is returned for kerberized mount points to +indicate that there is no cached authentication mapping and a Kerberos ticket +for the principal is required. +.sh 2 "Data Types" +.ip \(bu +cachetype +.(l + enum cachetype { + NQLNONE = 0, + NQLREAD = 1, + NQLWRITE = 2 + }; +.)l +Type of lease requested. NQLNONE is used to indicate no piggybacked lease +request. +.ip \(bu +modifyrev +.(l + typedef unsigned hyper modifyrev; +.)l +The "modifyrev" is an unsigned quadword integer value that is never zero +and increases every time the corresponding file is modified on the server. +.ip \(bu +nqnfs_time +.(l + struct nqnfs_time { + unsigned seconds; + unsigned nano_seconds; + }; +.)l +For NQNFS, times are handled at nanosecond resolution instead of the microsecond +resolution used by NFS. +.ip \(bu +nqnfs_fattr +.(l + struct nqnfs_fattr { + ftype type; + unsigned mode; + unsigned nlink; + unsigned uid; + unsigned gid; + unsigned hyper size; + unsigned blocksize; + unsigned rdev; + unsigned hyper bytes; + unsigned fsid; + unsigned fileid; + nqnfs_time atime; + nqnfs_time mtime; + nqnfs_time ctime; + unsigned flags; + unsigned generation; + modifyrev rev; + }; +.)l +The nqnfs_fattr structure is modified from the NFS fattr so that it stores +the file size as a 64-bit quantity and the storage occupied as a 64-bit number +of bytes. It also has fields added for the 4.4BSD va_flags and va_gen fields, +as well as the file's modify rev level.
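The following client-side C sketch illustrates how the lease and revision data types might fit together. It rests on several assumptions. It uses a hypothetical lease_record kept by the client. It measures lease expiry from when the request was sent, as the footnote to the Get Lease RPC advises. It treats cached data as usable only while a cachable lease is valid and the cached modifyrev still matches the rev in the server's latest reply. The structure, the function names, and this particular caching policy are illustrative and are not part of the protocol specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the nqnfs_time XDR type: seconds plus nanoseconds. */
struct nqnfs_time {
	uint32_t seconds;
	uint32_t nano_seconds;
};

/* modifyrev is an "unsigned hyper", i.e. an unsigned 64-bit value. */
typedef uint64_t modifyrev;

/* Hypothetical client-side lease record; field names are illustrative. */
struct lease_record {
	uint64_t  sent_at;	/* client clock (seconds) when request was sent */
	uint32_t  duration;	/* "duration" returned by the server */
	bool      cachable;	/* "cachable" returned by the server */
	modifyrev rev;		/* modify rev recorded with the cached data */
};

/* Convert an nqnfs_time to a single nanosecond count. */
uint64_t nqnfs_time_to_ns(struct nqnfs_time t)
{
	return (uint64_t)t.seconds * 1000000000ULL + t.nano_seconds;
}

/*
 * Conservative validity check: the lease is measured from when the
 * request was sent, since the server's issue time is not known.
 */
bool lease_valid(const struct lease_record *l, uint64_t now)
{
	return now < l->sent_at + l->duration;
}

/*
 * Assumed policy: use cached data only under a valid, cachable lease
 * whose recorded rev equals the rev the server last reported.  The
 * server never issues a zero rev, so zero marks "nothing cached".
 */
bool cache_usable(const struct lease_record *l, uint64_t now,
    modifyrev server_rev)
{
	return l->cachable && l->rev != 0 && lease_valid(l, now) &&
	    l->rev == server_rev;
}
```

Because modifyrev only increases, a mismatch in either direction means the cached copy no longer reflects the file on the server.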
+.ip \(bu +nqnfs_sattr +.(l + struct nqnfs_sattr { + unsigned mode; + unsigned uid; + unsigned gid; + unsigned hyper size; + nqnfs_time atime; + nqnfs_time mtime; + unsigned flags; + unsigned rdev; + }; +.)l +The nqnfs_sattr structure is modified from the NFS sattr structure in the +same manner as nqnfs_fattr. +.lp +The arguments to several of the NFS RPCs have been modified as well. Mostly, +these are minor changes to use 64-bit file offsets or similar. The modified +argument structures follow. +.ip \(bu +Lookup RPC +.(l + struct lookup_diropargs { + unsigned duration; + fhandle dir; + filename name; + }; + + union lookup_diropres switch (stat status) { + case NFS_OK: + struct { + union getleaserequestres lookup_lease; + fhandle file; + nqnfs_fattr attributes; + } lookup_diropok; + default: + void; + }; + +.)l +If the additional "duration" argument is non-zero, it tells the server to get a lease for the +name being looked up; the lease is returned +in "lookup_lease". +.ip \(bu +Read RPC +.(l + struct nqnfs_readargs { + fhandle file; + unsigned hyper offset; + unsigned count; + }; +.)l +.ip \(bu +Write RPC +.(l + struct nqnfs_writeargs { + fhandle file; + unsigned hyper offset; + bool append; + nfsdata data; + }; +.)l +The "append" argument is true for append-only write operations. +.ip \(bu +Get Filesystem Attributes RPC +.(l + union nqnfs_statfsres switch (stat status) { + case NFS_OK: + struct { + unsigned tsize; + unsigned bsize; + unsigned blocks; + unsigned bfree; + unsigned bavail; + unsigned files; + unsigned files_free; + } info; + default: + void; + }; +.)l +The "files" field is the number of files in the file system, and "files_free" +is the number of additional files that can be created. +.sh 1 "Summary" +.pp +The configuration and tuning of an NFS environment tends to be a bit of a +mystic art, but hopefully this paper, along with the man pages and other +reading, will be helpful. Good luck.
diff --git a/share/doc/smm/06.nfs/Makefile b/share/doc/smm/06.nfs/Makefile new file mode 100644 index 000000000000..a15d81d5b868 --- /dev/null +++ b/share/doc/smm/06.nfs/Makefile @@ -0,0 +1,5 @@ +VOLUME= smm/06.nfs +SRCS= 0.t 1.t 2.t ref.t +MACROS= -me + +.include <bsd.doc.mk> diff --git a/share/doc/smm/06.nfs/ref.t b/share/doc/smm/06.nfs/ref.t new file mode 100644 index 000000000000..9619e95181ec --- /dev/null +++ b/share/doc/smm/06.nfs/ref.t @@ -0,0 +1,117 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.sh 1 "Bibliography" +.ip [Baker91] 16 +Mary Baker and John Ousterhout, Availability in the Sprite Distributed +File System, In \fIOperating Systems Review\fR, (25)2, pg. 95-98, +April 1991. +.ip [Baker91a] 16 +Mary Baker, Private Email Communication, May 1991. +.ip [Burrows88] 16 +Michael Burrows, Efficient Data Sharing, Technical Report #153, +Computer Laboratory, University of Cambridge, Dec. 1988. +.ip [Gray89] 16 +Cary G. Gray and David R. Cheriton, Leases: An Efficient Fault-Tolerant +Mechanism for Distributed File Cache Consistency, In \fIProc. of the +Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield Park, +AZ, Dec. 1989. +.ip [Howard88] 16 +John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, +M. Satyanarayanan, Robert N. Sidebotham and Michael J. West, +Scale and Performance in a Distributed File System, \fIACM Trans. on +Computer Systems\fR, (6)1, pg. 51-81, Feb. 1988. +.ip [Juszczak89] 16 +Chet Juszczak, Improving the Performance and Correctness of an NFS Server, +In \fIProc. Winter 1989 USENIX Conference\fR, pg. 53-63, San Diego, CA, January 1989. +.ip [Keith90] 16 +Bruce E. Keith, Perspectives on NFS File Server Performance Characterization, +In \fIProc. Summer 1990 USENIX Conference\fR, pg. 267-277, Anaheim, CA, +June 1990. +.ip [Kent87] 16 +Christopher A.
Kent, \fICache Coherence in Distributed Systems\fR, +Research Report 87/4, +Digital Equipment Corporation Western Research Laboratory, April 1987. +.ip [Kent87a] 16 +Christopher A. Kent and Jeffrey C. Mogul, +\fIFragmentation Considered Harmful\fR, Research Report 87/3, +Digital Equipment Corporation Western Research Laboratory, Dec. 1987. +.ip [Macklem91] 16 +Rick Macklem, Lessons Learned Tuning the 4.3BSD Reno Implementation of the +NFS Protocol, In \fIProc. Winter 1991 USENIX Conference\fR, pg. 53-64, +Dallas, TX, January 1991. +.ip [Nelson88] 16 +Michael N. Nelson, Brent B. Welch, and John K. Ousterhout, Caching in the +Sprite Network File System, \fIACM Transactions on Computer Systems\fR, (6)1, +pg. 134-154, February 1988. +.ip [Nowicki89] 16 +Bill Nowicki, Transport Issues in the Network File System, In +\fIComputer Communication Review\fR, pg. 16-20, Vol. 19, Number 2, April 1989. +.ip [Ousterhout90] 16 +John K. Ousterhout, Why Aren't Operating Systems Getting Faster As Fast as +Hardware? In \fIProc. Summer 1990 USENIX Conference\fR, pg. 247-256, Anaheim, +CA, June 1990. +.ip [Pendry93] 16 +Jan-Simon Pendry, 4.4 BSD Automounter Reference Manual, In +\fIsrc/usr.sbin/amd/doc directory of 4.4 BSD distribution tape\fR. +.ip [Reid90] 16 +Jim Reid, N(e)FS: the Protocol is the Problem, In +\fIProc. Summer 1990 UKUUG Conference\fR, +London, England, July 1990. +.ip [Sandberg85] 16 +Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon, +Design and Implementation of the Sun Network Filesystem, In \fIProc. Summer +1985 USENIX Conference\fR, pg. 119-130, Portland, OR, June 1985. +.ip [Schroeder85] 16 +Michael D. Schroeder, David K. Gifford and Roger M. Needham, A Caching +File System For A Programmer's Workstation, In \fIProc. of the Tenth +ACM Symposium on Operating Systems Principles\fR, pg. 25-34, Orcas Island, +WA, Dec. 1985. +.ip [Srinivasan89] 16 +V. Srinivasan and Jeffrey C.
Mogul, \fISpritely NFS: Implementation and +Performance of Cache-Consistency Protocols\fR, Research Report 89/5, +Digital Equipment Corporation Western Research Laboratory, May 1989. +.ip [Steiner88] 16 +Jennifer G. Steiner, Clifford Neuman and Jeffrey I. Schiller, +Kerberos: An Authentication Service for Open Network Systems, In +\fIProc. Winter 1988 USENIX Conference\fR, Dallas, TX, February 1988. +.ip [Stern] 16 +Hal Stern, \fIManaging NFS and NIS\fR, O'Reilly and Associates, +ISBN 0-937175-75-7. +.ip [Sun87] 16 +Sun Microsystems Inc., \fIXDR: External Data Representation Standard\fR, +RFC1014, Network Information Center, SRI International, June 1987. +.ip [Sun88] 16 +Sun Microsystems Inc., \fIRPC: Remote Procedure Call Protocol Specification Version 2\fR, +RFC1057, Network Information Center, SRI International, June 1988. +.ip [Sun89] 16 +Sun Microsystems Inc., \fINFS: Network File System Protocol Specification\fR, +ARPANET Working Group Requests for Comment, DDN Network Information Center, +SRI International, Menlo Park, CA, March 1989, RFC-1094. diff --git a/share/doc/smm/07.lpd/0.t b/share/doc/smm/07.lpd/0.t new file mode 100644 index 000000000000..b1896248e0b3 --- /dev/null +++ b/share/doc/smm/07.lpd/0.t @@ -0,0 +1,62 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.if n .ND +.TL +4.3BSD Line Printer Spooler Manual +.EH 'SMM:7-%''4.3BSD Line Printer Spooler Manual' +.OH '4.3BSD Line Printer Spooler Manual''SMM:7-%' +.AU +Ralph Campbell +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +This document describes the structure and installation procedure +for the line printer spooling system +developed for the 4.3BSD version +of the UNIX* operating system. +.de D? +.ie \\n(.$>1 Revised \\$1 \\$2 \\$3 +.el DRAFT of \n(mo/\n(dy/\n(yr +.. +.sp 2 +.LP +.D? June 8, 1993 +.AE +.de IR +\fI\\$1\fP\\$2 +.. +.de DT +.TA 8 16 24 32 40 48 56 64 72 80 +.. 
diff --git a/share/doc/smm/07.lpd/1.t b/share/doc/smm/07.lpd/1.t new file mode 100644 index 000000000000..9e76d9ac5134 --- /dev/null +++ b/share/doc/smm/07.lpd/1.t @@ -0,0 +1,71 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.NH 1 +Overview +.PP +The line printer system supports: +.IP \(bu 3 +multiple printers, +.IP \(bu 3 +multiple spooling queues, +.IP \(bu 3 +both local and remote +printers, and +.IP \(bu 3 +printers attached via serial lines that require +line initialization, such as setting the baud rate. +.LP +Raster output devices +such as a Varian or Versatec, and laser printers such as an Imagen, +are also supported by the line printer system. +.PP +The line printer system consists mainly of the +following files and commands: +.DS +.TS +l l. +/etc/printcap printer configuration and capability data base +/usr/lib/lpd line printer daemon, does all the real work +/usr/ucb/lpr program to enter a job in a printer queue +/usr/ucb/lpq spooling queue examination program +/usr/ucb/lprm program to delete jobs from a queue +/etc/lpc program to administer printers and spooling queues +/dev/printer socket on which lpd listens +.TE +.DE +The file /etc/printcap is a master data base describing line +printers directly attached to a machine and, also, printers +accessible across a network. The manual page entry +.IR printcap (5) +provides the authoritative definition of +the format of this data base, as well as +specifying default values for important items +such as the directory in which spooling is performed. +This document introduces some of the +information that may be placed in +.IR printcap . diff --git a/share/doc/smm/07.lpd/2.t b/share/doc/smm/07.lpd/2.t new file mode 100644 index 000000000000..0f1346c832a8 --- /dev/null +++ b/share/doc/smm/07.lpd/2.t @@ -0,0 +1,135 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.NH 1 +Commands +.NH 2 +lpd \- line printer daemon +.PP +The program +.IR lpd (8), +usually invoked at boot time from the /etc/rc file, acts as +a master server for coordinating and controlling +the spooling queues configured in the printcap file. +When +.I lpd +is started it makes a single pass through the +.I printcap +database restarting any printers that have jobs. +In normal operation +.I lpd +listens for service requests on multiple sockets, +one in the UNIX domain (named ``/dev/printer'') for +local requests, and one in the Internet domain +(under the ``printer'' service specification) +for requests for printer access from off machine; +see \fIsocket\fP\|(2) and \fIservices\fP\|(5) +for more information on sockets and service +specifications, respectively. 
+.I Lpd +spawns a copy of itself to process the request; the master daemon +continues to listen for new requests. +.PP +Clients communicate with +.I lpd +using a simple transaction-oriented protocol. +Authentication of remote clients is done based +on the ``privileged port'' scheme employed by +\fIrshd\fP\|(8C) and \fIrcmd\fP\|(3X). +The following table shows the requests +understood by +.IR lpd . +In each request the first byte indicates the +``meaning'' of the request, followed by the name +of the printer to which it should be applied. Additional +qualifiers may follow, depending on the request. +.DS +.TS +l l. +Request Interpretation +_ +^Aprinter\en check the queue for jobs and print any found +^Bprinter\en receive and queue a job from another machine +^Cprinter [users ...] [jobs ...]\en return short list of current queue state +^Dprinter [users ...] [jobs ...]\en return long list of current queue state +^Eprinter person [users ...] [jobs ...]\en remove jobs from a queue +.TE +.DE +.PP +The \fIlpr\fP\|(1) command +is used by users to enter a print job in a local queue and to notify +the local +.I lpd +that there are new jobs in the spooling area. +.I Lpd +either schedules the job to be printed locally, or if +printing remotely, attempts to forward +the job to the appropriate machine. +If the printer cannot be opened or the destination +machine is unreachable, the job will remain queued until it is +possible to complete the work. +.NH 2 +lpq \- show line printer queue +.PP +The \fIlpq\fP\|(1) +program works recursively backwards, displaying the queue of the machine with +the printer and then the queue(s) of the machine(s) that lead to it. +.I Lpq +has two forms of output: in the default, short, format it +gives a single line of output per queued job; in the long +format it shows the list of files, and their sizes, that +comprise a job. +.NH 2 +lprm \- remove jobs from a queue +.PP +The \fIlprm\fP\|(1) command deletes jobs from a spooling +queue.
If necessary, \fIlprm\fP will first kill off a +running daemon that is servicing the queue and restart +it after the required files are removed. When removing +jobs destined for a remote printer, \fIlprm\fP acts +similarly to \fIlpq\fP except it first checks locally +for jobs to remove and then +tries to remove files in queues off-machine. +.NH 2 +lpc \- line printer control program +.PP +The +.IR lpc (8) +program is used by the system administrator to control the +operation of the line printer system. +For each line printer configured in /etc/printcap, +.I lpc +may be used to: +.IP \(bu +disable or enable a printer, +.IP \(bu +disable or enable a printer's spooling queue, +.IP \(bu +rearrange the order of jobs in a spooling queue, +.IP \(bu +find the status of printers, and their associated +spooling queues and printer daemons. diff --git a/share/doc/smm/07.lpd/3.t b/share/doc/smm/07.lpd/3.t new file mode 100644 index 000000000000..3fe23791db0b --- /dev/null +++ b/share/doc/smm/07.lpd/3.t @@ -0,0 +1,67 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.NH 1 +Access control +.PP +The printer system maintains protected spooling areas so that +users cannot circumvent printer accounting or +remove files other than their own. +The strategy used to maintain protected +spooling areas is as follows: +.IP \(bu 3 +The spooling area is writable only by a \fIdaemon\fP user +and \fIdaemon\fP group. +.IP \(bu 3 +The \fIlpr\fP program runs set-user-id to \fIroot\fP and +set-group-id to group \fIdaemon\fP. The \fIroot\fP access permits +reading any file required. Accessibility is verified +with an \fIaccess\fP\|(2) call. The group ID +is used in setting up proper ownership of files +in the spooling area for \fIlprm\fP. +.IP \(bu 3 +Control files in a spooling area are made with \fIdaemon\fP +ownership and group ownership \fIdaemon\fP. Their mode is 0660. +This ensures control files are not modified by a user +and that no user can remove files except through \fIlprm\fP. +.IP \(bu 3 +The spooling programs, +\fIlpd\fP, \fIlpq\fP, and \fIlprm\fP, run set-user-id to \fIroot\fP +and set-group-id to group \fIdaemon\fP to access spool files and printers.
+.IP \(bu 3 +The printer server, \fIlpd\fP, +uses the same verification procedures as \fIrshd\fP\|(8C) +in authenticating remote clients. The host on which a client +resides must be present in the file /etc/hosts.equiv or /etc/hosts.lpd and +the request message must come from a reserved port number. +.PP +In practice, none of \fIlpd\fP, \fIlpq\fP, or +\fIlprm\fP would have to run as user \fIroot\fP if remote +spooling were not supported. In previous incarnations of +the printer system \fIlpd\fP ran set-user-id to \fIdaemon\fP, +set-group-id to group \fIspooling\fP, and \fIlpq\fP and \fIlprm\fP ran +set-group-id to group \fIspooling\fP. diff --git a/share/doc/smm/07.lpd/4.t b/share/doc/smm/07.lpd/4.t new file mode 100644 index 000000000000..86294043725e --- /dev/null +++ b/share/doc/smm/07.lpd/4.t @@ -0,0 +1,200 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.NH 1 +Setting up +.PP +The 4.3BSD release comes with the necessary programs +installed and with the default line printer queue +created. If the system must be modified, the +makefile in the directory /usr/src/usr.lib/lpr +should be used in recompiling and reinstalling +the necessary programs. +.PP +The real work in setting up is to create the +.I printcap +file and any printer filters for printers not supported +in the distribution system. +.NH 2 +Creating a printcap file +.PP +The +.I printcap +database contains one or more entries per printer. +A printer should have a separate spooling directory; +otherwise, jobs will be printed on +different printers depending on which printer daemon starts first. +This section describes how to create entries for printers that do not +conform to the default printer description (an LP-11 style interface to a +standard, band printer). +.NH 3 +Printers on serial lines +.PP +When a printer is connected via a serial communication line +it must have the proper baud rate and terminal modes set. +The following example is for a DecWriter III printer connected +locally via a 1200 baud serial line. +.DS +.DT +lp|LA-180 DecWriter III:\e + :lp=/dev/lp:br#1200:fs#06320:\e + :tr=\ef:of=/usr/lib/lpf:lf=/usr/adm/lpd-errs: +.DE +The +.B lp +entry specifies the file name to open for output. Here it could +be left out since ``/dev/lp'' is the default. 
+The +.B br +entry sets the baud rate for the tty line and the +.B fs +entry sets CRMOD, no parity, and XTABS (see \fItty\fP\|(4)). +The +.B tr +entry indicates that a form-feed should be printed when the queue +empties so the paper can be torn off without turning the printer off-line and +pressing form feed. +The +.B of +entry specifies the filter program +.I lpf +should be used for printing the files; +more will be said about filters later. +The last entry causes errors +to be written to the file ``/usr/adm/lpd-errs'' +instead of the console. Most errors from \fIlpd\fP are logged using +\fIsyslogd\fP\|(8) and will not be logged in the specified file. The +filters should use \fIsyslogd\fP to report errors; only those that +write to standard error output will end up with errors in the \fBlf\fP file. +(Occasionally errors sent to standard error output have not appeared +in the log file; the use of \fIsyslogd\fP is highly recommended.) +.NH 3 +Remote printers +.PP +Printers that reside on remote hosts should have an empty +.B lp +entry. +For example, the following printcap entry would send output to the printer +named ``lp'' on the machine ``ucbvax''. +.DS +.DT +lp|default line printer:\e + :lp=:rm=ucbvax:rp=lp:sd=/usr/spool/vaxlpd: +.DE +The +.B rm +entry is the name of the remote machine to connect to; this name must +be a known host name for a machine on the network. +The +.B rp +capability indicates +the name of the printer on the remote machine is ``lp''; +here it could be left out since this is the default value. +The +.B sd +entry specifies ``/usr/spool/vaxlpd'' +as the spooling directory instead of the +default value of ``/usr/spool/lpd''. +.NH 2 +Output filters +.PP +Filters are used to handle device dependencies and to +do accounting functions. The output filtering of +.B of +is used when accounting is +not being done or when all text data must be passed through a filter. 
+It is not intended to do accounting since it is started only once, +all text files are filtered through it, and no provision is made for passing +the owner's login name, identifying the beginning and ending of jobs, etc. +The other filters (if specified) are started for each file +printed and do accounting if there is an +.B af +entry. +If entries for both +.B of +and other filters are specified, +the output filter is used only to print the banner page; +it is then stopped to allow other filters access to the printer. +An example of a printer that requires output filters +is the Benson-Varian. +.DS +.DT +va|varian|Benson-Varian:\e + :lp=/dev/va0:sd=/usr/spool/vad:of=/usr/lib/vpf:\e + :tf=/usr/lib/rvcat:mx#2000:pl#58:px#2112:py#1700:tr=\ef: +.DE +The +.B tf +entry specifies ``/usr/lib/rvcat'' as the filter to be +used in printing \fItroff\fP\|(1) output. +This filter is needed to set the device into print mode +for text, and plot mode for printing +.I troff +files and raster images (see \fIva\fP\|(4V)). +Note that the page length is set to 58 lines by the +.B pl +entry for 8.5" by 11" fan-fold paper. +To enable accounting, the varian entry would be +augmented with an +.B af +filter as shown below. +.DS +.DT +va|varian|Benson-Varian:\e + :lp=/dev/va0:sd=/usr/spool/vad:of=/usr/lib/vpf:\e + :if=/usr/lib/vpf:tf=/usr/lib/rvcat:af=/usr/adm/vaacct:\e + :mx#2000:pl#58:px#2112:py#1700:tr=\ef: +.DE +.NH 2 +Access Control +.PP +Local access to printer queues is controlled with the +.B rg +printcap entry. +.DS + :rg=lprgroup: +.DE +Users must be in the group +.I lprgroup +to submit jobs to the specified printer. +The default is to allow all users access. +Note that once the files are in the local queue, they can be printed +locally or forwarded to another host depending on the configuration. +.PP +Remote access is controlled by listing the hosts in either the file +/etc/hosts.equiv or /etc/hosts.lpd, one host per line.
Note that +.IR rsh (1) +and +.IR rlogin (1) +use /etc/hosts.equiv to determine which hosts are equivalent for allowing logins +without passwords. The file /etc/hosts.lpd is only used to control +which hosts have line printer access. +Remote access can be further restricted to only allow remote users with accounts +on the local host to print jobs by using the \fBrs\fP printcap entry. +.DS + :rs: +.DE diff --git a/share/doc/smm/07.lpd/5.t b/share/doc/smm/07.lpd/5.t new file mode 100644 index 000000000000..c4d9ad23083d --- /dev/null +++ b/share/doc/smm/07.lpd/5.t @@ -0,0 +1,110 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.NH 1 +Output filter specifications +.PP +The filters supplied with 4.3BSD +handle printing and accounting for most common +line printers, the Benson-Varian, and the wide (36") and +narrow (11") Versatec printer/plotters. For other devices or accounting +methods, it may be necessary to create a new filter. +.PP +Filters are spawned by \fIlpd\fP +with their standard input the data to be printed, and standard output +the printer. The standard error is attached to the +.B lf +file for logging errors; alternatively, \fIsyslogd\fP may be used. +A filter must return a 0 exit +code if there were no errors, 1 if the job should be reprinted, +and 2 if the job should be thrown away. +When \fIlprm\fP +sends a kill signal to the \fIlpd\fP process controlling +printing, it sends a SIGINT signal +to all filters and descendants of filters. +This signal can be trapped by filters that need +to do cleanup operations such as +deleting temporary files. +.PP +Arguments passed to a filter depend on its type. +The +.B of +filter is called with the following arguments. +.DS +\fIfilter\fP \fB\-w\fPwidth \fB\-l\fPlength +.DE +The \fIwidth\fP and \fIlength\fP values come from the +.B pw +and +.B pl +entries in the printcap database. +The +.B if +filter is passed the following parameters.
+.DS +\fIfilter\fP [\|\fB\-c\fP\|] \fB\-w\fPwidth \fB\-l\fPlength \fB\-i\fPindent \fB\-n\fP login \fB\-h\fP host accounting_file +.DE +The +.B \-c +flag is optional, and only supplied when control characters +are to be passed uninterpreted to the printer (when using the +.B \-l +option of +.I lpr +to print the file). +The +.B \-w +and +.B \-l +parameters are the same as for the +.B of +filter. +The +.B \-n +and +.B \-h +parameters specify the login name and host name of the job owner. +The last argument is the name of the accounting file from +.IR printcap . +.PP +All other filters are called with the following arguments: +.DS +\fIfilter\fP \fB\-x\fPwidth \fB\-y\fPlength \fB\-n\fP login \fB\-h\fP host accounting_file +.DE +The +.B \-x +and +.B \-y +options specify the horizontal and vertical page +size in pixels (from the +.B px +and +.B py +entries in the printcap file). +The rest of the arguments are the same as for the +.B if +filter. diff --git a/share/doc/smm/07.lpd/6.t b/share/doc/smm/07.lpd/6.t new file mode 100644 index 000000000000..fed3f2dfabaf --- /dev/null +++ b/share/doc/smm/07.lpd/6.t @@ -0,0 +1,88 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.NH 1 +Line printer Administration +.PP +The +.I lpc +program provides local control over line printer activity. +The major commands and their intended use will be described. +The command format and remaining commands are described in +.IR lpc (8). +.LP +\fBabort\fP and \fBstart\fP +.IP +.I Abort +terminates an active spooling daemon on the local host immediately and +then disables printing (preventing new daemons from being started by +.IR lpr ). +This is normally used to forcibly restart a hung line printer daemon +(i.e., \fIlpq\fP reports that there is a daemon present but nothing is +happening). It does not remove any jobs from the queue +(use the \fIlprm\fP command instead). +.I Start +enables printing and requests \fIlpd\fP to start printing jobs. +.LP +\fBenable\fP and \fBdisable\fP +.IP +\fIEnable\fP and \fIdisable\fP allow spooling in the local queue to be +turned on/off. +This will allow/prevent +.I lpr +from putting new jobs in the spool queue. It is frequently convenient +to turn spooling off while testing new line printer filters since the +.I root +user can still use +.I lpr +to put jobs in the queue but no one else can. 
+The other main use is to prevent users from putting jobs in the queue +when the printer is expected to be unavailable for a long time. +.LP +\fBrestart\fP +.IP +.I Restart +allows ordinary users to restart printer daemons when +.I lpq +reports that there is no daemon present. +.LP +\fBstop\fP +.IP +.I Stop +halts a spooling daemon after the current job completes; +this also disables printing. This is a clean way to shut down a +printer to do maintenance, etc. Note that users can still enter jobs in a +spool queue while a printer is +.IR stopped . +.LP +\fBtopq\fP +.IP +.I Topq +places jobs at the top of a printer queue. This can be used +to reorder high-priority jobs since +.I lpr +only provides first-come-first-served ordering of jobs. diff --git a/share/doc/smm/07.lpd/7.t b/share/doc/smm/07.lpd/7.t new file mode 100644 index 000000000000..d7b668192930 --- /dev/null +++ b/share/doc/smm/07.lpd/7.t @@ -0,0 +1,220 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission.
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.NH 1 +Troubleshooting +.PP +There are several messages that may be generated by the +the line printer system. This section +categorizes the most common and explains the cause +for their generation. Where the message implies a failure, +directions are given to remedy the problem. +.PP +In the examples below, the name +.I printer +is the name of the printer from the +.I printcap +database. +.NH 2 +LPR +.SH +lpr: \fIprinter\fP\|: unknown printer +.IP +The +.I printer +was not found in the +.I printcap +database. Usually this is a typing mistake; however, it may indicate +a missing or incorrect entry in the /etc/printcap file. +.SH +lpr: \fIprinter\fP\|: jobs queued, but cannot start daemon. +.IP +The connection to +.I lpd +on the local machine failed. +This usually means the printer server started at +boot time has died or is hung. Check the local socket +/dev/printer to be sure it still exists (if it does not exist, +there is no +.I lpd +process running). +Usually it is enough to get a super-user to type the following to +restart +.IR lpd . +.DS +% /usr/lib/lpd +.DE +You can also check the state of the master printer daemon with the following. 
+.DS +% ps l`cat /usr/spool/lpd.lock` +.DE +.IP +Another possibility is that the +.I lpr +program is not set-user-id to \fIroot\fP and set-group-id to group \fIdaemon\fP. +This can be checked with +.DS +% ls \-lg /usr/ucb/lpr +.DE +.SH +lpr: \fIprinter\fP\|: printer queue is disabled +.IP +This means the queue was turned off with +.DS +% lpc disable \fIprinter\fP +.DE +to prevent +.I lpr +from putting files in the queue. This is normally +done by the system manager when a printer is +going to be down for a long time. The +printer can be turned back on by a super-user with +.IR lpc . +.NH 2 +LPQ +.SH +waiting for \fIprinter\fP to become ready (offline ?) +.IP +The printer device could not be opened by the daemon. +This can happen for several reasons; +the most common is that the printer is turned off-line. +This message can also be generated if the printer is out +of paper, the paper is jammed, etc. +The actual reason depends on the meaning +of error codes returned by the system device driver. +Not all printers supply enough information +to distinguish when a printer is off-line or having +trouble (e.g., a printer connected through a serial line). +Another possible cause of this message is that +some other process, such as an output filter, +has an exclusive open on the device. Your only recourse +here is to kill off the offending program(s) and +restart the printer with +.IR lpc . +.SH +\fIprinter\fP is ready and printing +.IP +The +.I lpq +program checks to see if a daemon process exists for +.I printer +and prints the file \fIstatus\fP located in the spooling directory. +If the daemon is hung, a super-user can use +.I lpc +to abort the current daemon and start a new one. +.SH +waiting for \fIhost\fP to come up +.IP +This implies there is a daemon trying to connect to the remote +machine named +.I host +to send the files in the local queue.
+If the remote machine is up, +.I lpd +on the remote machine is probably dead or +hung and should be restarted as mentioned for +.IR lpr . +.SH +sending to \fIhost\fP +.IP +The files should be in the process of being transferred to the remote +.IR host . +If not, the local daemon should be aborted and restarted with +.IR lpc . +.SH +Warning: \fIprinter\fP is down +.IP +The printer has been marked as being unavailable with +.IR lpc . +.SH +Warning: no daemon present +.IP +The \fIlpd\fP process overseeing +the spooling queue, as specified in the ``lock'' file +in that directory, does not exist. This normally occurs +only when the daemon has unexpectedly died. +The error log file for the printer and the \fIsyslogd\fP logs +should be checked for a +diagnostic from the deceased process. +To restart an \fIlpd\fP, use +.DS +% lpc restart \fIprinter\fP +.DE +.SH +no space on remote; waiting for queue to drain +.IP +This implies that there is insufficient disk space on the remote host. +If the file is large enough, there will never be enough space on +the remote host (even after the queue on the remote host is empty). The solution here +is to move the spooling queue or make more free space on the remote host. +.NH 2 +LPRM +.SH +lprm: \fIprinter\fP\|: cannot restart printer daemon +.IP +This case is the same as when +.I lpr +reports that the daemon cannot be started. +.NH 2 +LPD +.PP +The +.I lpd +program can log many different messages using \fIsyslogd\fP\|(8). +Most of these messages are about files that cannot +be opened and usually imply that the +.I printcap +file or the protection modes of the files are +incorrect. Files may also be inaccessible if people +manually manipulate the line printer system (i.e., they +bypass the +.I lpr +program). +.PP +In addition to messages generated by +.IR lpd , +any of the filters that +.I lpd +spawns may log messages using \fIsyslogd\fP or to the error log file +(the file specified in the \fBlf\fP entry in \fIprintcap\fP\|).
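Because filters report their problems either through syslogd or on standard error (the lf file), it may help to see how those pieces fit together in a filter. The following is a minimal sketch in C. It is not the code of the filters distributed with the system. It assumes the "if" filter calling convention given under the output filter specifications ([-c] -wwidth -llength -iindent -n login -h host accounting_file), the exit codes 0 (no errors), 1 (reprint) and 2 (throw away), and the SIGINT that arrives when lprm removes the job. The names parse_filter_args, run_filter and struct filter_args are hypothetical, and the exit status chosen on each error path is an assumption.

```c
/*
 * Minimal sketch of an lpd "if" filter core (hypothetical names).
 * lpd supplies the job on stdin, the printer on stdout, and the lf
 * error log on stderr.
 */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>

/* Exit codes lpd expects from a filter. */
enum { FILTER_OK = 0, FILTER_REPRINT = 1, FILTER_DISCARD = 2 };

/* Arguments: [-c] -wwidth -llength -iindent -n login -h host acct_file */
struct filter_args {
    int literal;                /* -c: pass control characters through */
    int width, length, indent;  /* -w, -l and -i values */
    const char *login, *host;   /* -n and -h values */
    const char *acct_file;      /* trailing accounting file name */
};

/* Set when lprm's kill reaches the filter as SIGINT. */
static volatile sig_atomic_t interrupted;

static void on_sigint(int sig)
{
    (void)sig;
    interrupted = 1;
}

/* Fill *a from argv; returns 0 on success, -1 on an unexpected argument. */
int parse_filter_args(int argc, char **argv, struct filter_args *a)
{
    memset(a, 0, sizeof *a);
    for (int i = 1; i < argc; i++) {
        const char *s = argv[i];

        if (strcmp(s, "-c") == 0)
            a->literal = 1;
        else if (strncmp(s, "-w", 2) == 0)
            a->width = atoi(s + 2);
        else if (strncmp(s, "-l", 2) == 0)
            a->length = atoi(s + 2);
        else if (strncmp(s, "-i", 2) == 0)
            a->indent = atoi(s + 2);
        else if (strcmp(s, "-n") == 0 && i + 1 < argc)
            a->login = argv[++i];
        else if (strcmp(s, "-h") == 0 && i + 1 < argc)
            a->host = argv[++i];
        else if (s[0] != '-' && i == argc - 1)
            a->acct_file = s;
        else
            return -1;
    }
    return 0;
}

/* Copy the job to the printer, counting lines; returns an lpd exit code. */
int run_filter(FILE *in, FILE *out, long *lines)
{
    int ch;

    assert(in != NULL && out != NULL && lines != NULL);
    *lines = 0;
    signal(SIGINT, on_sigint);
    while (!interrupted && (ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF) {
            /* Report via syslog (recommended) and stderr (the lf file). */
            syslog(LOG_ERR, "filter: write to printer failed");
            fprintf(stderr, "filter: write to printer failed\n");
            return FILTER_REPRINT;      /* assumption: retry the job */
        }
        if (ch == '\n')
            (*lines)++;
    }
    if (interrupted)
        return FILTER_DISCARD;          /* job removed; clean up temp files here */
    if (ferror(in)) {
        syslog(LOG_ERR, "filter: error reading job data");
        return FILTER_DISCARD;          /* assumption: unreadable job is dropped */
    }
    return fflush(out) == 0 ? FILTER_OK : FILTER_REPRINT;
}
```

A real filter's main() could, for example, call parse_filter_args(argc, argv, &args) and then exit(run_filter(stdin, stdout, &lines)). Writing a record to the accounting file is left out because its format is not specified here.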
+.NH 2 +LPC +.PP +.SH +couldn't start printer +.IP +This case is the same as when +.I lpr +reports that the daemon cannot be started. +.SH +cannot examine spool directory +.IP +Error messages beginning with ``cannot ...'' are usually caused by +incorrect ownership or protection mode of the lock file, spooling +directory or the +.I lpc +program. diff --git a/share/doc/smm/07.lpd/Makefile b/share/doc/smm/07.lpd/Makefile new file mode 100644 index 000000000000..a92ce0bd3bbb --- /dev/null +++ b/share/doc/smm/07.lpd/Makefile @@ -0,0 +1,6 @@ +VOLUME= smm/07.lpd +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t 7.t +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/07.lpd/spell.ok b/share/doc/smm/07.lpd/spell.ok new file mode 100644 index 000000000000..bf31319943d6 --- /dev/null +++ b/share/doc/smm/07.lpd/spell.ok @@ -0,0 +1,70 @@ +Aprinter +Bprinter +CRMOD +Cprinter +DecWriter +Dprinter +Eprinter +LPC +LPD +Lpd +Manual''SMM:5 +SIGINT +SMM:5 +Topq +XTABS +adm +af +br +daemon +daemons +dev +f:of +fs +hosts.equiv +hosts.lpd +lf +lg +lib +lp:br +lp:sd +lpc +lpd +lpd.lock +lpf +lpf:lf +lprgroup +makefile +mx +offline +pl +printcap +pw +py +rc +rcmd +rg +rlogin +rp +rs +rsh +rshd +rvcat +rvcat:af +rvcat:mx +sd +src +syslogd +tf +topq +ucb +ucbvax +ucbvax:rp +usr.lib +va0:sd +vaacct +vad:of +varian +vaxlpd +vpf +vpf:tf diff --git a/share/doc/smm/08.sendmailop/Makefile b/share/doc/smm/08.sendmailop/Makefile new file mode 100644 index 000000000000..ba0eb96de955 --- /dev/null +++ b/share/doc/smm/08.sendmailop/Makefile @@ -0,0 +1,8 @@ +VOLUME= smm/08.sendmailop +SRCS= op.me +MACROS= -me +USE_PIC= +USE_EQN= +SRCDIR= ${SRCTOP}/contrib/sendmail/doc/op + +.include <bsd.doc.mk> diff --git a/share/doc/smm/11.timedop/Makefile b/share/doc/smm/11.timedop/Makefile new file mode 100644 index 000000000000..a78b1204ca47 --- /dev/null +++ b/share/doc/smm/11.timedop/Makefile @@ -0,0 +1,5 @@ +VOLUME= smm/11.timedop +SRCS= timed.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git
a/share/doc/smm/11.timedop/timed.ms b/share/doc/smm/11.timedop/timed.ms new file mode 100644 index 000000000000..6d00adf288b0 --- /dev/null +++ b/share/doc/smm/11.timedop/timed.ms @@ -0,0 +1,273 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.TL +Timed Installation and Operation Guide +.AU +Riccardo Gusella, Stefano Zatti, James M. 
Bloom +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AU +Kirk Smith +.AI +Engineering Computer Network +Department of Electrical Engineering +Purdue University +West Lafayette, IN 47906 +.FS +This work was sponsored by the Defense Advanced Research Projects Agency +(DoD), monitored by the Naval Electronics Systems +Command under contract No. N00039-84-C-0089, and by the CSELT +Corporation of Italy. +The views and conclusions contained in this document are those of the +authors and should not be interpreted as representing official policies, +either expressed or implied, of the Defense Advanced Research Projects Agency, +of the US Government, or of CSELT. +.FE +.LP +.EH 'SMM:11-%''Timed Installation and Operation' +.OH 'Timed Installation and Operation''SMM:11-%' +.SH +Introduction +.PP +The clock synchronization service for +the UNIX 4.3BSD operating system is composed of a collection of +time daemons (\fItimed\fP) running on the machines in a local +area network. +The algorithms implemented by the service are based on a master-slave scheme. +The time daemons communicate with each other using the +\fITime Synchronization Protocol\fP (TSP), which +is built on the DARPA UDP protocol and described in detail in [4]. +.PP +A time daemon has a twofold function. +First, it supports the synchronization of the clocks +of the various hosts in a local area network. +Second, it starts (or takes part in) the election that occurs +among slave time daemons when, for any reason, the master disappears. +The synchronization mechanism and the election procedure +employed by the program \fItimed\fP are described +in other documents [1,2,3]. +The next paragraphs are a brief overview of how the time daemon works. +This document is mainly concerned with the administrative and technical +issues of running \fItimed\fP at a particular site.
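The overview that follows describes how the master time daemon derives the network time from the clocks it measures. As a rough illustration only, that averaging step can be sketched in C under stated assumptions. The offsets of each clock from the master's clock are taken as already measured, in milliseconds. The faulty-clock test is approximated by comparing each offset with the median offset; the actual criterion is the one described in [1,2]. The function network_time_corrections and its parameters are hypothetical and are not part of timed.

```c
/*
 * Hypothetical sketch of the master's averaging step. delta[i] is the
 * measured offset of clock i from the master's clock in milliseconds
 * (the master's own entry is 0).
 */
#include <assert.h>
#include <stdlib.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/*
 * Assumption: a clock is "faulty" if its offset is more than tol_ms away
 * from the median offset, a stand-in for "apart from the majority".
 * Writes to corr[i] the correction clock i should apply to reach network
 * time, and returns how many clocks were averaged (-1 on bad input).
 */
int network_time_corrections(const long *delta, int n, long tol_ms, long *corr)
{
    long *sorted, median, sum = 0, avg;
    int i, good = 0;

    assert(delta != NULL && corr != NULL);
    if (n <= 0 || tol_ms < 0 || (sorted = malloc(n * sizeof *sorted)) == NULL)
        return -1;
    for (i = 0; i < n; i++)
        sorted[i] = delta[i];
    qsort(sorted, n, sizeof *sorted, cmp_long);
    median = sorted[n / 2];
    free(sorted);

    /* Network time = average offset of the nonfaulty clocks. */
    for (i = 0; i < n; i++)
        if (labs(delta[i] - median) <= tol_ms) {
            sum += delta[i];
            good++;
        }
    avg = sum / good;       /* good >= 1: the median always qualifies */

    /* Every clock gets a relative correction, not an absolute time. */
    for (i = 0; i < n; i++)
        corr[i] = avg - delta[i];
    return good;
}
```

With offsets of 0, 10, 20 and 500 ms and a 50 ms tolerance, the 500 ms clock is excluded from the average (network time is +10 ms relative to the master), but it still receives a correction (-490 ms), as every slave does.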
+.PP +A \fImaster time daemon\fP measures the time +differences between the clock of the machine on which it +is running and those of all other machines. +The master computes the \fInetwork time\fP as the average of the +times provided by nonfaulty clocks.\** +.FS +A clock is considered to be faulty when its value +is more than a small specified +interval apart from the majority of the clocks +of the other machines [1,2]. +.FE +It then sends to each \fIslave time daemon\fP the +correction that should be performed on the clock of its machine. +This process is repeated periodically. +Since the correction is expressed as a time difference rather than an +absolute time, transmission delays do not interfere with +the accuracy of the synchronization. +When a machine comes up and joins the network, +it starts a slave time daemon which +will ask the master for the correct time and will reset the machine's clock +before any user activity can begin. +The time daemons are able to maintain a single network time in spite of +the drift of clocks away from each other. +The present implementation keeps processor clocks synchronized +within 20 milliseconds. +.PP +To ensure that the service provided is continuous and reliable, +it is necessary to implement an election algorithm to elect a +new master should the machine running the current master crash, the master +terminate (for example, because of a run-time error), or +the network be partitioned. +Under our algorithm, slaves are able to realize when the master has +stopped functioning and to elect a new master from among themselves. +It is important to note that, since the failure of the master results +only in a gradual divergence of clock values, the election +need not occur immediately. +.PP +The machines that are gateways between distinct local area +networks require particular care. +A time daemon on such machines may act as a \fIsubmaster\fP. 
+This arrangement is needed because of the current inability of +transmission protocols to broadcast a message on a network +other than the one to which the broadcasting machine is connected. +The submaster appears as a slave on one network, and as a master +on one or more of the other networks to which it is connected. +.PP +A submaster classifies each network as one of three types. +A \fIslave network\fP is a network on which the submaster acts as a slave. +There can only be one slave network. +A \fImaster network\fP is a network on which the submaster acts as a master. +An \fIignored network\fP is any other network which already has a valid master. +The submaster tries periodically to become master on an ignored +network, but gives up immediately if a master already exists. +.SH +Guidelines +.PP +While the synchronization algorithm is quite general, the election +algorithm, which requires a broadcast mechanism, constrains +the kind of network on which time daemons can run. +The time daemon will only work on networks with broadcast capability +augmented with point-to-point links. +Machines that are only connected to point-to-point, +non-broadcast networks may not use the time daemon. +.PP +If we exclude submasters, there will normally be at most one master time +daemon in a local area internetwork. +During an election, only one of the slave time daemons +will become the new master. +However, because of the characteristics of its machine, +a slave can be prevented from becoming the master. +Therefore, a subset of machines must be designated as potential +master time daemons. +A master time daemon requires CPU resources +proportional to the number of slaves and, in general, more than +a slave time daemon does, so it may be advisable to limit master time +daemons to machines with more powerful processors or lighter loads. +Also, machines with inaccurate clocks should not be used as masters.
+This is a purely administrative decision: an organization may +well allow all of its machines to run master time daemons. +.PP +At the administrative level, a time daemon on a machine +with multiple network interfaces may be told to ignore all +but one network or to ignore one network. +This is done with the \fI\-n network\fP and \fI\-i network\fP +options, respectively, at start-up time. +Typically, the time daemon would be instructed to ignore all but +the networks belonging to the local administrative control. +.PP +There are some limitations to the current +implementation of the time daemon. +It is expected that these limitations will be removed in future releases. +The constant NHOSTS in /usr/src/etc/timed/globals.h limits the +maximum number of machines that may be directly controlled by one +master time daemon. +The current maximum is 29 (NHOSTS \- 1). +The constant must be changed and the program recompiled if a site wishes to +run \fItimed\fP on a larger (inter)network. +.PP +In addition, there is a \fIpathological situation\fP that must +be avoided at all costs; it can occur when +time daemons run on multiply-connected local area networks. +In this case, as we have seen, time daemons running on gateway machines +will be submasters and they will act on some of those +networks as master time daemons. +Consider machines A and B that are both gateways between +networks X and Y. +If time daemons were started on both A and B without constraints, it would be +possible for submaster time daemon A to be a slave on network X +and the master on network Y, while submaster time daemon B is a slave on +network Y and the master on network X. +This \fIloop\fP of master time daemons will neither function properly +nor guarantee a unique time on both networks, and will cause +the submasters to use large amounts of system resources in the form +of network bandwidth and CPU time.
+In fact, this kind of \fIloop\fP can also be generated with more +than two master time daemons, +when several local area networks are interconnected. +.SH +Installation +.PP +In order to start the time daemon on a given machine, +the following lines should be +added to the \fIlocal daemons\fP section in the file \fI/etc/rc.local\fP: +.sp 2 +.in 1i +.nf +if [ -f /etc/timed ]; then + /etc/timed \fIflags\fP & echo -n ' timed' >/dev/console +fi +.fi +.in -1i +.sp +.LP +These lines must appear after the network +is configured with ifconfig(8). +.PP +Also, the file \fI/etc/services\fP should contain the following +line: +.sp 2 +.ti 1i +timed 525/udp timeserver +.sp +.LP +The \fIflags\fP are: +.IP "-n network" 13 +to consider the named network. +.IP "-i network" +to ignore the named network. +.IP -t +to place tracing information in \fI/usr/adm/timed.log\fP. +.IP -M +to allow this time daemon to become a master. +A time daemon run without this option is forced into the +slave state during an election. +.SH +Daily Operation +.PP +\fITimedc(8)\fP is used to control the operation of the time daemon. +It may be used to: +.IP \(bu +measure the differences between machines' clocks, +.IP \(bu +find the location where the master \fItimed\fP is running, +.IP \(bu +cause election timers on several machines to expire at the same time, +.IP \(bu +enable or disable tracing of messages received by \fItimed\fP. +.LP +See the manual pages for \fItimed\fP\|(8) and \fItimedc\fP\|(8) +for more detailed information. +.PP +The \fIdate(1)\fP command can be used to set the network date. +To set the time on a single machine only, the \fI-n\fP flag +can be given to date(1). +.bp +.SH +References +.IP 1. +R. Gusella and S. Zatti, +\fITEMPO: A Network Time Controller for Distributed Berkeley UNIX System\fP, +USENIX Summer Conference Proceedings, Salt Lake City, June 1984. +.IP 2. +R. Gusella and S.
Zatti, \fIClock Synchronization in a Local Area Network\fP, +University of California, Berkeley, Technical Report, \fIto appear\fP. +.IP 3. +R. Gusella and S. Zatti, +\fIAn Election Algorithm for a Distributed Clock Synchronization Program\fP, +University of California, Berkeley, CS Technical Report #275, Dec. 1985. +.IP 4. +R. Gusella and S. Zatti, +\fIThe Berkeley UNIX 4.3BSD Time Synchronization Protocol\fP, +UNIX Programmer's Manual, 4.3 Berkeley Software Distribution, Volume 2c. diff --git a/share/doc/smm/12.timed/Makefile b/share/doc/smm/12.timed/Makefile new file mode 100644 index 000000000000..e8498e8ccc90 --- /dev/null +++ b/share/doc/smm/12.timed/Makefile @@ -0,0 +1,8 @@ +VOLUME= smm/12.timed +SRCS= timed.ms +EXTRA= date loop time unused +MACROS= -ms +USE_SOELIM= +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/12.timed/date b/share/doc/smm/12.timed/date new file mode 100644 index 000000000000..fec3dba0253f --- /dev/null +++ b/share/doc/smm/12.timed/date @@ -0,0 +1,47 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ft B +.TS +center; +ce | ce | ce | ce +| c | c | c | s | +| c s s s |. +Byte 1 Byte 2 Byte 3 Byte 4 += +Type Version No. Sequence No. +_ +Seconds of Time to Set +_ +Microseconds of Time to Set +_ +Machine Name +_ +\&. . . +_ +.TE +.ft R diff --git a/share/doc/smm/12.timed/loop b/share/doc/smm/12.timed/loop new file mode 100644 index 000000000000..28e2874de228 --- /dev/null +++ b/share/doc/smm/12.timed/loop @@ -0,0 +1,48 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ft B +.TS +center; +ce | ce | ce | ce +| c | c | c | s | +| c | c s s | +| c s s s |. +Byte 1 Byte 2 Byte 3 Byte 4 += +Type Version No. Sequence No. +_ +Hop Count ( unused ) +_ +( unused ) +_ +Machine Name +_ +\&. . . +_ +.TE +.ft R diff --git a/share/doc/smm/12.timed/spell.ok b/share/doc/smm/12.timed/spell.ok new file mode 100644 index 000000000000..8ecfe15d8528 --- /dev/null +++ b/share/doc/smm/12.timed/spell.ok @@ -0,0 +1,34 @@ +ACK +ADJTIME +Adjtime +CS +CSELT +Candidature +DATEACK +DoD +Gusella +MASTERACK +MASTERREQ +MASTERUP +MSITE +MSITEREQ +Protocol''SMM:22 +Riccardo +SETDATE +SETDATEREQ +SETTIME +SLAVEUP +SMM:22 +Stefano +TRACEOFF +TRACEON +TSP +Timedc +UDP +USENIX +Zatti +candidature +ce +daemon +daemons +timedc diff --git a/share/doc/smm/12.timed/time b/share/doc/smm/12.timed/time new file mode 100644 index 000000000000..ba9fcba7a06d --- /dev/null +++ b/share/doc/smm/12.timed/time @@ -0,0 +1,47 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. 
All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ft B +.TS +center; +ce | ce | ce | ce +| c | c | c | s | +| c s s s |. +Byte 1 Byte 2 Byte 3 Byte 4 += +Type Version No. Sequence No. +_ +Seconds of Adjustment +_ +Microseconds of Adjustment +_ +Machine Name +_ +\&. . . 
+_ +.TE +.ft R diff --git a/share/doc/smm/12.timed/timed.ms b/share/doc/smm/12.timed/timed.ms new file mode 100644 index 000000000000..28bcdbb9c0cc --- /dev/null +++ b/share/doc/smm/12.timed/timed.ms @@ -0,0 +1,455 @@ +.\" +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.TL +The Berkeley +.UX +.br +Time Synchronization Protocol +.AU +Riccardo Gusella, Stefano Zatti, and James M. 
Bloom +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.FS +This work was sponsored by the Defense Advanced Research Projects Agency +(DoD), monitored by the Naval Electronics Systems +Command under contract No. N00039-84-C-0089, and by the Italian CSELT +Corporation. +The views and conclusions contained in this document are those of the +authors and should not be interpreted as representing official policies, +either expressed or implied, of the Defense Advanced Research Projects Agency, +of the US Government, or of CSELT. +.FE +.LP +.OH 'The Berkeley UNIX Time Synchronization Protocol''SMM:12-%' +.EH 'SMM:12-%''The Berkeley UNIX Time Synchronization Protocol' +.SH +Introduction +.PP +The Time Synchronization Protocol (TSP) +has been designed for specific use by the program \fItimed\fP, +a local area network clock synchronizer for +the UNIX 4.3BSD operating +system. +\fITimed\fP is built on the DARPA UDP protocol [4] and +is based on a master-slave scheme. +.PP +TSP serves a dual purpose. +First, it supports messages for the synchronization of the clocks +of the various hosts in a local area network. +Second, it supports messages for the election that occurs +among slave time daemons when, for any reason, the master disappears. +The synchronization mechanism and the election procedure +employed by \fItimed\fP are described +in other documents [1,2,3]. +.PP +Briefly, the synchronization software, which works in a +local area network, consists of a collection of \fItime daemons\fP +(one per machine) and is based on a master-slave +structure. +The present implementation keeps processor clocks synchronized +within 20 milliseconds. +A \fImaster time daemon\fP measures the time +difference between the clock of the machine on which it +is running and those of all other machines.
The current implementation +uses ICMP \fITime Stamp Requests\fP [5] to measure the clock difference +between machines. +The master computes the \fInetwork time\fP as the average of the +times provided by nonfaulty clocks.\** +.FS +A clock is considered to be faulty when its value +is more than a small specified +interval apart from the majority of the clocks +of the machines on the same network. +See [1,2] for more details. +.FE +It then sends to each \fIslave time daemon\fP the +correction that should be performed on the clock of its machine. +This process is repeated periodically. +Since the correction is expressed as a time difference rather than an +absolute time, transmission delays do not interfere with synchronization. +When a machine comes up and joins the network, +it starts a slave time daemon, which +will ask the master for the correct time and will reset the machine's clock +before any user activity can begin. +The time daemons therefore maintain a single network time in spite of +the drift of clocks away from each other. +.PP +Additionally, a time daemon on gateway machines may run as +a \fIsubmaster\fP. +A submaster time daemon functions as a slave on one network that +already has a master and as master on other networks. +In addition, a submaster is responsible for propagating broadcast +packets from one network to the other. +.PP +To ensure that service provided is continuous and reliable, +it is necessary to implement an election algorithm that will elect a +new master should the machine running the current master crash, the master +terminate (for example, because of a run-time error), or the network be +partitioned. +Under our algorithm, slaves are able to realize when the master has +stopped functioning and to elect a new master from among themselves. +It is important to note that since the failure of the master results +only in a gradual divergence of clock values, the election +need not occur immediately. 
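The master's averaging step described above can be sketched in C. This is a hypothetical illustration, not code from timed: the function name compute_corrections(), the 20 ms tolerance FAULT_TOLERANCE_MS, and the use of the median as a stand-in for "the majority of the clocks" are all assumptions made here. Offsets are measured relative to the master's clock, and each correction is expressed as a difference rather than an absolute time, as the text describes.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed tolerance (ms); the text only says "a small specified interval". */
#define FAULT_TOLERANCE_MS 20L

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/*
 * offsets[i]: measured difference (machine i's clock minus the master's
 * clock, in ms). By convention offsets[0] is the master itself (0).
 * On return, corrections[i] is the adjustment machine i should apply.
 * Returns the number of clocks judged nonfaulty, or -1 on error.
 */
int compute_corrections(const long *offsets, size_t n, long *corrections)
{
    long *sorted, median, sum = 0, avg;
    int good = 0;
    size_t i;

    if (n == 0)
        return -1;
    sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, offsets, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_long);
    median = sorted[n / 2];         /* assumption: median stands in for "the majority" */
    free(sorted);

    /* Average only the clocks that lie within the tolerance of the median. */
    for (i = 0; i < n; i++) {
        if (labs(offsets[i] - median) <= FAULT_TOLERANCE_MS) {
            sum += offsets[i];
            good++;
        }
    }
    avg = sum / good;               /* good >= 1: the median itself always qualifies */

    /* Each machine moves from its own offset to the network time. */
    for (i = 0; i < n; i++)
        corrections[i] = avg - offsets[i];
    return good;
}
```

Because every correction is a relative amount, the one-way delay of the message that carries it does not change the value the slave applies.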
+.PP +All the communication occurring among time daemons uses the TSP +protocol. +While some messages need not be sent in a reliable way, +most communication in TSP requires reliability not provided by the underlying +protocol. +Reliability is achieved by the use of acknowledgements, sequence numbers, and +retransmission when message losses occur. +When a message that requires acknowledgment is not acknowledged after +multiple attempts, +the time daemon that has sent the message will assume that the +addressee is down. +This document will not describe the details of how reliability is +implemented, but will only point out when +a message type requires a reliable transport mechanism. +.PP +The message format in TSP is the same for all message types; +however, in some instances, one or more fields are not used. +The next section describes the message format. +The following sections describe +in detail the different message types, their use and the contents +of each field. NOTE: The message format is likely to change in +future versions of timed. +.sp 2 +.SH +Message Format +.PP +All fields are based upon 8-bit bytes. Fields should be sent in +network byte order if they are more than one byte long. +The structure of a TSP message is the following: +.IP 1) +A one byte message type. +.IP 2) +A one byte version number, specifying the protocol version which the +message uses. +.IP 3) +A two byte sequence number to be used for recognizing duplicate messages +that occur when messages are retransmitted. +.IP 4) +Eight bytes of packet specific data. This field contains two 4 byte time +values, a one byte hop count, or may be unused depending on the type +of the packet. +.IP 5) +A zero-terminated string of up to 256 \s-2ASCII\s+2 characters with the name of +the machine sending the message. +.PP +The following charts describe the message types, +show their fields, and explain their usages. 
+For the purpose of the following discussion, a time daemon can +be considered to be in +one of three states: slave, master, or candidate for election to master. +Also, the term \fIbroadcast\fP refers to +the sending of a message to all active time daemons. +.sp 1 +.SH +Adjtime Message +.so time +.LP +Type: TSP_ADJTIME (1) +.sp 1 +.PP +The master sends this message to a slave to communicate +the difference between +the clock of the slave and +the network time the master has just computed. +The slave will accordingly +adjust the time of its machine. +This message requires an acknowledgment. +.sp 1 +.SH +Acknowledgment Message +.so unused +.LP +Type: TSP_ACK (2) +.sp 1 +.PP +Both the master and the slaves use this message for +acknowledgment only. +It is used in several different contexts, for example +in reply to an Adjtime message. +.sp 1 +.SH +Master Request Message +.so unused +.LP +Type: TSP_MASTERREQ (3) +.sp 1 +.PP +A newly-started time daemon broadcasts this message to +locate a master. No other action is implied by this packet. +It requires a Master Acknowledgment. +.sp 1 +.SH +Master Acknowledgement +.so unused +.LP +Type: TSP_MASTERACK (4) +.sp 1 +.PP +The master sends this message to acknowledge the Master Request message +and the Conflict Resolution Message. +.sp 1 +.SH +Set Network Time Message +.so date +.LP +Type: TSP_SETTIME (5) +.sp 1 +.PP +The master sends this message to slave time daemons to set their time. +This packet is sent to newly started time daemons and when the network +date is changed. +It contains the master's time as an approximation of the network time. +It requires an acknowledgment. +The next +synchronization round will eliminate the small time difference +caused by the random delay in the communication channel. +.sp 1 +.SH +Master Active Message +.so unused +.LP +Type: TSP_MASTERUP (6) +.sp 1 +.PP +The master broadcasts this message to +solicit the names of the active slaves. +Slaves will reply with a Slave Active message. 
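The byte layout listed in the Message Format section can be made concrete with a small C encoder for the messages that carry two 4-byte time values (Adjtime, Set Network Time, and the Set Date messages). This is a sketch under stated assumptions, not the timed source: tsp_encode_time(), TSP_HDRLEN, and TSP_NAMELEN (taken here as 256 bytes including the terminating zero) are hypothetical names. Only the type value TSP_ADJTIME (1) comes from the text. Multi-byte fields are written most-significant byte first, which is network byte order.

```c
#include <stdint.h>
#include <string.h>

#define TSP_ADJTIME  1      /* type value given in the text */
#define TSP_NAMELEN  256    /* assumption: name field size including the NUL */
#define TSP_HDRLEN   12     /* type(1) + version(1) + sequence(2) + data(8) */

/* Store a 32-bit value in network byte order (most significant byte first). */
static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/*
 * Encode a message whose 8 data bytes hold seconds and microseconds.
 * Returns the number of bytes written, or 0 if the buffer is too small
 * or the machine name is too long.
 */
size_t tsp_encode_time(unsigned char *buf, size_t buflen, uint8_t type,
                       uint8_t version, uint16_t seq, int32_t sec,
                       int32_t usec, const char *name)
{
    size_t namelen = strlen(name) + 1;      /* include terminating zero */

    if (namelen > TSP_NAMELEN || buflen < TSP_HDRLEN + namelen)
        return 0;
    buf[0] = type;
    buf[1] = version;
    buf[2] = (unsigned char)(seq >> 8);     /* sequence number, MSB first */
    buf[3] = (unsigned char)seq;
    put32(buf + 4, (uint32_t)sec);          /* negative values: two's complement */
    put32(buf + 8, (uint32_t)usec);
    memcpy(buf + TSP_HDRLEN, name, namelen);
    return TSP_HDRLEN + namelen;
}
```

For example, tsp_encode_time(buf, sizeof buf, TSP_ADJTIME, 1, seq, 3, 250000, "host") would describe an adjustment of 3.25 seconds; the version value 1 is only a placeholder. Writing the bytes by hand, rather than sending a C structure directly, avoids depending on compiler padding and host byte order. This sketch sends only the used part of the name; a real implementation may send a fixed-size field.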
+.sp 1 +.SH +Slave Active Message +.so unused +.LP +Type: TSP_SLAVEUP (7) +.sp 1 +.PP +A slave sends this message to the master in answer to a Master Active message. +This message is also sent when a new slave starts up to inform the master that +it wants to be synchronized. +.sp 1 +.SH +Master Candidature Message +.so unused +.LP +Type: TSP_ELECTION (8) +.sp 1 +.PP +A slave eligible to become a master broadcasts this message when its election +timer expires. +The message declares that the slave wishes to become the new master. +.sp 1 +.SH +Candidature Acceptance Message +.so unused +.LP +Type: TSP_ACCEPT (9) +.sp 1 +.PP +A slave sends this message to accept the candidature of the time daemon +that has broadcast an Election message. +The candidate will add the slave's name to the list of machines that it +will control should it become the master. +.sp 1 +.SH +Candidature Rejection Message +.so unused +.LP +Type: TSP_REFUSE (10) +.sp 1 +.PP +After a slave accepts the candidature of a time daemon, it will reply +to any election messages from other slaves +with this message. +This rejects any candidature other than the first received. +.sp 1 +.SH +Multiple Master Notification Message +.so unused +.LP +Type: TSP_CONFLICT (11) +.sp 1 +.PP +When two or more masters reply to a Master Request message, the slave +uses this message to inform one of them that more than one master exists. +.sp 1 +.SH +Conflict Resolution Message +.so unused +.LP +Type: TSP_RESOLVE (12) +.sp 1 +.PP +A master which has been informed of the existence of other masters +broadcasts this message to determine who the other masters are. +.sp 1 +.SH +Quit Message +.so unused +.LP +Type: TSP_QUIT (13) +.sp 1 +.PP +This message is sent by the master in three different contexts: +1) to a candidate that broadcasts a Master Candidature message, +2) to another master when notified of its existence, +3) to another master if a loop is detected. +In all cases, the recipient time daemon will become a slave. 
+This message requires an acknowledgement. +.sp 1 +.SH +Set Date Message +.so date +.LP +Type: TSP_SETDATE (22) +.sp 1 +.PP +The program \fIdate\fP\|(1) sends this message to the local time daemon +when the super-user wants to set the network date. +If the local time daemon is the master, it will set the date; +if it is a slave, it will communicate the desired date to the master. +.sp 1 +.SH +Set Date Request Message +.so date +.LP +Type: TSP_SETDATEREQ (23) +.sp 1 +.PP +A slave that has received a Set Date message will communicate the +desired date to the master using this message. +.sp 1 +.SH +Set Date Acknowledgment Message +.so unused +.LP +Type: TSP_DATEACK (16) +.sp 1 +.PP +The master sends this message to a slave in acknowledgment of a +Set Date Request Message. +The same message is sent by the local time daemon to the program +\fIdate\fP\|(1) to confirm that the network date has been set by the +master. +.sp 1 +.SH +Start Tracing Message +.so unused +.LP +Type: TSP_TRACEON (17) +.sp 1 +.PP +The controlling program \fItimedc\fP sends this message to the local +time daemon to start recording all received messages +in a system file. +.sp 1 +.SH +Stop Tracing Message +.so unused +.LP +Type: TSP_TRACEOFF (18) +.sp 1 +.PP +\fITimedc\fP sends this message to the local +time daemon to stop recording +received messages. +.sp 1 +.SH +Master Site Message +.so unused +.LP +Type: TSP_MSITE (19) +.sp 1 +.PP +\fITimedc\fP sends this message to the local time daemon to find out +where the master is running. +.sp 1 +.SH +Remote Master Site Message +.so unused +.LP +Type: TSP_MSITEREQ (20) +.sp 1 +.PP +A local time daemon broadcasts this message to find the location +of the master. +It then uses the Acknowledgment message to +communicate this location to \fItimedc\fP. +.sp 1 +.SH +Test Message +.so unused +.LP +Type: TSP_TEST (21) +.sp 1 +.PP +For testing purposes, \fItimedc\fP sends this message to a slave +to cause its election timer to expire.
NOTE: \fItimed\fP +is not normally compiled to support this. +.sp 1 +.SH +Loop Detection Message +.so loop +.LP +Type: TSP_LOOP (24) +.sp 1 +.PP +Every master occasionally sends this packet in an attempt to detect loops. +All submasters forward this packet onto the networks over which they are master. +If a master receives a packet that it originally sent, +it knows that a loop exists and tries to correct the problem. +.SH +References +.IP 1. +R. Gusella and S. Zatti, +\fITEMPO: A Network Time Controller for Distributed Berkeley UNIX System\fP, +USENIX Summer Conference Proceedings, Salt Lake City, June 1984. +.IP 2. +R. Gusella and S. Zatti, \fIClock Synchronization in a Local Area Network\fP, +University of California, Berkeley, Technical Report, \fIto appear\fP. +.IP 3. +R. Gusella and S. Zatti, +\fIAn Election Algorithm for a Distributed Clock Synchronization Program\fP, +University of California, Berkeley, CS Technical Report #275, Dec. 1985. +.IP 4. +J. Postel, \fIUser Datagram Protocol\fP, RFC 768, +Network Information Center, SRI International, Menlo Park, California, +August 1980. +.IP 5. +J. Postel, \fIInternet Control Message Protocol\fP, RFC 792, +Network Information Center, SRI International, Menlo Park, California, +September 1981. diff --git a/share/doc/smm/12.timed/unused b/share/doc/smm/12.timed/unused new file mode 100644 index 000000000000..2f6657663007 --- /dev/null +++ b/share/doc/smm/12.timed/unused @@ -0,0 +1,47 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ft B +.TS +center; +ce | ce | ce | ce +| c | c | c | s | +| c s s s |. +Byte 1 Byte 2 Byte 3 Byte 4 += +Type Version No. Sequence No. +_ +( unused ) +_ +( unused ) +_ +Machine Name +_ +\&. . . +_ +.TE +.ft R diff --git a/share/doc/smm/18.net/0.t b/share/doc/smm/18.net/0.t new file mode 100644 index 000000000000..fa445aee653a --- /dev/null +++ b/share/doc/smm/18.net/0.t @@ -0,0 +1,178 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.de IR +\fI\\$1\fP\\$2 +.. +.if n .ND +.TL +Networking Implementation Notes +.br +4.4BSD Edition +.AU +Samuel J. Leffler, William N. Joy, Robert S. Fabry, and Michael J. Karels +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +This report describes the internal structure of the +networking facilities developed for the 4.4BSD version +of the UNIX* operating system +for the VAX\(dg. 
These facilities +.FS +\(dg DEC, VAX, DECnet, and UNIBUS are trademarks of +Digital Equipment Corporation. +.FE +are based on several central abstractions which +structure the external (user) view of network communication +as well as the internal (system) implementation. +.PP +The report documents the internal structure of the networking system. +The ``Berkeley Software Architecture Manual, 4.4BSD Edition'' (PSD:5) +provides a description of the user interface to the networking facilities. +.sp +.LP +Revised June 10, 1993 +.AE +.LP +.\".de PT +.\".lt \\n(LLu +.\".pc % +.\".nr PN \\n% +.\".tl '\\*(LH'\\*(CH'\\*(RH' +.\".lt \\n(.lu +.\".. +.\".ds RH Contents +.OH 'Networking Implementation Notes''SMM:18-%' +.EH 'SMM:18-%''Networking Implementation Notes' +.bp +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. Overview" +.LP +.sp .5v +.nf +.B "3. Goals +.LP +.sp .5v +.nf +.B "4. Internal address representation" +.LP +.sp .5v +.nf +.B "5. Memory management" +.LP +.sp .5v +.nf +.B "6. Internal layering +6.1. Socket layer +6.1.1. Socket state +6.1.2. Socket data queues +6.1.3. Socket connection queuing +6.2. Protocol layer(s) +6.3. Network-interface layer +6.3.1. UNIBUS interfaces +.LP +.sp .5v +.nf +.B "7. Socket/protocol interface" +.LP +.sp .5v +.nf +.B "8. Protocol/protocol interface" +8.1. pr_output +8.2. pr_input +8.3. pr_ctlinput +8.4. pr_ctloutput +.LP +.sp .5v +.nf +.B "9. Protocol/network-interface interface" +9.1. Packet transmission +9.2. Packet reception +.LP +.sp .5v +.nf +.B "10. Gateways and routing issues +10.1. Routing tables +10.2. Routing table interface +10.3. User level routing policies +.LP +.sp .5v +.nf +.B "11. Raw sockets" +11.1. Control blocks +11.2. Input processing +11.3. Output processing +.LP +.sp .5v +.nf +.B "12. Buffering and congestion control" +12.1. Memory management +12.2. Protocol buffering policies +12.3. Queue limiting +12.4. Packet forwarding +.LP +.sp .5v +.nf +.B "13. 
Out of band data" +.LP +.sp .5v +.nf +.B "14. Trailer protocols" +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References +.bp +.de _d +.if t .ta .6i 2.1i 2.6i +.\" 2.94 went to 2.6, 3.64 to 3.30 +.if n .ta .84i 2.6i 3.30i +.. +.de _f +.if t .ta .5i 1.25i 2.5i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i +.. diff --git a/share/doc/smm/18.net/1.t b/share/doc/smm/18.net/1.t new file mode 100644 index 000000000000..7a33a312fa11 --- /dev/null +++ b/share/doc/smm/18.net/1.t @@ -0,0 +1,60 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\".ds RH Introduction +.br +.ne 2i +.NH +\s+2Introduction\s0 +.PP +This report describes the internal structure of +facilities added to the +4.2BSD version of the UNIX operating system for +the VAX, +as modified in the 4.4BSD release. +The system facilities provide +a uniform user interface to networking +within UNIX. In addition, the implementation +introduces a structure for network communications which may be +used by system implementors in adding new networking +facilities. The internal structure is not visible +to the user; rather, it is intended to aid implementors +of communication protocols and network services by +providing a framework which +promotes code sharing and minimizes implementation effort. +.PP +The reader is expected to be familiar with the C programming +language and system interface, as described in the +\fIBerkeley Software Architecture Manual, 4.4BSD Edition\fP [Joy86]. +Basic understanding of network +communication concepts is assumed; any additional +ideas are introduced where required. +.PP +The remainder of this document +provides a description of the system internals, +avoiding, when possible, those portions which are utilized only +by the interprocess communication facilities.
diff --git a/share/doc/smm/18.net/2.t b/share/doc/smm/18.net/2.t new file mode 100644 index 000000000000..0880f22957c0 --- /dev/null +++ b/share/doc/smm/18.net/2.t @@ -0,0 +1,79 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.nr H2 1 +.\".ds RH Overview +.br +.ne 2i +.NH +\s+2Overview\s0 +.PP +If we consider +the International Standards Organization's (ISO) +Open Systems Interconnection (OSI) model of +network communication [ISO81] [Zimmermann80], +the networking facilities +described here correspond to a portion of the +session layer (layer 5) and all of the transport and +network layers (layers 4 and 3, respectively). +.PP +The network layer provides possibly imperfect +data transport services with minimal addressing +structure. +Addressing at this level is normally host to host, +with implicit or explicit routing optionally supported +by the communicating agents. +.PP +At the transport +layer the notions of reliable transfer, data sequencing, +flow control, and service addressing are normally +included. Reliability is usually managed by +explicit acknowledgement of data delivered. Failure +to acknowledge a transfer results in retransmission of +the data. Sequencing may be handled by tagging +each message handed to the network layer with a +\fIsequence number\fP and maintaining +state at the endpoints of communication that uses +received sequence numbers to reorder data which +arrives out of order. +.PP +The session layer facilities may provide forms of +addressing which are mapped into formats required +by the transport layer, as well as service authentication +and client authentication. Various systems +also provide services such as data encryption and +address and protocol translation. +.PP +The following sections begin by describing some of the common +data structures and utility routines, then examine +the internal layering. The contents of each layer +and its interface are considered. Some of the +interfaces are specific to a protocol implementation. For +these cases examples have been drawn from the Internet [Cerf78] +protocol family. Later sections cover routing issues, +the design of the raw socket interface and other +miscellaneous topics.
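The transport-layer sequencing described in the Overview (tag each message with a sequence number, keep state at the receiving endpoint, and release data in order) can be sketched in C as a small reorder buffer. This is a hypothetical, simplified illustration, not code from the 4.4BSD networking implementation: WINDOW, struct reorder, and reorder_input() are names invented here, and each payload is just an int standing in for a segment's data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 8    /* assumption: at most 8 out-of-order segments are held */

struct reorder {
    uint32_t next;              /* next sequence number owed to the reader */
    bool     have[WINDOW];      /* have[s % WINDOW]: segment s is buffered */
    int      payload[WINDOW];   /* stand-in for each segment's data */
};

/*
 * Accept segment `seq`. Any segments that are now in order are copied
 * to `out` (which must hold WINDOW entries); returns how many.
 */
size_t reorder_input(struct reorder *r, uint32_t seq, int payload, int *out)
{
    size_t n = 0;

    /* Unsigned arithmetic: an old duplicate (seq < next) wraps to a huge
     * value, so both duplicates and segments beyond the window are dropped. */
    if (seq - r->next >= WINDOW)
        return 0;
    r->have[seq % WINDOW] = true;
    r->payload[seq % WINDOW] = payload;

    /* Deliver the contiguous run that starts at `next`. */
    while (r->have[r->next % WINDOW]) {
        out[n++] = r->payload[r->next % WINDOW];
        r->have[r->next % WINDOW] = false;
        r->next++;
    }
    return n;
}
```

Because next only advances, a retransmitted segment that was already delivered falls outside the window and is discarded, so the same state also suppresses duplicates. A real transport protocol would add acknowledgements and flow control on top of this.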
diff --git a/share/doc/smm/18.net/3.t b/share/doc/smm/18.net/3.t new file mode 100644 index 000000000000..6e530cdab3a8 --- /dev/null +++ b/share/doc/smm/18.net/3.t @@ -0,0 +1,53 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH Goals +.br +.ne 2i +.NH +\s+2Goals\s0 +.PP +The networking system was designed with the goal of supporting +multiple \fIprotocol families\fP and addressing styles. 
This required +information to be ``hidden'' in common data structures which +could be manipulated by all the pieces of the system, but which +required interpretation only by the protocols which ``controlled'' +it. The system described here attempts to restrict +the use of shared data structures to those kept by a suite of +protocols (a \fIprotocol family\fP), and those used for rendezvous +between ``synchronous'' and ``asynchronous'' portions of the +system (e.g., queues of data packets are filled at interrupt +time and emptied based on user requests). +.PP +A major goal of the system was to provide a framework within +which new protocols and hardware could easily be supported. +To this end, a great deal of effort has been expended to +create utility routines which hide many of the more +complex and/or hardware dependent chores of networking. +Later sections describe the utility routines and the underlying +data structures they manipulate. diff --git a/share/doc/smm/18.net/4.t b/share/doc/smm/18.net/4.t new file mode 100644 index 000000000000..adf3b85e0bcf --- /dev/null +++ b/share/doc/smm/18.net/4.t @@ -0,0 +1,61 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission.
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Address representation +.br +.ne 2i +.NH +\s+2Internal address representation\s0 +.PP +Common to all portions of the system are two data structures. +These structures are used to represent +addresses and various data objects. +Internally, addresses are described by the \fIsockaddr\fP structure: +.DS +._f +struct sockaddr { + short sa_family; /* data format identifier */ + char sa_data[14]; /* address */ +}; +.DE +Every address belongs to an \fIaddress family\fP, +which defines its format and interpretation. +The \fIsa_family\fP field indicates the address family to which the address +belongs, and the \fIsa_data\fP field contains the actual data value. +The size of the data field, 14 bytes, was selected based on a study +of current address formats.* +Specific address formats use private structure definitions +that define the format of the data field. +The system interface supports larger address structures, +although address-family-independent support facilities, such as the routing +and raw socket interfaces, provide only 14 bytes for address storage.
+Protocols that do not use those facilities (e.g., the current UNIX domain) +may use larger data areas. +.FS +* Later versions of the system may support variable-length addresses. +.FE diff --git a/share/doc/smm/18.net/5.t b/share/doc/smm/18.net/5.t new file mode 100644 index 000000000000..07cc0d050d3a --- /dev/null +++ b/share/doc/smm/18.net/5.t @@ -0,0 +1,178 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE.
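To make the sockaddr convention of the address-representation section concrete, here is a sketch of a private address format overlaid on the generic structure. The family number AF_EXAMPLE, the structures sockaddr_gen and sockaddr_ex, and the helper ex_port are hypothetical. The generic structure keeps the layout shown earlier and is renamed only to avoid clashing with a host system's headers.

```c
#include <stdint.h>
#include <string.h>

/* Generic address, same layout as struct sockaddr above (renamed so it
 * does not clash with a host's <sys/socket.h>). */
struct sockaddr_gen {
    short sa_family;     /* data format identifier */
    char  sa_data[14];   /* address */
};

#define AF_EXAMPLE 99    /* hypothetical address family number */

/* Hypothetical private format for AF_EXAMPLE: the 14 data bytes hold
 * a 2-byte port, a 4-byte host number, and padding. */
struct sockaddr_ex {
    short    sex_family; /* AF_EXAMPLE */
    uint16_t sex_port;
    uint32_t sex_addr;
    char     sex_zero[8];
};

/* The private view must fit inside the generic structure. */
_Static_assert(sizeof(struct sockaddr_ex) <= sizeof(struct sockaddr_gen),
               "private format exceeds the generic address size");

/* Only code that owns AF_EXAMPLE interprets the data bytes. */
static int ex_port(const struct sockaddr_gen *sa, uint16_t *port)
{
    struct sockaddr_ex sex;

    if (sa->sa_family != AF_EXAMPLE)
        return -1;                     /* some other family's address */
    memcpy(&sex, sa, sizeof sex);      /* copy avoids alignment problems */
    *port = sex.sex_port;
    return 0;
}
```

Family-independent code, such as routing or raw sockets, need only examine sa_family and copy the bytes; only the owning protocol interprets the rest.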
+.\" +.nr H2 1 +.\".ds RH "Memory management +.br +.ne 2i +.NH +\s+2Memory management\s0 +.PP +A single mechanism is used for data storage: memory buffers, or +\fImbufs\fP. An mbuf is a structure of the form: +.DS +._f +struct mbuf { + struct mbuf *m_next; /* next buffer in chain */ + u_long m_off; /* offset of data */ + short m_len; /* amount of data in this mbuf */ + short m_type; /* mbuf type (accounting) */ + u_char m_dat[MLEN]; /* data storage */ + struct mbuf *m_act; /* link in higher-level mbuf list */ +}; +.DE +The \fIm_next\fP field is used to chain mbufs together on linked +lists, while the \fIm_act\fP field allows lists of mbuf chains to be +accumulated. By convention, the mbufs common to a single object +(for example, a packet) are chained together with the \fIm_next\fP +field, while groups of objects are linked via the \fIm_act\fP +field (for example, when placed on a queue). +.PP +Each mbuf has a small data area for storing information, \fIm_dat\fP. +The \fIm_len\fP field indicates the amount of data, while the \fIm_off\fP +field is an offset to the beginning of the data from the base of the +mbuf. Thus, for example, the macro \fImtod\fP, which converts a pointer +to an mbuf to a pointer to the data stored in the mbuf, has the form +.DS +._d +#define mtod(\fIx\fP,\fIt\fP) ((\fIt\fP)((int)(\fIx\fP) + (\fIx\fP)->m_off)) +.DE +(note the \fIt\fP parameter, a C type cast, which is used to cast +the resultant pointer for proper assignment). +.PP +In addition to storing data directly in the mbuf's data area, data +of page size may also be stored in a separate area of memory. +The mbuf utility routines maintain +a pool of pages for this purpose and manipulate a private page map +for such pages. +An mbuf with an external data area may be recognized by the larger +offset to the data area; +this is formalized by the macro M_HASCL(\fIm\fP), which is true +if the mbuf whose address is \fIm\fP has an external page cluster.
+An array of reference counts on pages is also maintained +so that copies of pages may be made without core-to-core +copying (copies are created simply by duplicating the reference to the data +and incrementing the associated reference counts for the pages). +Separate data pages are currently used only +when copying data from a user process into the kernel, +and when bringing data in at the hardware level. Routines which +manipulate mbufs are not normally aware whether data is stored directly in +the mbuf data array, or if it is kept in separate pages. +.PP +The following may be used to allocate and free mbufs: +.LP +m = m_get(wait, type); +.br +MGET(m, wait, type); +.IP +The subroutine \fIm_get\fP and the macro \fIMGET\fP +each allocate an mbuf, placing its address in \fIm\fP. +The argument \fIwait\fP is either M_WAIT or M_DONTWAIT according +to whether allocation should block or fail if no mbuf is available. +The \fItype\fP is one of the predefined mbuf types for use in accounting +of mbuf allocation. +.IP "MCLGET(m);" +This macro attempts to allocate an mbuf page cluster +to associate with the mbuf \fIm\fP. +If successful, the length of the mbuf is set to CLBYTES, +the size of the page cluster in bytes. +.LP +n = m_free(m); +.br +MFREE(m,n); +.IP +The routine \fIm_free\fP and the macro \fIMFREE\fP +each free a single mbuf, \fIm\fP, and any associated external storage area, +placing a pointer to its successor in the chain it heads, if any, in \fIn\fP. +.IP "m_freem(m);" +This routine frees an mbuf chain headed by \fIm\fP. +.PP +The following utility routines are available for manipulating mbuf +chains: +.IP "m = m_copy(m0, off, len);" +.br +The \fIm_copy\fP routine creates a copy of all, or part, of a +list of the mbufs in \fIm0\fP. \fILen\fP bytes of data, starting +\fIoff\fP bytes from the front of the chain, are copied. +Where possible, reference counts on pages are used instead +of core-to-core copies.
+The original mbuf chain must have at +least \fIoff\fP + \fIlen\fP bytes of data. If \fIlen\fP is +specified as M_COPYALL, all the data present, offset +as before, is copied. +.IP "m_cat(m, n);" +.br +The mbuf chain, \fIn\fP, is appended to the end of \fIm\fP. +Where possible, compaction is performed. +.IP "m_adj(m, diff);" +.br +The mbuf chain, \fIm\fP, is adjusted in size by \fIdiff\fP +bytes. If \fIdiff\fP is non-negative, \fIdiff\fP bytes +are shaved off the front of the mbuf chain. If \fIdiff\fP +is negative, \-\fIdiff\fP bytes are trimmed from the end of the chain. +No space is reclaimed in this operation; alterations are +accomplished by changing the \fIm_len\fP and \fIm_off\fP +fields of mbufs. +.IP "m = m_pullup(m0, size);" +.br +After a successful call to \fIm_pullup\fP, the mbuf at +the head of the returned list, \fIm\fP, is guaranteed +to have at least \fIsize\fP +bytes of data in contiguous memory within the data area of the mbuf +(allowing access via a pointer, obtained using the \fImtod\fP macro, +and allowing the mbuf to be located from a pointer to the data area +using \fIdtom\fP, defined below). +If the original data was less than \fIsize\fP bytes long, +\fIsize\fP was greater than the size of an mbuf data +area (112 bytes), or required resources were unavailable, +\fIm\fP is 0 and the original mbuf chain is deallocated. +.IP +This routine is particularly useful when verifying packet +header lengths on reception. For example, if a packet is +received and only 8 of the necessary 16 bytes required +for a valid packet header are present at the head of the list +of mbufs representing the packet, the remaining 8 bytes +may be ``pulled up'' with a single \fIm_pullup\fP call. +If the call fails, the invalid packet will have been discarded. +.PP +By ensuring that mbufs always reside on 128-byte boundaries, +it is always possible to locate the mbuf associated with a data +area by masking off the low bits of the virtual address.
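The masking trick just described can be modeled in user space. The sketch below assumes MSIZE is a power of two (128, as in the text) and allocates each buffer on an MSIZE boundary with C11 aligned_alloc. struct mbuf_hdr and mbuf_alloc are simplified, hypothetical stand-ins rather than the kernel definitions, and the macros use char * and uintptr_t arithmetic in place of the kernel's (int) casts, which assume that a pointer fits in an int.

```c
#include <stdint.h>
#include <stdlib.h>

#define MSIZE 128                 /* power of two, as in the text */

/* Simplified stand-in for the mbuf header (not the kernel layout). */
struct mbuf_hdr {
    struct mbuf_hdr *m_next;      /* next buffer in chain */
    unsigned long    m_off;       /* offset of data from the mbuf base */
    short            m_len;       /* amount of data in this mbuf */
};

/* Internal data area starts right after the header. */
#define MMINOFF ((unsigned long)sizeof(struct mbuf_hdr))

/* Pointer to data: base address plus offset. */
#define mtod(x, t)  ((t)((char *)(x) + (x)->m_off))
/* Pointer to mbuf: clear the low bits of a pointer into the data area. */
#define dtom(x)     ((struct mbuf_hdr *)((uintptr_t)(x) & ~(uintptr_t)(MSIZE - 1)))

static struct mbuf_hdr *mbuf_alloc(void)
{
    /* C11 aligned_alloc: size must be a multiple of the alignment. */
    struct mbuf_hdr *m = aligned_alloc(MSIZE, MSIZE);

    if (m != NULL) {
        m->m_next = NULL;
        m->m_off = MMINOFF;
        m->m_len = 0;
    }
    return m;
}
```

As with the kernel macro, dtom is only valid for pointers into the internal data area; a pointer into an external page cluster would mask to an unrelated address.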
+Because the mbuf can be recovered from a pointer to its data, modules may store data structures in mbufs and +pass them around without concern for locating the original +mbuf when it comes time to free the structure. +Note that this works only with objects stored in the internal data +buffer of the mbuf. +The \fIdtom\fP macro is used to convert a pointer into an mbuf's +data area to a pointer to the mbuf: +.DS +#define dtom(x) ((struct mbuf *)((int)(x) & ~(MSIZE-1))) +.DE +.PP +Mbufs are used for dynamically allocated data structures such as +sockets as well as memory allocated for packets and headers. Statistics are +maintained on mbuf usage and can be viewed with the +\fInetstat\fP\|(1) program. diff --git a/share/doc/smm/18.net/6.t b/share/doc/smm/18.net/6.t new file mode 100644 index 000000000000..57bd036ffba8 --- /dev/null +++ b/share/doc/smm/18.net/6.t @@ -0,0 +1,658 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Internal layering +.br +.ne 2i +.NH +\s+2Internal layering\s0 +.PP +The internal structure of the network system is divided into +three layers. These +layers correspond to the services provided by the socket +abstraction, those provided by the communication protocols, +and those provided by the hardware interfaces. The communication +protocols are normally layered into two or more individual +cooperating layers, though they are collectively viewed +in the system as one layer providing services supportive +of the appropriate socket abstraction. +.PP +The following sections describe the properties of each layer +in the system and the interfaces to which each must conform. +.NH 2 +Socket layer +.PP +The socket layer deals with the interprocess communication +facilities provided by the system. A socket is a bidirectional +endpoint of communication which is ``typed'' by the semantics +of communication it supports. The system calls described in +the \fIBerkeley Software Architecture Manual\fP [Joy86] +are used to manipulate sockets. 
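From a user process, the bidirectional nature of a socket can be seen with a connected pair of UNIX-domain stream sockets. The sketch below uses the POSIX socketpair, write, read, and close calls as found on modern systems; the helper name echo_through_socketpair is hypothetical. Conceptually, data written on one endpoint is queued on the peer socket's receive queue until read consumes it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `msg` on one end of a connected socket pair and read it from
 * the other.  Returns the byte count read, or -1 on error.
 */
static ssize_t echo_through_socketpair(const char *msg, char *buf,
                                       size_t buflen)
{
    int sv[2];
    size_t len = strlen(msg);
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[1], buf, buflen);   /* data waits in the peer's receive queue */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Because each descriptor names a bidirectional endpoint, writing on sv[1] and reading on sv[0] works equally well.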
+.PP +A socket consists of the following data structure: +.DS +._f +struct socket { + short so_type; /* generic type */ + short so_options; /* from socket call */ + short so_linger; /* time to linger while closing */ + short so_state; /* internal state flags */ + caddr_t so_pcb; /* protocol control block */ + struct protosw *so_proto; /* protocol handle */ + struct socket *so_head; /* back pointer to accept socket */ + struct socket *so_q0; /* queue of partial connections */ + short so_q0len; /* partials on so_q0 */ + struct socket *so_q; /* queue of incoming connections */ + short so_qlen; /* number of connections on so_q */ + short so_qlimit; /* max number queued connections */ + struct sockbuf so_rcv; /* receive queue */ + struct sockbuf so_snd; /* send queue */ + short so_timeo; /* connection timeout */ + u_short so_error; /* error affecting connection */ + u_short so_oobmark; /* chars to oob mark */ + short so_pgrp; /* pgrp for signals */ +}; +.DE +.PP +Each socket contains two data queues, \fIso_rcv\fP and \fIso_snd\fP, +and a pointer to routines which provide supporting services. +The type of the socket, +\fIso_type\fP, is defined at socket creation time and used in selecting +those services which are appropriate to support it. The supporting +protocol is selected at socket creation time and recorded in +the socket data structure for later use. Protocols are defined +by a table of procedures, the \fIprotosw\fP structure, which will +be described in detail later. A pointer to a protocol-specific +data structure, +the ``protocol control block,'' is also present in the socket structure. +Protocols control this data structure, which normally includes a +back pointer to the parent socket structure to allow easy +lookup when returning information to a user +(for example, placing an error number in the \fIso_error\fP +field).
 The other entries in the socket structure are used in +queuing connection requests, validating user requests, storing +socket characteristics (e.g., +options supplied at the time a socket is created), and maintaining +a socket's state. +.PP +Processes ``rendezvous at a socket'' in many instances. For example, +when a process wishes to extract data from a socket's receive queue +and it is empty, or lacks sufficient data to satisfy the request, +the process blocks, supplying the address of the receive queue as +a ``wait channel'' to be used in notification. When data arrives +for the process and is placed in the socket's queue, the blocked +process is identified by the fact that it is waiting ``on the queue.'' +.NH 3 +Socket state +.PP +A socket's state is defined by the following flags: +.DS +.ta \w'#define 'u +\w'SS_ISDISCONNECTING 'u +\w'0x000 'u +#define SS_NOFDREF 0x001 /* no file table ref any more */ +#define SS_ISCONNECTED 0x002 /* socket connected to a peer */ +#define SS_ISCONNECTING 0x004 /* in process of connecting to peer */ +#define SS_ISDISCONNECTING 0x008 /* in process of disconnecting */ +#define SS_CANTSENDMORE 0x010 /* can't send more data to peer */ +#define SS_CANTRCVMORE 0x020 /* can't receive more data from peer */ +#define SS_RCVATMARK 0x040 /* at mark on input */ + +#define SS_PRIV 0x080 /* privileged */ +#define SS_NBIO 0x100 /* non-blocking ops */ +#define SS_ASYNC 0x200 /* async i/o notify */ +.DE +.PP +The state of a socket is manipulated both by the protocols +and the user (through system calls). +When a socket is created, the state is defined based on the type of socket. +It may change as control actions are performed, for example, connection +establishment. +It may also change according to the type of +input/output the user wishes to perform, as indicated by options +set with \fIfcntl\fP. ``Non-blocking'' I/O implies that +a process should never be blocked to await resources.
Instead, any +call which would block returns prematurely +with the error EWOULDBLOCK, or the service request may be partially +fulfilled, e.g. a request for more data than is present. +.PP +If a process requested ``asynchronous'' notification of events +related to the socket, the SIGIO signal is posted to the process +when such events occur. +An event is a change in the socket's state; +examples of such occurrences are: space +becoming available in the send queue, new data available in the +receive queue, connection establishment or disestablishment, etc. +.PP +A socket may be marked ``privileged'' if it was created by the +super-user. Only privileged sockets may +bind addresses in privileged portions of an address space +or use ``raw'' sockets to access lower levels of the network. +.NH 3 +Socket data queues +.PP +A socket's data queue contains a pointer to the data stored in +the queue and other entries related to the management of +the data. The following structure defines a data queue: +.DS +._f +struct sockbuf { + u_short sb_cc; /* actual chars in buffer */ + u_short sb_hiwat; /* max actual char count */ + u_short sb_mbcnt; /* chars of mbufs used */ + u_short sb_mbmax; /* max chars of mbufs to use */ + u_short sb_lowat; /* low water mark */ + short sb_timeo; /* timeout */ + struct mbuf *sb_mb; /* the mbuf chain */ + struct proc *sb_sel; /* process selecting read/write */ + short sb_flags; /* flags, see below */ +}; +.DE +.PP +Data is stored in a queue as a chain of mbufs. +The actual count of data characters as well as high and low water marks are +used by the protocols in controlling the flow of data. +The amount of buffer space (characters of mbufs and associated data pages) +is also recorded along with the limit on buffer allocation. 
+The socket routines cooperate in implementing the flow control +policy by blocking a process when it requests to send data and +the high water mark has been reached, or when it requests to +receive data and less than the low water mark is present +(assuming non-blocking I/O has not been specified).* +.FS +* The low-water mark is always presumed to be 0 +in the current implementation. +.FE +.PP +When a socket is created, the supporting protocol ``reserves'' space +for the send and receive queues of the socket. +The limit on buffer allocation is set somewhat higher than the limit +on data characters +to account for the granularity of buffer allocation. +The actual storage associated with a +socket queue may fluctuate during a socket's lifetime, but it is assumed +that this reservation will always allow a protocol to acquire enough memory +to satisfy the high water marks. +.PP +The timeout and select values are manipulated by the socket routines +in implementing various portions of the interprocess communications +facilities and will not be described here. +.PP +Data queued at a socket is stored in one of two styles. +Stream-oriented sockets queue data with no addresses, headers +or record boundaries. +The data are in mbufs linked through the \fIm_next\fP field. +Buffers containing access rights may be present within the chain +if the underlying protocol supports passage of access rights. +Record-oriented sockets, including datagram sockets, +queue data as a list of packets; the sections of packets are distinguished +by the types of the mbufs containing them. +The mbufs which comprise a record are linked through the \fIm_next\fP field; +records are linked from the \fIm_act\fP field of the first mbuf +of one packet to the first mbuf of the next. +Each packet begins with an mbuf containing the ``from'' address +if the protocol provides it, +then any buffers containing access rights, and finally any buffers +containing data. 
+If a record contains no data, +no data buffers are required unless neither address nor access rights +are present. +.PP +A socket queue has a number of flags used in synchronizing access +to the data and in acquiring resources: +.DS +._d +#define SB_LOCK 0x01 /* lock on data queue (so_rcv only) */ +#define SB_WANT 0x02 /* someone is waiting to lock */ +#define SB_WAIT 0x04 /* someone is waiting for data/space */ +#define SB_SEL 0x08 /* buffer is selected */ +#define SB_COLL 0x10 /* collision selecting */ +.DE +The last two flags are manipulated by the system in implementing +the select mechanism. +.NH 3 +Socket connection queuing +.PP +In dealing with connection oriented sockets (e.g. SOCK_STREAM) +the two ends are considered distinct. One end is termed +\fIactive\fP, and generates connection requests. The other +end is called \fIpassive\fP and accepts connection requests. +.PP +From the passive side, a socket is marked with +SO_ACCEPTCONN when a \fIlisten\fP call is made, +creating two queues of sockets: \fIso_q0\fP for connections +in progress and \fIso_q\fP for connections already made and +awaiting user acceptance. +As a protocol is preparing incoming connections, it creates +a socket structure queued on \fIso_q0\fP by calling the routine +\fIsonewconn\fP(). When the connection +is established, the socket structure is then transferred +to \fIso_q\fP, making it available for an \fIaccept\fP. +.PP +If an SO_ACCEPTCONN socket is closed with sockets on either +\fIso_q0\fP or \fIso_q\fP, these sockets are dropped, +with notification to the peers as appropriate. +.NH 2 +Protocol layer(s) +.PP +Each socket is created in a communications domain, +which usually implies both an addressing structure (address family) +and a set of protocols which implement various socket types within the domain +(protocol family). 
+Each domain is defined by the following structure: +.DS +.ta .5i +\w'struct 'u +\w'(*dom_externalize)(); 'u +struct domain { + int dom_family; /* PF_xxx */ + char *dom_name; + int (*dom_init)(); /* initialize domain data structures */ + int (*dom_externalize)(); /* externalize access rights */ + int (*dom_dispose)(); /* dispose of internalized rights */ + struct protosw *dom_protosw, *dom_protoswNPROTOSW; + struct domain *dom_next; +}; +.DE +.PP +At boot time, each domain configured into the kernel +is added to a linked list of domains. +The initialization procedure of each domain is then called. +After that time, the domain structure is used to locate protocols +within the protocol family. +The domain structure may also contain procedure references +for externalization of access rights at the receiving socket +and the disposal of access rights that are not received. +.PP +Protocols are described by a set of entry points and certain +socket-visible characteristics, some of which are used in +deciding which socket type(s) they may support. +.PP +An entry in the ``protocol switch'' table exists for each +protocol module configured into the system.
 It has the following form: +.DS +.ta .5i +\w'struct 'u +\w'domain *pr_domain; 'u +struct protosw { + short pr_type; /* socket type used for */ + struct domain *pr_domain; /* domain protocol a member of */ + short pr_protocol; /* protocol number */ + short pr_flags; /* socket visible attributes */ +/* protocol-protocol hooks */ + int (*pr_input)(); /* input to protocol (from below) */ + int (*pr_output)(); /* output to protocol (from above) */ + int (*pr_ctlinput)(); /* control input (from below) */ + int (*pr_ctloutput)(); /* control output (from above) */ +/* user-protocol hook */ + int (*pr_usrreq)(); /* user request */ +/* utility hooks */ + int (*pr_init)(); /* initialization routine */ + int (*pr_fasttimo)(); /* fast timeout (200ms) */ + int (*pr_slowtimo)(); /* slow timeout (500ms) */ + int (*pr_drain)(); /* flush any excess space possible */ +}; +.DE +.PP +A protocol is called through the \fIpr_init\fP entry before any other. +Thereafter it is called every 200 milliseconds through the +\fIpr_fasttimo\fP entry and +every 500 milliseconds through the \fIpr_slowtimo\fP entry for timer-based actions. +The system calls the \fIpr_drain\fP entry when it is low on space; +the protocol should then throw away any non-critical data. +.PP +Protocols pass data between themselves as chains of mbufs using +the \fIpr_input\fP and \fIpr_output\fP routines. \fIPr_input\fP +passes data up (towards +the user) and \fIpr_output\fP passes it down (towards the network); control +information passes up and down on \fIpr_ctlinput\fP and \fIpr_ctloutput\fP. +The protocol is responsible for the space occupied by any of the +arguments to these entries and must either pass it onward or dispose of it. +(On output, the lowest level reached must free buffers storing the arguments; +on input, the highest level is responsible for freeing buffers.) +.PP +The \fIpr_usrreq\fP routine interfaces protocols to the socket +code and is described below.
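The timer schedule above can be sketched as a dispatch over a table of function pointers, in the spirit of the protosw structure. This is a hypothetical model: it assumes a single 100 millisecond clock tick drives both timers and that a null entry means the protocol has no timer routine. struct proto_timers, proto_tick, and the counting routines exist only for illustration.

```c
/* Hypothetical subset of a protocol switch entry: only the timer hooks. */
struct proto_timers {
    void (*pr_fasttimo)(void);   /* called every 200 ms, if non-null */
    void (*pr_slowtimo)(void);   /* called every 500 ms, if non-null */
};

/* Counters so the schedule can be observed. */
static int fast_calls, slow_calls;
static void count_fast(void) { fast_calls++; }
static void count_slow(void) { slow_calls++; }

/*
 * Assumed driver: invoked once per 100 ms clock tick, with tick = 1, 2, ...
 * Every second tick is a 200 ms boundary; every fifth is a 500 ms boundary.
 */
static void proto_tick(const struct proto_timers *pr, int nproto,
                       unsigned tick)
{
    for (int i = 0; i < nproto; i++) {
        if (tick % 2 == 0 && pr[i].pr_fasttimo != 0)
            pr[i].pr_fasttimo();
        if (tick % 5 == 0 && pr[i].pr_slowtimo != 0)
            pr[i].pr_slowtimo();
    }
}
```

Over ten ticks (one simulated second), each protocol's fast timer fires five times and its slow timer, if present, twice.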
+.PP +The \fIpr_flags\fP field is constructed from the following values: +.DS +.ta \w'#define 'u +\w'PR_CONNREQUIRED 'u +8n +#define PR_ATOMIC 0x01 /* exchange atomic messages only */ +#define PR_ADDR 0x02 /* addresses given with messages */ +#define PR_CONNREQUIRED 0x04 /* connection required by protocol */ +#define PR_WANTRCVD 0x08 /* want PRU_RCVD calls */ +#define PR_RIGHTS 0x10 /* passes capabilities */ +.DE +Protocols which are connection-based specify the PR_CONNREQUIRED +flag so that the socket routines will never attempt to send data +before a connection has been established. If the PR_WANTRCVD flag +is set, the socket routines will notify the protocol when the user +has removed data from the socket's receive queue. This allows +the protocol to implement acknowledgement on user receipt, and +to update windowing information based on the amount of space +available in the receive queue. The PR_ADDR flag indicates that any +data placed in the socket's receive queue will be preceded by the +address of the sender. The PR_ATOMIC flag specifies that each \fIuser\fP +request to send data must be performed in a single \fIprotocol\fP send +request; it is the protocol's responsibility to maintain record +boundaries on data to be sent. The PR_RIGHTS flag indicates that the +protocol supports the passing of capabilities; this is currently +used only by the protocols in the UNIX protocol family. +.PP +When a socket is created, the socket routines scan the protocol +table for the domain +looking for an appropriate protocol to support the type of +socket being created. The \fIpr_type\fP field contains one of the +possible socket types (e.g., SOCK_STREAM), while the \fIpr_domain\fP +field is a back pointer to the domain structure. +The \fIpr_protocol\fP field contains the protocol number of the +protocol, normally a well-known value.
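The table scan performed at socket creation, and the send check implied by PR_CONNREQUIRED, might look like the following simplified sketch. struct protosw_x keeps only the fields the scan needs; find_proto, may_send, and the socket type constants T_STREAM and T_DGRAM are hypothetical names, and the kernel's own lookup routines may differ in detail.

```c
#include <stddef.h>

/* Hypothetical socket type values (stand-ins for SOCK_STREAM etc.). */
#define T_STREAM 1
#define T_DGRAM  2

/* Flag values as listed in the text. */
#define PR_ATOMIC       0x01
#define PR_ADDR         0x02
#define PR_CONNREQUIRED 0x04

/* Simplified protocol switch entry: only the fields the scan uses. */
struct protosw_x {
    short pr_type;       /* socket type supported */
    short pr_protocol;   /* protocol number */
    short pr_flags;      /* socket-visible attributes */
};

/*
 * Scan the table [sw, end) for an entry supporting `type`.  If `proto`
 * is nonzero, the protocol number must match as well.
 */
static const struct protosw_x *
find_proto(const struct protosw_x *sw, const struct protosw_x *end,
           int type, int proto)
{
    for (; sw < end; sw++)
        if (sw->pr_type == type && (proto == 0 || sw->pr_protocol == proto))
            return sw;
    return NULL;
}

/* A connection-based protocol must not be handed data before connecting. */
static int may_send(const struct protosw_x *pr, int connected)
{
    return connected || (pr->pr_flags & PR_CONNREQUIRED) == 0;
}
```

Passing a protocol number of 0 selects the first entry of the requested type, which mirrors the common case of creating a socket without naming a specific protocol.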
+.NH 2 +Network-interface layer +.PP +Each network-interface configured into a system defines a +path through which packets may be sent and received. +Normally a hardware device is associated with this interface, +though there is no requirement for this (for example, all +systems have a software ``loopback'' interface used for +debugging and performance analysis). +In addition to manipulating the hardware device, an interface +module is responsible +for encapsulation and decapsulation of any link-layer header +information required to deliver a message to its destination. +The selection of which interface to use in delivering packets +is a routing decision carried out at a +higher level than the network-interface layer. +An interface may have addresses in one or more address families. +The address is set at boot time using an \fIioctl\fP on a socket +in the appropriate domain; this operation is implemented by the protocol +family, after verifying the operation through the device \fIioctl\fP entry. +.PP +An interface is defined by the following structure, +.DS +.ta .5i +\w'struct 'u +\w'ifaddr *if_addrlist; 'u +struct ifnet { + char *if_name; /* name, e.g. ``en'' or ``lo'' */ + short if_unit; /* sub-unit for lower level driver */ + short if_mtu; /* maximum transmission unit */ + short if_flags; /* up/down, broadcast, etc. 
*/ + short if_timer; /* time 'til if_watchdog called */ + struct ifaddr *if_addrlist; /* list of addresses of interface */ + struct ifqueue if_snd; /* output queue */ + int (*if_init)(); /* init routine */ + int (*if_output)(); /* output routine */ + int (*if_ioctl)(); /* ioctl routine */ + int (*if_reset)(); /* bus reset routine */ + int (*if_watchdog)(); /* timer routine */ + int if_ipackets; /* packets received on interface */ + int if_ierrors; /* input errors on interface */ + int if_opackets; /* packets sent on interface */ + int if_oerrors; /* output errors on interface */ + int if_collisions; /* collisions on csma interfaces */ + struct ifnet *if_next; +}; +.DE +Each interface address has the following form: +.DS +.ta \w'#define 'u +\w'struct 'u +\w'struct 'u +\w'sockaddr ifa_addr; 'u-\w'struct 'u +struct ifaddr { + struct sockaddr ifa_addr; /* address of interface */ + union { + struct sockaddr ifu_broadaddr; + struct sockaddr ifu_dstaddr; + } ifa_ifu; + struct ifnet *ifa_ifp; /* back-pointer to interface */ + struct ifaddr *ifa_next; /* next address for interface */ +}; +.ta \w'#define 'u +\w'ifa_broadaddr 'u +\w'ifa_ifu.ifu_broadaddr 'u +#define ifa_broadaddr ifa_ifu.ifu_broadaddr /* broadcast address */ +#define ifa_dstaddr ifa_ifu.ifu_dstaddr /* other end of p-to-p link */ +.DE +The protocol generally maintains this structure as part of a larger +structure containing additional information concerning the address. +.PP +Each interface has a send queue and routines used for +initialization, \fIif_init\fP, and output, \fIif_output\fP. +If the interface resides on a system bus, the routine \fIif_reset\fP +will be called after a bus reset has been performed. +An interface may also +specify a timer routine, \fIif_watchdog\fP; +if \fIif_timer\fP is non-zero, it is decremented once per second +until it reaches zero, at which time the watchdog routine is called. +.PP +The state of an interface and certain characteristics are stored in +the \fIif_flags\fP field. 
The following values are possible: +.DS +._d +#define IFF_UP 0x1 /* interface is up */ +#define IFF_BROADCAST 0x2 /* broadcast is possible */ +#define IFF_DEBUG 0x4 /* turn on debugging */ +#define IFF_LOOPBACK 0x8 /* is a loopback net */ +#define IFF_POINTOPOINT 0x10 /* interface is point-to-point link */ +#define IFF_NOTRAILERS 0x20 /* avoid use of trailers */ +#define IFF_RUNNING 0x40 /* resources allocated */ +#define IFF_NOARP 0x80 /* no address resolution protocol */ +.DE +If the interface is connected to a network which supports transmission +of \fIbroadcast\fP packets, the IFF_BROADCAST flag will be set and +the \fIifa_broadaddr\fP field will contain the address to be used in +sending or accepting a broadcast packet. If the interface is associated +with a point-to-point hardware link (for example, a DEC DMR-11), the +IFF_POINTOPOINT flag will be set and \fIifa_dstaddr\fP will contain the +address of the host on the other side of the connection. These addresses +and the local address of the interface, \fIifa_addr\fP, are used in +filtering incoming packets. The interface sets IFF_RUNNING after +it has allocated system resources and posted an initial read on the +device it manages. This state bit is used to avoid multiple allocation +requests when an interface's address is changed. The IFF_NOTRAILERS +flag indicates the interface should refrain from using a \fItrailer\fP +encapsulation on outgoing packets, or (where per-host negotiation +of trailers is possible) that trailer encapsulations should not be requested; +\fItrailer\fP protocols are described +in section 14. The IFF_NOARP flag indicates the interface should not +use an ``address resolution protocol'' in mapping internetwork addresses +to local network addresses. +.PP +Various statistics are also stored in the interface structure. These +may be viewed with the \fInetstat\fP(1) program. +.PP +The interface address and flags may be set with the SIOCSIFADDR and +SIOCSIFFLAGS \fIioctl\fP\^s.
SIOCSIFADDR is used initially to define each +interface's address; SIOCSIFFLAGS can be used to mark +an interface down and perform site-specific configuration. +The destination address of a point-to-point link is set with SIOCSIFDSTADDR. +Corresponding operations exist to read each value. +Protocol families may also support operations to set and read the broadcast +address. +In addition, the SIOCGIFCONF \fIioctl\fP retrieves a list of interface +names and addresses for all interfaces and protocols on the host. +.NH 3 +UNIBUS interfaces +.PP +All hardware-related interfaces currently reside on the UNIBUS. +Consequently, a common set of utility routines for dealing +with the UNIBUS has been developed. Each UNIBUS interface +uses a structure of the following form: +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifubinfo { + short iff_uban; /* uba number */ + short iff_hlen; /* local net header length */ + struct uba_regs *iff_uba; /* uba regs, in vm */ + short iff_flags; /* used during uballoc's */ +}; +.DE +Additional structures are associated with each receive and transmit buffer, +normally one each per interface; for read, +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifrw { + caddr_t ifrw_addr; /* virt addr of header */ + short ifrw_bdp; /* unibus bdp */ + short ifrw_flags; /* type, etc.
*/ +#define IFRW_W 0x01 /* is a transmit buffer */ + int ifrw_info; /* value from ubaalloc */ + int ifrw_proto; /* map register prototype */ + struct pte *ifrw_mr; /* base of map registers */ +}; +.DE +and for write, +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifxmt { + struct ifrw ifrw; + caddr_t ifw_base; /* virt addr of buffer */ + struct pte ifw_wmap[IF_MAXNUBAMR]; /* base pages for output */ + struct mbuf *ifw_xtofree; /* pages being dma'd out */ + short ifw_xswapd; /* mask of clusters swapped */ + short ifw_nmr; /* number of entries in wmap */ +}; +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +#define ifw_addr ifrw.ifrw_addr +#define ifw_bdp ifrw.ifrw_bdp +#define ifw_flags ifrw.ifrw_flags +#define ifw_info ifrw.ifrw_info +#define ifw_proto ifrw.ifrw_proto +#define ifw_mr ifrw.ifrw_mr +.DE +One of each of these structures is conveniently packaged for interfaces +with single buffers for each direction, as follows: +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifuba { + struct ifubinfo ifu_info; + struct ifrw ifu_r; + struct ifxmt ifu_xmt; +}; +.ta \w'#define 'u +\w'ifw_xtofree 'u +#define ifu_uban ifu_info.iff_uban +#define ifu_hlen ifu_info.iff_hlen +#define ifu_uba ifu_info.iff_uba +#define ifu_flags ifu_info.iff_flags +#define ifu_w ifu_xmt.ifrw +#define ifu_xtofree ifu_xmt.ifw_xtofree +.DE +.PP +The \fIifubinfo\fP structure contains the general information needed +to characterize the I/O-mapped buffers for the device. +In addition, there is a structure describing each buffer, including +UNIBUS resources held by the interface. +Sufficient memory pages and bus map registers are allocated to each buffer +upon initialization according to the maximum packet size and header length. +The kernel virtual address of the buffer is held in \fIifrw_addr\fP, +and the map registers begin +at \fIifrw_mr\fP.
UNIBUS map register \fIifrw_mr\fP\^[\-1] +maps the local network header +ending on a page boundary. UNIBUS data paths are +reserved for read and for +write, given by \fIifrw_bdp\fP. The prototype of the map +registers for read and for write is saved in \fIifrw_proto\fP. +.PP +When write transfers are not at least half-full pages on page boundaries, +the data are just copied into the pages mapped on the UNIBUS +and the transfer is started. +If a write transfer is at least half a page long and on a page +boundary, UNIBUS page table entries are swapped to reference +the pages, and then the initial pages are +remapped from \fIifw_wmap\fP when the transfer completes. +The mbufs containing the mapped pages are placed on the \fIifw_xtofree\fP +queue to be freed after transmission. +.PP +When read transfers give at least half a page of data to be input, page +frames are allocated from a network page list and traded +with the pages already containing the data, mapping the allocated +pages to replace the input pages for the next UNIBUS data input. +.PP +The following utility routines are available for use in +writing network interface drivers; all use the +structures described above. +.LP +if_ubaminit(ifubinfo, uban, hlen, nmr, ifr, nr, ifx, nx); +.br +if_ubainit(ifuba, uban, hlen, nmr); +.IP +\fIif_ubaminit\fP allocates resources on UNIBUS adapter \fIuban\fP, +storing the information in the \fIifubinfo\fP, \fIifrw\fP and \fIifxmt\fP +structures referenced. +The \fIifr\fP and \fIifx\fP parameters are pointers to arrays +of \fIifrw\fP and \fIifxmt\fP structures whose dimensions +are \fInr\fP and \fInx\fP, respectively. +\fIif_ubainit\fP is a simpler, backwards-compatible interface used +for hardware with single buffers of each type. +They are called only at boot time or after a UNIBUS reset. +One data path (buffered or unbuffered, +depending on the \fIifu_flags\fP field) is allocated for each buffer. 
+The \fInmr\fP parameter indicates +the number of UNIBUS mapping registers required to map a maximum-sized +packet onto the UNIBUS, while \fIhlen\fP specifies the size +of a local network header, if any, which should be mapped separately +from the data (see the description of trailer protocols in section 14). +Sufficient UNIBUS mapping registers and pages of memory are allocated +to initialize the input data path for an initial read. For the output +data path, mapping registers and pages of memory are also allocated +and mapped onto the UNIBUS. The pages associated with the output +data path are held in reserve in the event a write requires copying +non-page-aligned data (see \fIif_wubaput\fP below). +If \fIif_ubainit\fP is called with memory pages already allocated, +they will be used instead of allocating new ones (this normally +occurs after a UNIBUS reset). +A 1 is returned when allocation and initialization are successful, +0 otherwise. +.LP +m = if_ubaget(ifubinfo, ifr, totlen, off0, ifp); +.br +m = if_rubaget(ifuba, totlen, off0, ifp); +.IP +\fIif_ubaget\fP and \fIif_rubaget\fP pull input data +out of an interface receive buffer and into an mbuf chain. +The first form takes pointers to the \fIifubinfo\fP structure +for the interface and the \fIifrw\fP structure for the receive buffer; +the second may be used for single-buffered devices. +\fItotlen\fP specifies the length of data to be obtained, not counting the +local network header. If \fIoff0\fP is non-zero, it indicates +a byte offset to a trailing local network header which should be +copied into a separate mbuf and prepended to the front of the resultant mbuf +chain. When the data amount to at least half a page, +the previously mapped data pages are remapped +into the mbufs and swapped with fresh pages, thus avoiding +any copy. +The receiving interface is recorded as \fIifp\fP, a pointer to an \fIifnet\fP +structure, for the use of the receiving network protocol.
+A 0 return value indicates a failure to allocate resources. +.LP +if_ubaput(ifubinfo, ifx, m); +.br +if_wubaput(ifuba, m); +.IP +\fIif_ubaput\fP and \fIif_wubaput\fP map a chain of mbufs +onto a network interface in preparation for output. +The first form is used by devices with multiple transmit buffers. +The chain includes any local network +header, which is copied so that it resides in the mapped and +aligned I/O space. +Data that are page-aligned in the output buffer +are mapped to the UNIBUS in place of the normal buffer page, +and the corresponding mbuf is placed on a queue to be freed after transmission. +Any other mbufs containing non-page-sized +data portions are copied to the I/O space and then freed. +Pages mapped from a previous output operation (no longer needed) +are unmapped. diff --git a/share/doc/smm/18.net/7.t b/share/doc/smm/18.net/7.t new file mode 100644 index 000000000000..8650709389b8 --- /dev/null +++ b/share/doc/smm/18.net/7.t @@ -0,0 +1,250 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission.
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.br +.ne 30v +.\".ds RH "Socket/protocol interface +.NH +\s+2Socket/protocol interface\s0 +.PP +The interface between the socket routines and the communication +protocols is through the \fIpr_usrreq\fP routine defined in the +protocol switch table. 
The following requests to a protocol +module are possible: +.DS +._d +#define PRU_ATTACH 0 /* attach protocol */ +#define PRU_DETACH 1 /* detach protocol */ +#define PRU_BIND 2 /* bind socket to address */ +#define PRU_LISTEN 3 /* listen for connection */ +#define PRU_CONNECT 4 /* establish connection to peer */ +#define PRU_ACCEPT 5 /* accept connection from peer */ +#define PRU_DISCONNECT 6 /* disconnect from peer */ +#define PRU_SHUTDOWN 7 /* won't send any more data */ +#define PRU_RCVD 8 /* have taken data; more room now */ +#define PRU_SEND 9 /* send this data */ +#define PRU_ABORT 10 /* abort (fast DISCONNECT, DETATCH) */ +#define PRU_CONTROL 11 /* control operations on protocol */ +#define PRU_SENSE 12 /* return status into m */ +#define PRU_RCVOOB 13 /* retrieve out of band data */ +#define PRU_SENDOOB 14 /* send out of band data */ +#define PRU_SOCKADDR 15 /* fetch socket's address */ +#define PRU_PEERADDR 16 /* fetch peer's address */ +#define PRU_CONNECT2 17 /* connect two sockets */ +/* begin for protocols internal use */ +#define PRU_FASTTIMO 18 /* 200ms timeout */ +#define PRU_SLOWTIMO 19 /* 500ms timeout */ +#define PRU_PROTORCV 20 /* receive from below */ +#define PRU_PROTOSEND 21 /* send to below */ +.DE +A call on the user request routine is of the form, +.DS +._f +error = (*protosw[].pr_usrreq)(so, req, m, addr, rights); +int error; struct socket *so; int req; struct mbuf *m, *addr, *rights; +.DE +The mbuf data chain \fIm\fP is supplied for output operations +and for certain other operations where it is to receive a result. +The address \fIaddr\fP is supplied for address-oriented requests +such as PRU_BIND and PRU_CONNECT. +The \fIrights\fP parameter is an optional pointer to an mbuf +chain containing user-specified capabilities (see the \fIsendmsg\fP +and \fIrecvmsg\fP system calls). The protocol is responsible for +disposal of the data mbuf chains on output operations. 
+A non-zero return value gives a +UNIX error number which should be passed to higher level software. +The following paragraphs describe each +of the requests possible. +.IP PRU_ATTACH +.br +When a protocol is bound to a socket (with the \fIsocket\fP +system call) the protocol module is called with this +request. It is the responsibility of the protocol module to +allocate any resources necessary. +The ``attach'' request +will always precede any of the other requests, and should not +occur more than once. +.IP PRU_DETACH +.br +This is the antithesis of the attach request, and is used +at the time a socket is deleted. The protocol module may +deallocate any resources assigned to the socket. +.IP PRU_BIND +.br +When a socket is initially created it has no address bound +to it. This request indicates that an address should be bound to +an existing socket. The protocol module must verify that the +requested address is valid and available for use. +.IP PRU_LISTEN +.br +The ``listen'' request indicates the user wishes to listen +for incoming connection requests on the associated socket. +The protocol module should perform any state changes needed +to carry out this request (if possible). A ``listen'' request +always precedes a request to accept a connection. +.IP PRU_CONNECT +.br +The ``connect'' request indicates the user wants to establish +an association. The \fIaddr\fP parameter supplied describes +the peer to be connected to. The effect of a connect request +may vary depending on the protocol. Virtual circuit protocols, +such as TCP [Postel81b], use this request to initiate establishment of a +TCP connection. Datagram protocols, such as UDP [Postel80], simply +record the peer's address in a private data structure and use +it to tag all outgoing packets. There are no restrictions +on how many times a connect request may be used after an attach. 
+If a protocol supports the notion of \fImulti-casting\fP, it +is possible to use multiple connects to establish a multi-cast +group. Alternatively, an association may be broken by a +PRU_DISCONNECT request, and a new association created with a +subsequent connect request, all without destroying the socket +and creating a new one. +.IP PRU_ACCEPT +.br +Following a successful PRU_LISTEN request and the arrival +of one or more connections, this request is made to +indicate the user +has accepted the first connection on the queue of +pending connections. The protocol module should fill +in the supplied address buffer with the address of the +connected party. +.IP PRU_DISCONNECT +.br +Eliminate an association created with a PRU_CONNECT request. +.IP PRU_SHUTDOWN +.br +This call is used to indicate no more data will be sent and/or +received (the \fIaddr\fP parameter indicates the direction of +the shutdown, as encoded in the \fIshutdown\fP system call). +The protocol may, at its discretion, deallocate any data +structures related to the shutdown and/or notify a connected peer +of the shutdown. +.IP PRU_RCVD +.br +This request is made only if the protocol entry in the protocol +switch table includes the PR_WANTRCVD flag. +When a user removes data from the receive queue this request +will be sent to the protocol module. It may be used to trigger +acknowledgements, refresh windowing information, initiate +data transfer, etc. +.IP PRU_SEND +.br +Each user request to send data is translated into one or more +PRU_SEND requests (a protocol may indicate that a single user +send request must be translated into a single PRU_SEND request by +specifying the PR_ATOMIC flag in its protocol description). +The data to be sent is presented to the protocol as a list of +mbufs and an address is, optionally, supplied in the \fIaddr\fP +parameter.
The protocol is responsible for preserving the data +in the socket's send queue if it is not able to send it immediately, +or if it may need it at some later time (e.g. for retransmission). +.IP PRU_ABORT +.br +This request indicates an abnormal termination of service. The +protocol should delete any existing association(s). +.IP PRU_CONTROL +.br +The ``control'' request is generated when a user performs a +UNIX \fIioctl\fP system call on a socket (and the ioctl is not +intercepted by the socket routines). It allows protocol-specific +operations to be provided outside the scope of the common socket +interface. The \fIaddr\fP parameter contains a pointer to a static +kernel data area where relevant information may be obtained or returned. +The \fIm\fP parameter contains the actual \fIioctl\fP request code +(note the non-standard calling convention). +The \fIrights\fP parameter contains a pointer to an \fIifnet\fP structure +if the \fIioctl\fP operation pertains to a particular network interface. +.IP PRU_SENSE +.br +The ``sense'' request is generated when the user makes an \fIfstat\fP +system call on a socket; it requests status of the associated socket. +This currently returns a standard \fIstat\fP structure. +It typically contains only the +optimal transfer size for the connection (based on buffer size, +windowing information and maximum packet size). +The \fIm\fP parameter contains a pointer +to a static kernel data area where the status buffer should be placed. +.IP PRU_RCVOOB +.br +Any ``out-of-band'' data presently available is to be returned. An +mbuf is passed to the protocol module, and the protocol +should either place +data in the mbuf or attach new mbufs to the one supplied if there is +insufficient space in the single mbuf. +An error may be returned if out-of-band data is not (yet) available +or has already been consumed. +The \fIaddr\fP parameter contains any options such as MSG_PEEK +to examine data without consuming it. 
+.IP PRU_SENDOOB +.br +Like PRU_SEND, but for out-of-band data. +.IP PRU_SOCKADDR +.br +The local address of the socket is returned, if any is currently +bound to it. The address (with protocol-specific format) is returned +in the \fIaddr\fP parameter. +.IP PRU_PEERADDR +.br +The address of the peer to which the socket is connected is returned. +The socket must be in the SS_ISCONNECTED state for this request to +be made to the protocol. The address format (protocol-specific) is +returned in the \fIaddr\fP parameter. +.IP PRU_CONNECT2 +.br +The protocol module is supplied two sockets and requested to +establish a connection between the two without binding any +addresses, if possible. This call is used in implementing +the +.IR socketpair (2) +system call. +.PP +The following requests are used internally by the protocol modules +and are never generated by the socket routines. In certain instances, +they are handed to the \fIpr_usrreq\fP routine solely for convenience +in tracing a protocol's operation (e.g., PRU_SLOWTIMO). +.IP PRU_FASTTIMO +.br +A ``fast timeout'' has occurred. This request is made when a timeout +occurs in the protocol's \fIpr_fasttimo\fP routine. The \fIaddr\fP +parameter indicates which timer expired. +.IP PRU_SLOWTIMO +.br +A ``slow timeout'' has occurred. This request is made when a timeout +occurs in the protocol's \fIpr_slowtimo\fP routine. The \fIaddr\fP +parameter indicates which timer expired. +.IP PRU_PROTORCV +.br +This request is used in the protocol-protocol interface, not by the +socket routines. It requests reception of data destined for the protocol and +not the user. No protocols currently use this facility. +.IP PRU_PROTOSEND +.br +This request allows a protocol to send data destined for another +protocol module, not a user. The details of how data is marked +``addressed to protocol'' instead of ``addressed to user'' are +left to the protocol modules. No protocols currently use this facility.
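A user-request routine is essentially a `switch` on the request code. The C sketch below illustrates this for a hypothetical toy datagram protocol. It simplifies the real convention: addresses are plain `long` values rather than mbufs, the data and rights chains are omitted, and `toy_sock` and `toy_usrreq` are invented names. The send-side checks loosely imitate how a UDP-like protocol might treat an explicit address on a connected socket. They are an assumption, not a transcription of any real implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Request codes, numbered as in the PRU_ table (subset). */
#define PRU_ATTACH     0
#define PRU_DETACH     1
#define PRU_BIND       2
#define PRU_LISTEN     3
#define PRU_CONNECT    4
#define PRU_ACCEPT     5
#define PRU_DISCONNECT 6
#define PRU_SEND       9

/* Hypothetical per-socket state for a toy datagram protocol. */
struct toy_sock {
    int  attached;
    int  connected;
    long local, peer;   /* stand-ins for bound and peer addresses */
    int  npackets;      /* datagrams "sent" */
};

/* Hypothetical user-request routine; returns 0 or a UNIX error number. */
int
toy_usrreq(struct toy_sock *so, int req, const long *addr)
{
    assert(so != NULL);
    if (req != PRU_ATTACH && !so->attached)
        return EINVAL;              /* attach must precede everything else */
    switch (req) {
    case PRU_ATTACH:
        if (so->attached)
            return EINVAL;          /* attach happens only once */
        so->attached = 1;
        return 0;
    case PRU_DETACH:
        so->attached = so->connected = 0;
        return 0;
    case PRU_BIND:
        if (addr == NULL)
            return EINVAL;
        so->local = *addr;
        return 0;
    case PRU_CONNECT:               /* datagram style: just record the peer */
        if (addr == NULL)
            return EINVAL;
        so->peer = *addr;
        so->connected = 1;
        return 0;
    case PRU_DISCONNECT:
        if (!so->connected)
            return ENOTCONN;
        so->connected = 0;
        return 0;
    case PRU_SEND:                  /* explicit address xor a connected peer */
        if (addr != NULL && so->connected)
            return EISCONN;
        if (addr == NULL && !so->connected)
            return ENOTCONN;
        so->npackets++;
        return 0;
    case PRU_LISTEN:                /* no connections to listen for or accept */
    case PRU_ACCEPT:
    default:
        return EOPNOTSUPP;
    }
}
```

Because PRU_CONNECT only records a peer address, a socket can be connected and disconnected repeatedly without being detached. This matches the datagram behavior described for PRU_CONNECT.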
diff --git a/share/doc/smm/18.net/8.t b/share/doc/smm/18.net/8.t new file mode 100644 index 000000000000..0f9abc4ecd27 --- /dev/null +++ b/share/doc/smm/18.net/8.t @@ -0,0 +1,160 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.nr H2 1 +.\".ds RH "Protocol/protocol interface +.br +.ne 2i +.NH +\s+2Protocol/protocol interface\s0 +.PP +The interface between protocol modules is through the \fIpr_usrreq\fP, +\fIpr_input\fP, \fIpr_output\fP, \fIpr_ctlinput\fP, and +\fIpr_ctloutput\fP routines. The calling conventions for all +but the \fIpr_usrreq\fP routine are expected to be specific to +the protocol +modules and are not guaranteed to be consistent across protocol +families. We +will examine the conventions used for some of the Internet +protocols in this section as an example. +.NH 2 +pr_output +.PP +The Internet protocol UDP uses the convention, +.DS +error = udp_output(inp, m); +int error; struct inpcb *inp; struct mbuf *m; +.DE +where the \fIinp\fP, ``\fIin\fP\^ternet +\fIp\fP\^rotocol \fIc\fP\^ontrol \fIb\fP\^lock'', +passed between modules conveys per-connection state information, and +the mbuf chain contains the data to be sent. UDP +performs consistency checks, prepends its header, calculates a +checksum, etc., before passing the packet on. +UDP uses the Internet Protocol, IP [Postel81a], as its transport. +UDP passes a packet to the IP module for output as follows: +.DS +error = ip_output(m, opt, ro, flags); +int error; struct mbuf *m, *opt; struct route *ro; int flags; +.DE +.PP +The call to IP's output routine is more complicated than that for +UDP, as befits the additional work the IP module must do. +The \fIm\fP parameter is the data to be sent, and the \fIopt\fP +parameter is an optional list of IP options which should +be placed in the IP packet header. The \fIro\fP parameter +is used in making routing decisions (and passing them back to the +caller for use in subsequent calls). The +final parameter, \fIflags\fP, contains flags indicating whether the +user is allowed to transmit a broadcast packet +and whether routing is to be performed. The broadcast flag may +be inconsequential if the underlying hardware does not support the +notion of broadcasting.
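The UDP output steps (check, prepend the header, checksum, hand to IP) can be sketched in C over a flat byte buffer. This is a simplified illustration, not the kernel's code. Real UDP works on mbuf chains and also includes an IP pseudo-header in the checksum, which is omitted here. `udp_output_sketch`, `in_cksum_sketch`, `ip_output_stub`, and the 1500-byte `MAXPKT` limit are hypothetical. The checksum is the standard Internet ones'-complement sum, and a computed value of zero is sent as 0xffff, as UDP specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDRLEN 8
#define MAXPKT     1500         /* hypothetical buffer limit */

/* Ones'-complement sum of 16-bit big-endian words, complemented. */
uint16_t
in_cksum_sketch(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;   /* odd byte padded with zero */
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical stand-in for ip_output(): it just records the packet. */
static uint8_t last_packet[MAXPKT];
static size_t  last_len;

static int
ip_output_stub(const uint8_t *pkt, size_t len)
{
    memcpy(last_packet, pkt, len);
    last_len = len;
    return 0;
}

/* Check the length, prepend an 8-byte UDP header, checksum, pass down. */
int
udp_output_sketch(uint16_t sport, uint16_t dport, const uint8_t *data, size_t dlen)
{
    uint8_t pkt[MAXPKT];
    size_t ulen = UDP_HDRLEN + dlen;
    uint16_t ck;

    assert(data != NULL || dlen == 0);
    if (dlen > sizeof pkt - UDP_HDRLEN) /* consistency check */
        return EMSGSIZE;
    pkt[0] = (uint8_t)(sport >> 8); pkt[1] = (uint8_t)sport;
    pkt[2] = (uint8_t)(dport >> 8); pkt[3] = (uint8_t)dport;
    pkt[4] = (uint8_t)(ulen >> 8);  pkt[5] = (uint8_t)ulen;
    pkt[6] = pkt[7] = 0;                /* checksum field is zero while summing */
    if (dlen > 0)
        memcpy(pkt + UDP_HDRLEN, data, dlen);
    /* Real UDP also sums an IP pseudo-header; omitted in this sketch. */
    ck = in_cksum_sketch(pkt, ulen);
    if (ck == 0)
        ck = 0xffff;                    /* 0 on the wire means "no checksum" */
    pkt[6] = (uint8_t)(ck >> 8); pkt[7] = (uint8_t)ck;
    return ip_output_stub(pkt, ulen);
}
```

Only errors detectable immediately, such as the oversize check, are returned to the caller. This mirrors the convention for output routines described next.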
+.PP +All output routines return 0 on success and a UNIX error number +if a failure occurred which could be detected immediately +(no buffer space available, no route to destination, etc.). +.NH 2 +pr_input +.PP +Both UDP and TCP use the following calling convention, +.DS +(void) (*protosw[].pr_input)(m, ifp); +struct mbuf *m; struct ifnet *ifp; +.DE +Each mbuf list passed is a single packet to be processed by +the protocol module. +The interface from which the packet was received is passed as the second +parameter. +.PP +The IP input routine is a VAX software interrupt level routine, +and so is not called with any parameters. It instead communicates +with network interfaces through a queue, \fIipintrq\fP, which is +identical in structure to the queues used by the network interfaces +for storing packets awaiting transmission. +The software interrupt is enabled by the network interfaces +when they place input data on the input queue. +.NH 2 +pr_ctlinput +.PP +This routine is used to convey ``control'' information to a +protocol module (i.e. information which might be passed to the +user, but is not data). 
+.PP +The common calling convention for this routine is, +.DS +(void) (*protosw[].pr_ctlinput)(req, addr); +int req; struct sockaddr *addr; +.DE +The \fIreq\fP parameter is one of the following, +.DS +.ta \w'#define 'u +\w'PRC_UNREACH_NEEDFRAG 'u +8n +#define PRC_IFDOWN 0 /* interface transition */ +#define PRC_ROUTEDEAD 1 /* select new route if possible */ +#define PRC_QUENCH 4 /* someone said to slow down */ +#define PRC_MSGSIZE 5 /* message size forced drop */ +#define PRC_HOSTDEAD 6 /* normally from IMP */ +#define PRC_HOSTUNREACH 7 /* ditto */ +#define PRC_UNREACH_NET 8 /* no route to network */ +#define PRC_UNREACH_HOST 9 /* no route to host */ +#define PRC_UNREACH_PROTOCOL 10 /* dst says bad protocol */ +#define PRC_UNREACH_PORT 11 /* bad port # */ +#define PRC_UNREACH_NEEDFRAG 12 /* IP_DF caused drop */ +#define PRC_UNREACH_SRCFAIL 13 /* source route failed */ +#define PRC_REDIRECT_NET 14 /* net routing redirect */ +#define PRC_REDIRECT_HOST 15 /* host routing redirect */ +#define PRC_REDIRECT_TOSNET 16 /* redirect for type of service & net */ +#define PRC_REDIRECT_TOSHOST 17 /* redirect for tos & host */ +#define PRC_TIMXCEED_INTRANS 18 /* packet lifetime expired in transit */ +#define PRC_TIMXCEED_REASS 19 /* lifetime expired on reass q */ +#define PRC_PARAMPROB 20 /* header incorrect */ +.DE +while the \fIaddr\fP parameter is the address to which the condition applies. +Many of the requests have obviously been +derived from ICMP (the Internet Control Message Protocol [Postel81c]), +and from error messages defined in the 1822 host/IMP convention +[BBN78]. Mapping tables exist to convert +control requests to UNIX error codes which are delivered +to a user. +.NH 2 +pr_ctloutput +.PP +This is the routine that implements per-socket options at the protocol +level for \fIgetsockopt\fP and \fIsetsockopt\fP.
+The calling convention is, +.DS +error = (*protosw[].pr_ctloutput)(op, so, level, optname, mp); +int op; struct socket *so; int level, optname; struct mbuf **mp; +.DE +where \fIop\fP is one of PRCO_SETOPT or PRCO_GETOPT, +\fIso\fP is the socket from whence the call originated, +and \fIlevel\fP and \fIoptname\fP are the protocol level and option name +supplied by the user. +The results of a PRCO_GETOPT call are returned in an mbuf whose address +is placed in \fImp\fP before return. +On a PRCO_SETOPT call, \fImp\fP contains the address of an mbuf +containing the option data; the mbuf should be freed before return. diff --git a/share/doc/smm/18.net/9.t b/share/doc/smm/18.net/9.t new file mode 100644 index 000000000000..02e12b9603e1 --- /dev/null +++ b/share/doc/smm/18.net/9.t @@ -0,0 +1,118 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Protocol/network-interface +.br +.ne 2i +.NH +\s+2Protocol/network-interface interface\s0 +.PP +The lowest layer in the set of protocols which comprise a +protocol family must interface itself to one or more network +interfaces in order to transmit and receive +packets. It is assumed that +any routing decisions have been made before handing a packet +to a network interface; in fact, this is absolutely necessary +in order to locate any interface at all (unless, of course, +one uses a single ``hardwired'' interface). There are two +cases to be concerned with: transmission of a packet +and receipt of a packet; each will be considered separately. +.NH 2 +Packet transmission +.PP +Assuming a protocol has a handle on an interface, \fIifp\fP, +a (struct ifnet\ *), +it transmits a fully formatted packet with the following call, +.DS +error = (*ifp->if_output)(ifp, m, dst); +int error; struct ifnet *ifp; struct mbuf *m; struct sockaddr *dst; +.DE +The output routine for the network interface transmits the packet +\fIm\fP to the \fIdst\fP address, or returns an error indication +(a UNIX error number). In reality, transmission may +not be immediate or successful; normally the output +routine simply queues the packet on its send queue and primes +an interrupt-driven routine to actually transmit the packet.
+For unreliable media, such as the Ethernet, ``successful'' +transmission simply means that the packet has been placed on the cable +without a collision. On the other hand, an 1822 interface guarantees +proper delivery or an error indication for each message transmitted. +The model employed in the networking system attaches no promises +of delivery to the packets handed to a network interface, and thus +corresponds more closely to the Ethernet. Errors returned by the +output routine are only those that can be detected immediately, +and are normally trivial in nature (no buffer space, +address format not handled, etc.). +No indication is received if errors are detected after the call has returned. +.NH 2 +Packet reception +.PP +Each protocol family must have one or more ``lowest level'' protocols. +These protocols deal with internetwork addressing and are responsible +for the delivery of incoming packets to the proper protocol processing +modules. In the PUP model [Boggs78] these protocols are termed Level +1 protocols; +in the ISO model, they are network layer protocols. In this system, each such +protocol module has an input packet queue assigned to it. Incoming +packets received by a network interface are queued for the protocol +module, and a VAX software interrupt is posted to initiate processing. +.PP +Four macros are available for queuing and dequeuing packets: +.IP "IF_ENQUEUE(ifq, m)" +.br +This places the packet \fIm\fP at the tail of the queue \fIifq\fP. +.IP "IF_DEQUEUE(ifq, m)" +.br +This places a pointer to the packet at the head of queue \fIifq\fP +in \fIm\fP +and removes the packet from the queue. +A zero value will be returned in \fIm\fP if the queue is empty. +.IP "IF_DEQUEUEIF(ifq, m, ifp)" +.br +Like IF_DEQUEUE, this removes the next packet from the head of a queue +and returns it in \fIm\fP. +A pointer to the interface on which the packet was received +is placed in \fIifp\fP, a (struct ifnet\ *).
+.IP "IF_PREPEND(ifq, m)" +.br +This places the packet \fIm\fP at the head of the queue \fIifq\fP. +.PP +Each queue has a maximum length associated with it as a simple form +of congestion control. The macro IF_QFULL(ifq) returns 1 if the queue +is filled, in which case the macro IF_DROP(ifq) should be used to +increment the count of the number of packets dropped, and the offending +packet is dropped. For example, the following code fragment is commonly +found in a network interface's input routine, +.DS +._f +if (IF_QFULL(inq)) { + IF_DROP(inq); + m_freem(m); +} else + IF_ENQUEUE(inq, m); +.DE diff --git a/share/doc/smm/18.net/Makefile b/share/doc/smm/18.net/Makefile new file mode 100644 index 000000000000..beb0b147771a --- /dev/null +++ b/share/doc/smm/18.net/Makefile @@ -0,0 +1,5 @@ +VOLUME= smm/18.net +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t 7.t 8.t 9.t a.t b.t c.t d.t e.t f.t +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/18.net/a.t b/share/doc/smm/18.net/a.t new file mode 100644 index 000000000000..35c5f5d9aa13 --- /dev/null +++ b/share/doc/smm/18.net/a.t @@ -0,0 +1,213 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Gateways and routing +.br +.ne 2i +.NH +\s+2Gateways and routing issues\s0 +.PP +The system has been designed with the expectation that it will +be used in an internetwork environment. The ``canonical'' +environment was envisioned to be a collection of local area +networks connected at one or more points through hosts with +multiple network interfaces (one on each local area network), +and possibly a connection to a long haul network (for example, +the ARPANET). In such an environment, issues of +gatewaying and packet routing become very important. Certain +of these issues, such as congestion +control, have been handled in a simplistic manner or specifically +not addressed. +Instead, where possible, the network system +attempts to provide simple mechanisms upon which more involved +policies may be implemented. As some of these problems become +better understood, the solutions developed will be incorporated +into the system. +.PP +This section will describe the facilities provided for packet +routing. The simplistic mechanisms provided for congestion +control are described in chapter 12. 
+.NH 2 +Routing tables +.PP +The network system maintains a set of routing tables for +selecting a network interface to use in delivering a +packet to its destination. These tables are of the form: +.DS +.ta \w'struct 'u +\w'u_long 'u +\w'sockaddr rt_gateway; 'u +struct rtentry { + u_long rt_hash; /* hash key for lookups */ + struct sockaddr rt_dst; /* destination net or host */ + struct sockaddr rt_gateway; /* forwarding agent */ + short rt_flags; /* see below */ + short rt_refcnt; /* no. of references to structure */ + u_long rt_use; /* packets sent using route */ + struct ifnet *rt_ifp; /* interface to give packet to */ +}; +.DE +.PP +The routing information is organized in two separate tables, one +for routes to a host and one for routes to a network. The +distinction between hosts and networks is necessary so +that a single mechanism may be used +for both broadcast and multi-drop type networks, and +also for networks built from point-to-point links (e.g., +DECnet [DEC80]). +.PP +Each table is organized as a hashed set of linked lists. +Two 32-bit hash values are calculated by routines defined for +each address family: one based on the destination being +a host, and one assuming the target is the network portion +of the address. Each hash value is used to +locate a hash chain to search (by taking the value modulo the +hash table size) and the entire 32-bit value is then +used as a key in scanning the list of routes. Lookups are +applied first to the routing +table for hosts, then to the routing table for networks. +If both lookups fail, a final lookup is made for a ``wildcard'' +route (by convention, network 0). +The first appropriate route discovered is used. +By doing this, routes to a specific host on a network may be +present as well as routes to the network. This also allows a +``fall back'' network route to be defined to a ``smart'' gateway +which may then perform more intelligent routing.
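The lookup order described above (host table, then network table, then the wildcard route) can be sketched in C. This is a minimal, hypothetical model, not the kernel's code: addresses are reduced to 32-bit integers, the per-family hash routines are stand-ins named `host_hash` and `net_hash` (the latter assumes a /24 network portion), and the table size `RTHASHSIZ` is an arbitrary choice. As in the text, the full 32-bit hash selects a chain modulo the table size and is then used as the key while scanning that chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTHASHSIZ 8                /* hypothetical hash table size */

struct rt {                        /* hypothetical cut-down route entry */
    uint32_t   rt_hash;            /* full 32-bit hash, used as lookup key */
    struct rt *rt_next;            /* next entry on the same hash chain */
};

static struct rt *rthost[RTHASHSIZ];   /* routes to hosts */
static struct rt *rtnet[RTHASHSIZ];    /* routes to networks */

/* Stand-in hash routines: the host hash uses the whole address, the
 * network hash only the (assumed /24) network portion. */
static uint32_t host_hash(uint32_t addr) { return addr; }
static uint32_t net_hash(uint32_t addr)  { return addr & 0xffffff00u; }

/* Push an entry onto the chain selected by its hash modulo the table size. */
static void rt_insert(struct rt **table, struct rt *r)
{
    struct rt **chain;

    assert(r != NULL);
    chain = &table[r->rt_hash % RTHASHSIZ];
    r->rt_next = *chain;
    *chain = r;
}

/* Scan one chain, comparing the full 32-bit hash as the key. */
static struct rt *rt_scan(struct rt **table, uint32_t hash)
{
    struct rt *r;

    for (r = table[hash % RTHASHSIZ]; r != NULL; r = r->rt_next)
        if (r->rt_hash == hash)
            return r;
    return NULL;
}

/* Host table first, then network table, then the wildcard route,
 * which by convention is network 0. */
static struct rt *rt_lookup(uint32_t dst)
{
    struct rt *r;

    if ((r = rt_scan(rthost, host_hash(dst))) != NULL)
        return r;
    if ((r = rt_scan(rtnet, net_hash(dst))) != NULL)
        return r;
    return rt_scan(rtnet, 0);
}
```

With this layout a host route overrides the route to its network, and an entry keyed 0 in the network table serves as the fall-back route toward a smart gateway.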
+.PP +Each routing table entry contains a destination (the desired final destination), +a gateway to which to send the packet, +and various flags which indicate the route's status and type (host or +network). A count +of the number of packets sent using the route is kept, along +with a count of ``held references'' to the dynamically +allocated structure to ensure that memory reclamation +occurs only when the route is not in use. Finally, a pointer to the +network interface is kept; packets sent using +the route should be handed to this interface. +.PP +Routes are typed in two ways: either as host or network, and as +``direct'' or ``indirect''. The host/network +distinction determines how to compare the \fIrt_dst\fP field +during lookup. If the route is to a network, only a packet's +destination network is compared to the \fIrt_dst\fP entry stored +in the table. If the route is to a host, the addresses must +match bit for bit. +.PP +The distinction between ``direct'' and ``indirect'' routes indicates +whether the destination is directly connected to the source. +This is needed when performing local network encapsulation. If +a packet is destined for a peer at a host or network which is +not directly connected to the source, the internetwork packet +header will +contain the address of the eventual destination, while +the local network header will address the intervening +gateway. Should the destination be directly connected, these addresses +are likely to be identical, or a mapping between the two exists. +The RTF_GATEWAY flag indicates that the route is to an ``indirect'' +gateway agent, and that the local network header should be filled in +from the \fIrt_gateway\fP field instead of +from the final internetwork destination address. +.PP +It is assumed that multiple routes to the same destination will not +be present; only one of multiple routes, that most recently installed, +will be used.
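The two ways of typing a route, together with the use and reference counts, can be illustrated with another hypothetical sketch. The structure is a cut-down stand-in for struct rtentry with addresses reduced to 32-bit integers, the flag values are chosen to match BSD's <net/route.h> but are redefined locally so the block stands alone, and `net_of` assumes a /24 network portion. A network route matches on the destination's network portion only, a host route bit for bit; RTF_GATEWAY decides whether the local network header addresses the gateway or the final destination; and the reference count decides when the entry could be reclaimed.

```c
#include <assert.h>
#include <stdint.h>

#define RTF_UP      0x1   /* route usable */
#define RTF_GATEWAY 0x2   /* destination reached via a gateway ("indirect") */
#define RTF_HOST    0x4   /* host route; otherwise a network route */

struct rtent {                 /* hypothetical cut-down struct rtentry */
    uint32_t      rt_dst;      /* destination host or network */
    uint32_t      rt_gateway;  /* forwarding agent */
    short         rt_flags;
    short         rt_refcnt;   /* held references */
    unsigned long rt_use;      /* packets sent using this route */
};

/* Assumed network portion: the high 24 bits of the address. */
static uint32_t net_of(uint32_t addr) { return addr & 0xffffff00u; }

/* Host routes must match bit for bit; network routes compare only the
 * destination's network portion against rt_dst. */
static int rt_matches(const struct rtent *rt, uint32_t dst)
{
    if (rt->rt_flags & RTF_HOST)
        return rt->rt_dst == dst;
    return rt->rt_dst == net_of(dst);
}

/* Address for the local network header: the gateway for an indirect
 * route, otherwise the final internetwork destination itself. */
static uint32_t next_hop(struct rtent *rt, uint32_t dst)
{
    rt->rt_use++;
    return (rt->rt_flags & RTF_GATEWAY) ? rt->rt_gateway : dst;
}

/* Take a reference while the route is in use. */
static void rt_hold(struct rtent *rt) { rt->rt_refcnt++; }

/* Drop a reference; returns nonzero when none remain, i.e. when the
 * entry is no longer in use and could be reclaimed. */
static int rt_release(struct rtent *rt)
{
    assert(rt->rt_refcnt > 0);
    return --rt->rt_refcnt == 0;
}
```

Keeping the gateway choice in one place means the encapsulation code never needs to know whether the destination is directly connected; it simply addresses whatever the route yields.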
+.PP +Routing redirect control messages are used to dynamically +modify existing routing table entries as well as dynamically +create new routing table entries. On hosts where exhaustive +routing information is too expensive to maintain (e.g., +workstations), the +combination of wildcard routing entries and routing redirect +messages can be used to provide a simple routing management +scheme without the use of a higher level policy process. +Current connections may be rerouted after notification of the protocols +by means of their \fIpr_ctlinput\fP entries. +Statistics are kept by the routing table routines +on the use of routing redirect messages and their +effect on the routing tables. These statistics may be viewed using +.IR netstat (1). +.PP +Status information other than routing redirect control messages +may be used in the future, but at present it is ignored. +Likewise, more intelligent ``metrics'' may be used to describe +routes in the future, possibly based on bandwidth and monetary +costs. +.NH 2 +Routing table interface +.PP +A protocol accesses the routing tables through +three routines, +one to allocate a route, one to free a route, and one +to process a routing redirect control message. +The routine \fIrtalloc\fP performs route allocation; it is +called with a pointer to the following structure containing +the desired destination: +.DS +._f +struct route { + struct rtentry *ro_rt; + struct sockaddr ro_dst; +}; +.DE +The route returned is assumed ``held'' by the caller until +released with an \fIrtfree\fP call. Protocols which implement +virtual circuits, such as TCP, hold onto routes for the duration +of the circuit's lifetime, while connectionless protocols, +such as UDP, allocate and free routes whenever their destination address +changes. +.PP +The routine \fIrtredirect\fP is called to process a routing redirect +control message. It is called with a destination address, +the new gateway to that destination, and the source of the redirect.
+Redirects are accepted only from the current router for the destination. +If a non-wildcard route +exists to the destination, the gateway entry in the route is modified +to point at the new gateway supplied. Otherwise, a new routing +table entry is inserted reflecting the information supplied. Routes +to interfaces and routes to gateways which are not directly accessible +from the host are ignored. +.NH 2 +User level routing policies +.PP +Routing policies implemented in user processes manipulate the +kernel routing tables through two \fIioctl\fP calls. The +commands SIOCADDRT and SIOCDELRT add and delete routing entries, +respectively; the tables are read through the /dev/kmem device. +The decision to place policy decisions in a user process implies +that routing table updates may lag a bit behind the identification of +new routes, or the failure of existing routes, but this period +of instability is normally very small with proper implementation +of the routing process. Advisory information, such as ICMP +error messages and IMP diagnostic messages, may be read from +raw sockets (described in the next section). +.PP +Several routing policy processes have already been implemented. The +system standard +``routing daemon'' uses a variant of the Xerox NS Routing Information +Protocol [Xerox82] to maintain up-to-date routing tables in our local +environment. Interaction with other existing routing protocols, +such as the Internet EGP (Exterior Gateway Protocol), has been +accomplished using a similar process. diff --git a/share/doc/smm/18.net/b.t b/share/doc/smm/18.net/b.t new file mode 100644 index 000000000000..a62e45342822 --- /dev/null +++ b/share/doc/smm/18.net/b.t @@ -0,0 +1,139 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Raw sockets +.br +.ne 2i +.NH +\s+2Raw sockets\s0 +.PP +A raw socket is an object which allows users direct access +to a lower-level protocol. Raw sockets are intended for knowledgeable +processes which wish to take advantage of some protocol +feature not directly accessible through the normal interface, or +for the development of new protocols built atop existing lower level +protocols. For example, a new version of TCP might be developed at the +user level by utilizing a raw IP socket for delivery of packets. 
+The raw IP socket interface attempts to provide an identical interface +to the one a protocol would have if it were resident in the kernel. +.PP +The raw socket support is built around a generic raw socket interface, +(possibly) augmented by protocol-specific processing routines. +This section will describe the core of the raw socket interface. +.NH 2 +Control blocks +.PP +Every raw socket has a protocol control block of the following form: +.DS +.ta \w'struct 'u +\w'caddr_t 'u +\w'sockproto rcb_proto; 'u +struct rawcb { + struct rawcb *rcb_next; /* doubly linked list */ + struct rawcb *rcb_prev; + struct socket *rcb_socket; /* back pointer to socket */ + struct sockaddr rcb_faddr; /* destination address */ + struct sockaddr rcb_laddr; /* socket's address */ + struct sockproto rcb_proto; /* protocol family, protocol */ + caddr_t rcb_pcb; /* protocol specific stuff */ + struct mbuf *rcb_options; /* protocol specific options */ + struct route rcb_route; /* routing information */ + short rcb_flags; +}; +.DE +All the control blocks are kept on a doubly linked list for +performing lookups during packet dispatch. Associations may +be recorded in the control block and used by the output routine +in preparing packets for transmission. +The \fIrcb_proto\fP structure contains the protocol family and protocol +number with which the raw socket is associated. +The protocol, family and addresses are +used to filter packets on input; this will be described in more +detail shortly. If any protocol-specific information is required, +it may be attached to the control block using the \fIrcb_pcb\fP +field. +Protocol-specific options for transmission in outgoing packets +may be stored in \fIrcb_options\fP. +.PP +A raw socket interface is datagram oriented. That is, each send +or receive on the socket requires a destination address. This +address may be supplied by the user or stored in the control block +and automatically installed in the outgoing packet by the output +routine. 
Since it is not possible to determine whether an address +is present or not in the control block, two flags, RAW_LADDR and +RAW_FADDR, indicate whether a local or foreign address is present. +Routing is expected to be performed by the underlying protocol +if necessary. +.NH 2 +Input processing +.PP +Input packets are ``assigned'' to raw sockets based on a simple +pattern matching scheme. Each network interface or protocol +gives unassigned packets +to the raw input routine with the call: +.DS +raw_input(m, proto, src, dst) +struct mbuf *m; struct sockproto *proto; struct sockaddr *src, *dst; +.DE +The data packet then has a generic header prepended to it of the +form +.DS +._f +struct raw_header { + struct sockproto raw_proto; + struct sockaddr raw_dst; + struct sockaddr raw_src; +}; +.DE +and it is placed in a packet queue for the ``raw input protocol'' module. +Packets taken from this queue are copied into any raw sockets that +match the header according to the following rules: +.IP 1) +The protocol family of the socket and header agree. +.IP 2) +If the protocol number in the socket is non-zero, then it agrees +with that found in the packet header. +.IP 3) +If a local address is defined for the socket, the address format +of the local address is the same as the destination address's and +the two addresses agree bit for bit. +.IP 4) +The rules of 3) are applied to the socket's foreign address and the packet's +source address. +.LP +A basic assumption is that addresses present in the +control block and packet header (as constructed by the network +interface and any raw input protocol module) are in a canonical +form which may be ``block compared''. +.NH 2 +Output processing +.PP +On output, the raw \fIpr_usrreq\fP routine +passes the packet and a pointer to the raw control block to the +raw protocol output routine for any processing required before +it is delivered to the appropriate network interface.
The +output routine is normally the only code required to implement +a raw socket interface. diff --git a/share/doc/smm/18.net/c.t b/share/doc/smm/18.net/c.t new file mode 100644 index 000000000000..49245b81e1a0 --- /dev/null +++ b/share/doc/smm/18.net/c.t @@ -0,0 +1,145 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.nr H2 1 +.\".ds RH "Buffering and congestion control +.br +.ne 2i +.NH +\s+2Buffering and congestion control\s0 +.PP +One of the major factors in the performance of a protocol is +the buffering policy used. Lack of a proper buffering policy +can force packets to be dropped, cause falsified windowing +information to be emitted by protocols, fragment host memory, +degrade the overall host performance, etc. Due to problems +such as these, most systems allocate a fixed pool of memory +to the networking system and impose +a policy optimized for ``normal'' network operation. +.PP +The networking system developed for UNIX is little different in this +respect. At boot time a fixed amount of memory is allocated by +the networking system. At later times more system memory +may be requested as the need arises, but at no time is +memory ever returned to the system. It is possible to +garbage collect memory from the network, but difficult. In +order to perform this garbage collection properly, some +portion of the network will have to be ``turned off'' as +data structures are updated. The interval over which this +occurs must be kept small compared to the average inter-packet +arrival time, or too much traffic may +be lost, impacting other hosts on the network, as well as +increasing load on the interconnecting media. In our +environment, we have not experienced a need for such compaction, +and thus have left the problem unresolved. +.PP +The mbuf structure was introduced in chapter 5. In this +section a brief description will be given of the allocation +mechanisms and the policies used by the protocols in performing +connection level buffering. +.NH 2 +Memory management +.PP +The basic memory allocation routines manage a private page map, +the size of which determines the maximum amount of memory +that may be allocated by the network. +A small amount of memory is allocated at boot time +to initialize the mbuf and mbuf page cluster free lists.
+When the free lists are exhausted, more memory is requested +from the system memory allocator if space remains in the map. +If memory cannot be allocated, +callers may block awaiting free memory, +or the failure may be reflected to the caller immediately. +The allocator will not block awaiting free map entries, however, +as exhaustion of the page map usually indicates that buffers have been lost +due to a ``leak.'' +The private page table is used by the network buffer management +routines in remapping pages to +be logically contiguous as the need arises. In addition, an +array of reference counts parallels the page table and is used +when multiple references to a page are present. +.PP +Mbufs are 128 byte structures, 8 fitting in a 1Kbyte +page of memory. When data is placed in mbufs, +it is copied or remapped into logically contiguous pages of +memory from the network page pool if possible. +Data smaller than half of the size +of a page is copied into one or more 112 byte mbuf data areas. +.NH 2 +Protocol buffering policies +.PP +Protocols reserve fixed amounts of +buffering for send and receive queues at socket creation time. These +amounts define the high and low water marks used by the socket routines +in deciding when to block and unblock a process. The reservation +of space does not currently +result in any action by the memory management +routines. +.PP +Protocols which provide connection level flow control do this +based on the amount of space in the associated socket queues. That +is, send windows are calculated based on the amount of free space +in the socket's receive queue, while receive windows are adjusted +based on the amount of data awaiting transmission in the send queue. +Care has been taken to avoid the ``silly window syndrome'' described +in [Clark82] at both the sending and receiving ends. +.NH 2 +Queue limiting +.PP +Incoming packets from the network are always received unless +memory allocation fails. 
However, each Level 1 protocol +input queue +has an upper bound on the queue's length, and any packets +exceeding that bound are discarded. It is possible for a host to be +overwhelmed by excessive network traffic (for instance, a host +acting as a gateway from a high-bandwidth network to a low-bandwidth +network). As a ``defensive'' mechanism, the queue limits may be +adjusted to throttle network traffic load on a host. +Consider a host willing to devote some percentage of +its machine to handling network traffic. +If the cost of handling an +incoming packet can be calculated so that an acceptable +``packet handling rate'' +can be determined, then input queue lengths may be dynamically +adjusted based on a host's network load and the number of packets +awaiting processing. Obviously, discarding packets is +not a satisfactory solution to a problem such as this +(simply dropping packets is likely to increase the load on a network); +the queue lengths were incorporated mainly as a safeguard mechanism. +.NH 2 +Packet forwarding +.PP +When packets cannot be forwarded because of memory limitations, +the system attempts to generate a ``source quench'' message. In addition, +any other problems encountered during packet forwarding are also +reflected back to the sender in the form of ICMP packets. This +helps hosts avoid unneeded retransmissions. +.PP +Broadcast packets are never forwarded due to possible dire +consequences. In an early stage of network development, broadcast +packets were forwarded and a ``routing loop'' resulted in network +saturation and every host on the network crashing. diff --git a/share/doc/smm/18.net/d.t b/share/doc/smm/18.net/d.t new file mode 100644 index 000000000000..2f914407b7ff --- /dev/null +++ b/share/doc/smm/18.net/d.t @@ -0,0 +1,67 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved.
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Out of band data +.br +.ne 2i +.NH +\s+2Out of band data\s0 +.PP +Out of band data is a facility peculiar to the stream socket +abstraction defined. Little agreement appears to exist as +to what its semantics should be. TCP defines the notion of +``urgent data'' as in-line, while the NBS protocols [Burruss81] +and numerous others provide a fully independent logical +transmission channel along which out of band data is to be +sent. 
+In addition, the amount of data which may be sent as an out +of band message varies from protocol to protocol, anywhere +from 1 bit to 16 bytes or more. +.PP +A stream socket's notion of out of band data has been defined +as the lowest reasonable common denominator (at least reasonable +in our minds); +clearly this is subject to debate. Out of band data is expected +to be transmitted out of the normal sequencing and flow control +constraints of the data stream. A minimum of 1 byte of out of +band data and one outstanding out of band message are expected to +be supported by the protocol supporting a stream socket. +It is a protocol's prerogative to support larger-sized messages, or +more than one outstanding out of band message at a time. +.PP +Out of band data is maintained by the protocol and is usually not +stored in the socket's receive queue. +A socket-level option, SO_OOBINLINE, +is provided to force out-of-band data to be placed in the normal +receive queue when urgent data is received; +this sometimes ameliorates problems due to loss of data +when multiple out-of-band +segments are received before the first has been passed to the user. +The PRU_SENDOOB and PRU_RCVOOB +requests to the \fIpr_usrreq\fP routine are used in sending and +receiving out of band data. diff --git a/share/doc/smm/18.net/e.t b/share/doc/smm/18.net/e.t new file mode 100644 index 000000000000..3c64a9727d52 --- /dev/null +++ b/share/doc/smm/18.net/e.t @@ -0,0 +1,123 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH "Trailer protocols +.br +.ne 2i +.NH +\s+2Trailer protocols\s0 +.PP +Core to core copies can be expensive. +Consequently, a great deal of effort was spent +in minimizing such operations. The VAX architecture +provides virtual memory hardware organized in +page units. To cut down on copy operations, data +is kept in page-sized units on page-aligned +boundaries whenever possible. This allows data +to be moved in memory simply by remapping the page +instead of copying. The mbuf and network +interface routines perform page table manipulations +where needed, hiding the complexities of the VAX +virtual memory hardware from higher level code. +.PP +Data enters the system in two ways: from the user, +or from the network (hardware interface). 
When data +is copied from the user's address space +into the system it is deposited in pages (if sufficient +data is present). +This encourages the user to transmit information in +messages which are a multiple of the system page size. +.PP +Unfortunately, performing a similar operation when taking +data from the network is very difficult. +Consider the format of an incoming packet. A packet +usually contains a local network header followed by +one or more headers used by the high level protocols. +Finally, the data, if any, follows these headers. Since +the header information may be variable length, DMA'ing the eventual +data for the user into a page aligned area of +memory is impossible without +\fIa priori\fP knowledge of the format (e.g., by supporting +only a single protocol header format). +.PP +To allow variable length header information to +be present and still ensure page alignment of data, +a special local network encapsulation may be used. +This encapsulation, termed a \fItrailer protocol\fP [Leffler84], +places the variable length header information after +the data. A fixed size local network +header is then prepended to the resultant packet. +The local network header contains the size of the +data portion (in units of 512 bytes), and a new \fItrailer protocol +header\fP, inserted before the variable length +information, contains the size of the variable length +header information. The following trailer +protocol header is used to store information +regarding the variable length protocol header: +.DS +._f +struct { + short protocol; /* original protocol no. */ + short length; /* length of trailer */ +}; +.DE +.PP +The processing of the trailer protocol is very +simple. On output, the local network header indicates that +a trailer encapsulation is being used. +The header also includes an indication +of the number of data pages present before the trailer +protocol header. 
The trailer protocol header is +initialized to contain the actual protocol identifier and the +variable length header size, and is appended to the data +along with the variable length header information. +.PP +On input, the interface routines identify the +trailer encapsulation +by the protocol type stored in the local network header, +then calculate the number of +pages of data to find the beginning of the trailer. +The trailing information is copied into a separate +mbuf and linked to the front of the resultant packet. +.PP +Clearly, trailer protocols require cooperation between +source and destination. In addition, they are normally +cost-effective only when sizable packets are used. The +current scheme works because the local network encapsulation +header is a fixed size, allowing DMA operations +to be performed at a known offset from the first data page +being received. If the local network header were +variable length, this scheme would fail. +.PP +Statistics collected indicate that as much as 200Kb/s +can be gained by using a trailer protocol with +1Kbyte packets. The average size of the variable +length header was 40 bytes (the size of a +minimal TCP/IP packet header). If the hardware +supports larger packets, even greater gains +may be realized. diff --git a/share/doc/smm/18.net/f.t b/share/doc/smm/18.net/f.t new file mode 100644 index 000000000000..9bf64a795c36 --- /dev/null +++ b/share/doc/smm/18.net/f.t @@ -0,0 +1,111 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.nr H2 1 +.\".ds RH Acknowledgements +.br +.ne 2i +.SH +\s+2Acknowledgements\s0 +.PP +The internal structure of the system is patterned +after the Xerox PUP architecture [Boggs79], while in certain +places the Internet +protocol family has had a great deal of influence in the design. +The use of software interrupts for process invocation +is based on similar facilities found in +the VMS operating system. +Many of the +ideas related to protocol modularity, memory management, and network +interfaces are based on Rob Gurwitz's TCP/IP implementation for the +4.1BSD version of UNIX on the VAX [Gurwitz81]. +Greg Chesson explained his use of trailer encapsulations in Datakit, +instigating their use in our system. 
+.\".ds RH References +.nr H2 1 +.sp 2 +.ne 2i +.SH +\s+2References\s0 +.LP +.IP [Boggs79] 20 +Boggs, D. R., J. F. Shoch, E. A. Taft, and R. M. Metcalfe; +\fIPUP: An Internetwork Architecture\fP. Report CSL-79-10. +XEROX Palo Alto Research Center, July 1979. +.IP [BBN78] 20 +Bolt Beranek and Newman; +Specification for the Interconnection of Host and IMP. +BBN Technical Report 1822. May 1978. +.IP [Cerf78] 20 +Cerf, V. G.; The Catenet Model for Internetworking. +Internet Working Group, IEN 48. July 1978. +.IP [Clark82] 20 +Clark, D. D.; Window and Acknowledgement Strategy in TCP, RFC-813. +Network Information Center, SRI International. July 1982. +.IP [DEC80] 20 +Digital Equipment Corporation; \fIDECnet DIGITAL Network +Architecture \- General Description\fP. Order No. +AA-K179A-TK. October 1980. +.IP [Gurwitz81] 20 +Gurwitz, R. F.; VAX-UNIX Networking Support Project \- Implementation +Description. Internetwork Working Group, IEN 168. +January 1981. +.IP [ISO81] 20 +International Organization for Standardization. +\fIISO Open Systems Interconnection \- Basic Reference Model\fP. +ISO/TC 97/SC 16 N 719. August 1981. +.IP [Joy86] 20 +Joy, W.; Fabry, R.; Leffler, S.; McKusick, M.; and Karels, M.; +Berkeley Software Architecture Manual, 4.4BSD Edition. +\fIUNIX Programmer's Supplementary Documents\fP, Vol. 1 (PSD:5). +Computer Systems Research Group, +University of California, Berkeley. +May, 1986. +.IP [Leffler84] 20 +Leffler, S.J. and Karels, M.J.; Trailer Encapsulations, RFC-893. +Network Information Center, SRI International. +April 1984. +.IP [Postel80] 20 +Postel, J. User Datagram Protocol, RFC-768. +Network Information Center, SRI International. May 1980. +.IP [Postel81a] 20 +Postel, J., ed. Internet Protocol, RFC-791. +Network Information Center, SRI International. September 1981. +.IP [Postel81b] 20 +Postel, J., ed. Transmission Control Protocol, RFC-793. +Network Information Center, SRI International. September 1981. +.IP [Postel81c] 20 +Postel, J. 
Internet Control Message Protocol, RFC-792. +Network Information Center, SRI International. September 1981. +.IP [Xerox81] 20 +Xerox Corporation. \fIInternet Transport Protocols\fP. +Xerox System Integration Standard 028112. December 1981. +.IP [Zimmermann80] 20 +Zimmermann, H. OSI Reference Model \- The ISO Model of +Architecture for Open Systems Interconnection. +\fIIEEE Transactions on Communications\fP. Com-28(4); 425-432. +April 1980. diff --git a/share/doc/smm/18.net/spell.ok b/share/doc/smm/18.net/spell.ok new file mode 100644 index 000000000000..f9a387bc75d8 --- /dev/null +++ b/share/doc/smm/18.net/spell.ok @@ -0,0 +1,307 @@ +A,1986A +AA +ACCEPTCONN +ADDR +ARPANET +ASYNC +BBN +BBN78 +Beranek +Boggs +Boggs78 +Boggs79 +Burruss81 +CANTRCVMORE +CANTSENDMORE +CLSIZE +COLL +CONNECT2 +CONNREQUIRED +COPYALL +CSL +Catenet +Cerf +Cerf78 +Chesson +Clark82 +Com +DEC80 +DECnet +DEQUEUE +DEQUEUEIF +DETATCH +DMA +DMA'ing +DMR +DONTWAIT +Datagram +Datakit +EGP +EWOULDBLOCK +Ethernet +FADDR +FASTTIMO +Fabry +GETOPT +Gurwitz +Gurwitz's +Gurwitz81 +HASCL +HOSTDEAD +HOSTUNREACH +ICMP +IEN +IFDOWN +IFF +IFRW +INTRANS +IP +IP's +ISCONNECTED +ISCONNECTING +ISDISCONNECTING +ISO +ISO81 +Ircb +Joy86 +K179A +Karels +LADDR +LOOPBACK +Leffler +Leffler84 +M.J +MAXNUBAMR +MCLGET +MFREE +MGET +MLEN +MSG +MSGSIZE +MSIZE +Mbufs +McKusick +Metcalfe +NBIO +NEEDFRAG +NOARP +NOFDREF +NOTRAILERS +NS +Notes''SMM:15 +OOBINLINE +OSI +PARAMPROB +PEERADDR +PF +POINTOPOINT +PRC +PRCO +PRIV +PROTORCV +PROTOSEND +PRU +PS1:6 +Postel +Postel80 +Postel81a +Postel81b +Postel81c +QFULL +RCVATMARK +RCVD +RCVOOB +RFC +RH +ROUTEDEAD +RTF +S.J +SB +SEL +SENDOOB +SETOPT +SIGIO +SIOCADDRT +SIOCDELRT +SIOCGIFCONF +SIOCSIFADDR +SIOCSIFDSTADDR +SIOCSIFFLAGS +SIOGSIFFLAGS +SLOWTIMO +SMM:15 +SOCKADDR +SRCFAIL +SS +Shoch +TCP +TIMXCEED +TOSHOST +TOSNET +UDP +UNIBUS +VAX +VMS +Vol +WANTRCVD +Xerox81 +Xerox82 +Zimmermann +Zimmermann80 +addr +addrlist +adj +amelioriates +async +bdp +broadaddr +caddr +csma +ctlinput 
+ctloutput +daemon +dat +datagram +decapsulation +dequeuing +dev +dma'd +dom +dst +dstaddr +dtom +faddr +fastimo +fasttimo +fcntl +freem +fstat +getsockopt +hardwired +hiwat +hlen +ierrors +ifa +ifaddr +iff +ifnet +ifp +ifq +ifqueue +ifr +ifrw +ifrw.ifrw +ifu +ifu.ifu +ifuba +ifubinfo +ifw +ifx +ifxmt +info +info.iff +init +inp +inpcb +inq +ip +ipackets +ipintrq +laddr +len +loopback +lowat +m,n +mb +mbcnt +mbmax +mbuf +mbuf's +mbufs +mp +mr +mtod +mtu +netstat +nmr +nr +nx +oerrors +off0 +ontrol +oob +oobmark +op +opackets +ops +optname +pcb +pgrp +prev +proc +proto +protosw +protoswNPROTOSW +pte +pullup +q0len +qlen +qlimit +rawcb +rcb +rcv +recvmsg +ref +refcnt +regs +req +rerouted +ro +rotocol +rt +rtalloc +rtentry +rtfree +rtredirect +rubaget +sb +sel +sendmsg +setsockopt +slowtimo +snd +sockaddr +sockbuf +socketpair +sockproto +sonewconn +soshutdown +src +ta +ternet +timeo +totlen +uba +ubaalloc +ubaget +ubainit +uballoc's +ubaminit +uban +ubaput +ubinfo +udp +unibus +usrreq +virt +vm +wildcard +wmap +wubaput +x,t +xmt +xmt.ifrw +xmt.ifw +xswapd +xtofree +xxx diff --git a/share/doc/smm/Makefile b/share/doc/smm/Makefile new file mode 100644 index 000000000000..14a57395090e --- /dev/null +++ b/share/doc/smm/Makefile @@ -0,0 +1,34 @@ +.include <src.opts.mk> + +# The following modules do not describe FreeBSD: +# 14.uucpimpl, 15.uucpnet + +# The following modules do not build/install: +# 13.amd (documentation is TeXinfo) +# 16.security 17.password (encumbered) + +SUBDIR= title \ + contents \ + 01.setup \ + 02.config \ + 03.fsck \ + 04.quotas \ + 05.fastfs \ + 06.nfs \ + ${_07.lpd} \ + ${_08.sendmailop} \ + 11.timedop \ + 12.timed \ + 18.net + +.if ${MK_SENDMAIL} != "no" +_08.sendmailop= 08.sendmailop +.endif + +.if ${MK_LPR} != "no" +_07.lpd= 07.lpd +.endif + +SUBDIR_PARALLEL= + +.include <bsd.subdir.mk> diff --git a/share/doc/smm/contents/Makefile b/share/doc/smm/contents/Makefile new file mode 100644 index 000000000000..c529682d6f33 --- /dev/null +++ 
b/share/doc/smm/contents/Makefile @@ -0,0 +1,6 @@ +VOLUME= smm +DOC= contents +SRCS= contents.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/contents/contents.ms b/share/doc/smm/contents/contents.ms new file mode 100644 index 000000000000..0accf3b48639 --- /dev/null +++ b/share/doc/smm/contents/contents.ms @@ -0,0 +1,188 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.OH '''SMM Contents' +.EH 'SMM Contents''' +.TL +UNIX System Manager's Manual (SMM) +.if !r.U .nr .U 0 +.if \n(.U \{\ +.br +.>> <a href="Title.html">Title.html</a> +.\} +.sp +\s-2 4.4 Berkeley Software Distribution\s+2 +.sp +\fRJune, 1993\fR +.PP +This volume contains manual pages and supplementary documents useful to system +administrators. +The information in these documents applies to +the 4.4BSD system as distributed by U.C. Berkeley. +.SH +Reference Manual \- Section 8 +.tl '''(8)' +.IP +Section 8 of the UNIX Programmer's Manual contains information related to +system operation, administration, and maintenance. +.SH +System Installation and Administration +.IP +.tl 'Installing and Operating 4.4BSD''SMM:1' +.if \n(.U \{\ +.br +.>> <a href="01.setup/paper.html">01.setup/paper.html</a> +.\} +.QP +The definitive reference document for those occasions when +you find you need to start over again. + +.IP +.tl 'Building 4.4BSD Kernels with \fIConfig\fP''SMM:2' +.if \n(.U \{\ +.br +.>> <a href="02.config/paper.html">02.config/paper.html</a> +.\} +.QP +In-depth discussions of the use and operation of the \fIconfig\fP +program, and how to build your very own Unix kernel. + +.IP +.tl 'Fsck \- The UNIX File System Check Program''SMM:3' +.if \n(.U \{\ +.br +.>> <a href="03.fsck/paper.html">03.fsck/paper.html</a> +.\} +.QP +A reference document for using the \fIfsck\fP program during +times of file system distress. + +.IP +.tl 'Disc Quotas in a UNIX Environment''SMM:4' +.if \n(.U \{\ +.br +.>> <a href="04.quotas/paper.html">04.quotas/paper.html</a> +.\} +.QP +A light introduction to the techniques +for limiting the use of disc resources. + +.IP +.tl 'A Fast File System for UNIX''SMM:5' +.if \n(.U \{\ +.br +.>> <a href="05.fastfs/paper.html">05.fastfs/paper.html</a> +.\} +.QP +A description of the 4.4BSD file system organization, +design and implementation. 
+ +.IP +.tl 'The 4.4BSD NFS Implementation''SMM:6' +.if \n(.U \{\ +.br +.>> <a href="06.nfs/paper.html">06.nfs/paper.html</a> +.\} +.QP +An overview of the design, implementation, and use of NFS on 4.4BSD. + +.IP +.tl 'Line Printer Spooler Manual''SMM:7' +.QP +This document describes the structure and installation procedure +for the line printer spooling system. + +.IP +.tl 'Sendmail Installation and Operation Guide''SMM:8' +.if \n(.U \{\ +.br +.>> <a href="08.sendmailop/paper.html">08.sendmailop/paper.html</a> +.\} +.QP +The last word in installing and operating the \fIsendmail\fP program. + +.IP +.tl 'Timed Installation and Operation Guide''SMM:11' +.if \n(.U \{\ +.br +.>> <a href="11.timedop/paper.html">11.timedop/paper.html</a> +.\} +.QP +Describes how to maintain time synchronization between machines +in a local network. + +.IP +.tl 'The Berkeley UNIX Time Synchronization Protocol''SMM:12' +.if \n(.U \{\ +.br +.>> <a href="12.timed/paper.html">12.timed/paper.html</a> +.\} +.QP +The protocols and algorithms used by timed, +the network time synchronization daemon. + +.IP +.tl 'AMD \- The 4.4BSD Automounter''SMM:13' +.QP +Automatically mounting file systems on demand. + +.IP +.tl 'Installation and Operation of UUCP''SMM:14' +.QP +Describes the implementation of uucp; for the installer and administrator. + +.IP +.tl 'A Dial\-Up Network of UNIX Systems''SMM:15' +.QP +Describes UUCP, a program for communicating files between UNIX systems. + +.IP +.tl 'On the Security of UNIX''SMM:16' +.QP +Hints on how to break UNIX, and how to avoid your system being broken. + +.IP +.tl 'Password Security \- A Case History''SMM:17' +.QP +How the bad guys used to be able to break the password algorithm, and why +they cannot now (at least not so easily). 
+ +.IP +.tl 'Networking Implementation Notes, 4.4BSD Edition''SMM:18' +.if \n(.U \{\ +.br +.>> <a href="18.net/paper.html">18.net/paper.html</a> +.\} +.QP +A concise description of the system interfaces used within the +networking subsystem. + +.IP +.tl 'The PERL Programming Language''SMM:19' +.QP +The Practical Extraction and Report Language is ideal for +writing those pesky administration scripts. diff --git a/share/doc/smm/title/Makefile b/share/doc/smm/title/Makefile new file mode 100644 index 000000000000..294e37666112 --- /dev/null +++ b/share/doc/smm/title/Makefile @@ -0,0 +1,5 @@ +VOLUME= smm +DOC= Title +SRCS= Title + +.include <bsd.doc.mk> diff --git a/share/doc/smm/title/Title b/share/doc/smm/title/Title new file mode 100644 index 000000000000..37b946dacf0c --- /dev/null +++ b/share/doc/smm/title/Title @@ -0,0 +1,139 @@ +.\" Copyright (c) 1986, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.ps 18 +.vs 22 +.sp 2.75i +.ft B +.ce 2 +UNIX System Manager's Manual +(SMM) +.ps 14 +.vs 16 +.sp |4i +.ce 2 +4.4 Berkeley Software Distribution +.sp |5.75i +.ft R +.ps 12 +.vs 16 +.ce +June, 1993 +.sp |8.2i +.ce 5 +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California +Berkeley, California 94720 +.bp +\& +.sp |1i +.hy 0 +.ps 10 +.vs 12p +Copyright 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993 +The Regents of the University of California. All rights reserved. +.sp 2 +Other than the specific manual pages and documents listed below +as copyrighted by AT&T, +redistribution and use of this manual in source and binary forms, +with or without modification, are permitted provided that the +following conditions are met: +.sp 0.5 +.in +0.2i +.ta 0.2i +.ti -0.2i +1) Redistributions of this manual must retain the copyright +notices on this page, this list of conditions and the following disclaimer. +.ti -0.2i +2) Software or documentation that incorporates part of this manual must +reproduce the copyright notices on this page, this list of conditions and +the following disclaimer in the documentation and/or other materials +provided with the distribution. 
+.ti -0.2i +3) All advertising materials mentioning features or use of this software +must display the following acknowledgement: +``This product includes software developed by the University of +California, Berkeley and its contributors.'' +.ti -0.2i +4) Neither the name of the University nor the names of its contributors +may be used to endorse or promote products derived from this software +without specific prior written permission. +.in -0.2i +.sp +\fB\s-1THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE.\s+1\fP +.sp 2 +The Institute of Electrical and Electronics Engineers and the American +National Standards Committee X3, on Information Processing Systems have +given us permission to reprint portions of their documentation. +.sp +In the following statement, the phrase ``this text'' refers to portions +of the system documentation. +.sp 0.5 +``Portions of this text are reprinted and reproduced in +electronic form in 4.4BSD from IEEE Std 1003.1-1988, IEEE +Standard Portable Operating System Interface for Computer Environments +(POSIX), copyright 1988 by the Institute of Electrical and Electronics +Engineers, Inc. 
In the event of any discrepancy between these versions +and the original IEEE Standard, the original IEEE Standard is the referee +document.'' +.sp +In the following statement, the phrase ``This material'' refers to portions +of the system documentation. +.sp 0.5 +``This material is reproduced with permission from American National +Standards Committee X3, on Information Processing Systems. Computer and +Business Equipment Manufacturers Association (CBEMA), 311 First St., NW, +Suite 500, Washington, DC 20001-2178. The developmental work of +Programming Language C was completed by the X3J11 Technical Committee.'' +.sp 2 +Manual pages cron.8, icheck.8, ncheck.8, and sa.8 +and documents SMM:15, 16, and 17 +are copyright 1979, AT&T Bell Laboratories, Incorporated. +Document SMM:14 is a modification of an earlier document that +is copyrighted 1979 by AT&T Bell Laboratories, Incorporated. +Holders of \x'-1p'UNIX\v'-4p'\s-3TM\s0\v'4p'/32V, +System III, or System V software licenses are +permitted to copy these documents, or any portion of them, +as necessary for licensed use of the software, +provided this copyright notice and statement of permission +are included. +.sp 2 +The views and conclusions contained in this manual are those of the +authors and should not be interpreted as representing official policies, +either expressed or implied, of the Regents of the University of California. |