.\"
.\" $FreeBSD$
.\"
.if n .ftr C R
.ig TL
.ds CH "
.nr PI 2n
.nr PS 12
.nr LL 15c
.nr PO 3c
.nr FM 3.5c
.po 3c
.TL
Jails: Confining the omnipotent root.
.FS
This paper was presented at the 2nd International System Administration and Networking Conference ``SANE 2000'', May 22-25, 2000, in Maastricht, The Netherlands, and is published in the proceedings.
.FE
.AU
Poul-Henning Kamp <phk@FreeBSD.org>
.AU
Robert N. M. Watson <rwatson@FreeBSD.org>
.AI
The FreeBSD Project
.FS
This work was sponsored by \fChttp://www.servetheweb.com/\fP and
donated to the FreeBSD Project for inclusion in the FreeBSD
OS. FreeBSD 4.0-RELEASE was the first release including this
code.
Follow-on work was sponsored by Safeport Network Services,
\fChttp://www.safeport.com/\fP
.FE
.AB
The traditional UNIX security model is simple but inexpressive.
Adding fine-grained access control improves the expressiveness,
but often dramatically increases both the cost of system management
and implementation complexity.
In environments with a more complex management model, with delegation
of some management functions to parties under varying degrees of trust,
the base UNIX model and most natural
extensions are inappropriate at best.
Where multiple mutually untrusting parties are introduced,
``inappropriate'' rapidly transitions to ``nightmarish'', especially
with regard to data integrity and privacy protection.
.PP
The FreeBSD ``Jail'' facility provides the ability to partition
the operating system environment, while maintaining the simplicity
of the UNIX ``root'' model.
In Jail, users with privilege find that the scope of their requests
is limited to the jail, allowing system administrators to delegate
management capabilities for each virtual machine
environment.
Creating virtual machines in this manner has many potential uses; the
most popular thus far has been for providing virtual machine services
in Internet Service Provider environments.
.AE
.NH
Introduction
.PP
The UNIX access control mechanism is designed for an environment with two
types of users: those with and those without administrative privilege.
Within this framework, every attempt is made to provide an open
system, allowing easy sharing of files and inter-process communication.
As a member of the UNIX family, FreeBSD inherits these
security properties.
Users of FreeBSD in non-traditional UNIX environments must balance
their need for strong application support, high network performance
and functionality, and low total cost of ownership with the need
for alternative security models that are difficult or impossible to
implement with the UNIX security mechanisms.
.PP
One such consideration is the desire to delegate some (but not all)
administrative functions to untrusted or less trusted parties, and
simultaneously impose system-wide mandatory policies on process
interaction and sharing.
Attempting to create such an environment in the current-day FreeBSD
security environment is both difficult and costly: in many cases,
the burden of implementing these policies falls on user
applications, which means an increase in the size and complexity
of the code base, in turn translating to higher development
and maintenance cost, as well as less overall flexibility.
.PP
This abstract risk becomes clearer when applied to a practical,
real-world example:
many web service providers turn to the FreeBSD
operating system to host customer web sites, as it provides a
high-performance, network-centric server environment.
However, these providers have a number of concerns on their plate, both in
terms of protecting the integrity and confidentiality of their own
files and services from their customers, and in terms of protecting the files
and services of one customer from (accidental or
intentional) access by any other customer.
At the same time, a provider would like to provide
substantial autonomy to customers, allowing them to install and
maintain their own software, and to manage their own services,
such as web servers and other content-related daemon programs.
.PP
This problem space points strongly in the direction of a partitioning
solution, in which customer processes and storage are isolated from those of
other customers, both in terms of accidental disclosure of data or process
information and in terms of the ability to modify files or processes
outside of a compartment.
Delegation of management functions within the system must
be possible, but not at the cost of system-wide requirements, including
integrity and privacy protection between partitions.
.PP
However, UNIX-style access control makes it notoriously difficult to
compartmentalise functionality.
While mechanisms such as chroot(2) provide a modest
level of compartmentalisation, it is well known
that these mechanisms have serious shortcomings, both in terms of the
scope of their functionality and their effectiveness at what they provide \s-2[CHROOT]\s+2.
.PP
In the case of the chroot(2) call, a process's visibility of
the file system name-space is limited to a single subtree.
However, the compartmentalisation does not extend to the process
or networking spaces, and therefore both observation of and interference
with processes outside their compartment are possible.
.PP
To this end, we describe the new FreeBSD ``Jail'' facility, which
provides a strong partitioning solution, building on existing
mechanisms, such as chroot(2), to provide what effectively amounts to a
virtual machine environment.
Processes in a jail are provided
full access to the files that they may manipulate, processes they
may influence, and network services they can make use of, and neither
access to nor visibility of files, processes or network services outside
their partition.
.PP
Unlike fine-grained security solutions, Jail does not
substantially increase the policy management requirements for the
system administrator, as each Jail is a virtual FreeBSD environment
permitting local policy to be independently managed, with much the
same properties as the main system itself, making Jail easy to use
for the administrator, and far more compatible with applications.
.NH
Traditional UNIX Security, or, ``God, root, what difference?'' \s-2[UF]\s+2.
.PP
The traditional UNIX access model assigns a numeric uid to each user of the
system. In turn, each process ``owned'' by a user will be tagged with that
user's uid in an unforgeable manner. The uids serve two purposes: first,
they determine how discretionary access control mechanisms will be applied, and
second, they are used to determine whether special privileges are accorded.
.PP
In the case of discretionary access controls, the primary object protected is
a file. The uid (and the related gids indicating group membership) is mapped to
a set of rights for each object, courtesy of the UNIX file mode, in effect acting
as a limited form of access control list. Jail is, in general, not concerned
with modifying the semantics of discretionary access control mechanisms,
although there are important implications from a management perspective.
.PP
For the purposes of determining whether special privileges are accorded to a
process, the check is simple: ``is the numeric uid equal to 0?''.
If so, the
process is acting with ``super-user privileges'', and all access checks are
granted, in effect allowing the process the ability to do whatever it wants
to \**.
.FS
\&... no matter how patently stupid it may be.
.FE
.PP
For the purposes of human convenience, uid 0 is canonically allocated
to the ``root'' user \s-2[ROOT]\s+2.
For the purposes of jail, this behaviour is extremely relevant: many of
these privileged operations can be used to manage system hardware and
configuration, file system name-space, and special network operations.
.PP
Many limitations to this model are immediately clear: the root user is a
single, concentrated source of privilege that is exposed to many pieces of
software, and as such an immediate target for attacks. In the event of a
compromise of the root capability set, the attacker has complete control over
the system. Even without an attacker, the risks of a single administrative
account are serious: delegating a narrow scope of capability to an
inexperienced administrator is difficult, as the granularity of delegation is
that of all system management abilities. These features make the omnipotent
root account a sharp, efficient and extremely dangerous tool.
.PP
The BSD family of operating systems has implemented the ``securelevel''
mechanism, which allows the administrator to block certain configuration
and management functions from being performed by root
until the system is restarted and brought up into single-user mode.
While this does provide some amount of protection in the case of a root
compromise of the machine, it does nothing to address the need for
delegation of certain root abilities.
.NH
Other Solutions to the Root Problem
.PP
Many operating systems attempt to address these limitations by providing
fine-grained access controls for system resources \s-2[BIBA]\s+2.
These efforts vary in their
degree of success, but almost all suffer from at least three serious
limitations:
.PP
First, increasing the granularity of security controls increases the
complexity of the administration process, in turn increasing both the
opportunity for incorrect configuration and the demand on
administrator time and resources. In many cases, the increased complexity
results in significant frustration for the administrator, which may result
in two
disastrous types of policy: ``all doors open as it's too much trouble'', and
``trust that the system is secure, when in fact it isn't''.
.PP
The extent of the trouble is best illustrated by the fact that an entire
niche industry has emerged providing tools to manage fine-grained security
controls \s-2[UAS]\s+2.
.PP
Second, usefully segregating capabilities and assigning them to running code
and users is very difficult. Many privileged operations in UNIX seem
independent, but are in fact closely related, and the handing out of one
privilege may, in effect, be transitive to many others. For example, in
some trusted operating systems, a system capability may be assigned to a
running process to allow it to read any file, for the purposes of backup.
However, this capability is, in effect, equivalent to the ability to switch to
any other account, as the ability to access any file provides access to system
keying material, which in turn provides the ability to authenticate as any
user. Similarly, many operating systems attempt to segregate management
capabilities from auditing capabilities. In a number of these operating
systems, however, ``management capabilities'' permit the administrator to
assign ``auditing capabilities'' to themselves or to another account, circumventing
the segregation of capability.
.PP
Finally, introducing new security features often involves introducing new
security management APIs.
When fine-grained capabilities are introduced to
replace the setuid mechanism in UNIX-like operating systems, applications that
previously did an ``appropriateness check'' to see if they were running as
root before executing must now be changed to know that they need not run as
root. In the case of applications running with privilege and executing other
programs, there is now a new set of privileges that must be voluntarily given
up before executing another program. These changes can introduce significant
incompatibility for existing applications, and make life more difficult for
application developers who may not be aware of differing security semantics on
different systems \s-2[POSIX1e]\s+2.
.NH
The Jail Partitioning Solution
.PP
Jail neatly side-steps the majority of these problems through partitioning.
Rather
than introduce additional fine-grained access control mechanisms, we partition
a FreeBSD environment (processes, file system, network resources) into a
management environment and, optionally, subset Jail environments. In doing so,
we simultaneously maintain the existing UNIX security model, allowing
multiple users and a privileged root user in each jail, while
limiting the scope of root's activities to his jail.
Consequently, the administrator of a
FreeBSD machine can partition the machine into separate jails, and provide
access to the super-user account in each of these without losing control of
the overall environment.
.PP
A process in a partition is referred to as ``in jail''. When a FreeBSD
system is booted up after a fresh install, no processes will be in jail.
When
a process is placed in a jail, it, and any descendants of the process created
after the jail creation, will be in that jail. A process may be in only one
jail, and after creation, it cannot leave the jail.
Jails are created when a
privileged process calls the jail(2) syscall with a description of the jail as an
argument to the call.
Each call to jail(2) creates a new jail; the only way
for a new process to enter the jail is by inheriting access to the jail from
another process already in that jail.
Processes may never
leave the jail they created or were created in.
.KF
.if t .PSPIC jail01.eps 4i
.ce 1
Fig. 1 \(em Schematic diagram of machine with two configured jails
.sp
.KE
.PP
Membership in a jail involves a number of restrictions: access to the file
name-space is restricted in the style of chroot(2), the ability to bind network
resources is limited to a specific IP address, the ability to manipulate
system resources and perform privileged operations is sharply curtailed, and
the ability to interact with other processes is limited to processes
inside the same jail.
.PP
Jail takes advantage of the existing chroot(2) behaviour to limit access to the
file system name-space for jailed processes. When a jail is created, it is
bound to a particular file system root.
Processes are unable to manipulate files that they cannot address,
and as such the integrity and confidentiality of files outside of the jail
file system root are protected. Traditional mechanisms for breaking out of
chroot(2) have been blocked.
In the expected and documented configuration, each jail is provided
with its own exclusive file system root and a standard FreeBSD directory layout,
but this is not mandated by the implementation.
.PP
Each jail is bound to a single IP address: processes within the jail may not
make use of any other IP address for outgoing or incoming connections; this
also makes it possible to restrict what network services a particular jail may
offer. As FreeBSD distinguishes attempts to bind all IP addresses from
attempts to bind a particular address, bind requests for all IP addresses are
redirected to the individual Jail address.
Some network functionality
associated with privileged calls is disabled wholesale because of the nature of the
functionality offered; in particular, facilities that would allow ``spoofing''
of IP numbers or the generation of disruptive traffic have been disabled.
.PP
Processes running without root privileges will notice few, if any, differences
between a jailed and an un-jailed environment. Processes running with
root privileges will find that many restrictions apply to the privileged calls
they may make. Some calls will now return an access error \(em for example, an
attempt to create a device node will now fail. Others will have a more
limited scope than normal \(em attempts to bind a reserved port number on all
available addresses will result in binding only the address associated with
the jail. Other calls will succeed as normal: root may read a file owned by
any uid, as long as it is accessible through the jail file system name-space.
.PP
Processes within the jail will find that they are unable to interact with, or
even verify the existence of,
processes outside the jail \(em processes within the jail are
prevented from delivering signals to processes outside the jail, as well as
from connecting to those processes with debuggers, or even seeing them in the
sysctl or process file system monitoring mechanisms. Jail does not prevent,
nor is it intended to prevent, the use of covert channels or communications
mechanisms via accepted interfaces \(em for example, two processes may communicate
via sockets over the IP network interface. Nor does it attempt to provide
scheduling services based on the partition; however, it does prevent calls
that interfere with normal process operation.
.PP
As a result of these attempts to retain the standard FreeBSD API and
framework, almost all applications will run unaffected.
Standard system
services such as Telnet, FTP, and SSH all behave normally, as do most third-party
applications, including the popular Apache web server.
.NH
Jail Implementation
.PP
Processes running with root privileges in the jail find that there are serious
restrictions on what they are capable of doing \(em in particular, activities that
would extend outside of the jail:
.IP "" 5n
\(bu Modifying the running kernel by direct access, and loading kernel
modules, are prohibited.
.IP
\(bu Modifying any of the network configuration, interfaces, addresses, and
the routing table is prohibited.
.IP
\(bu Mounting and unmounting file systems is prohibited.
.IP
\(bu Creating device nodes is prohibited.
.IP
\(bu Accessing raw, divert, or routing sockets is prohibited.
.IP
\(bu Modifying kernel runtime parameters, such as most sysctl settings, is
prohibited.
.IP
\(bu Changing securelevel-related file flags is prohibited.
.IP
\(bu Accessing network resources not associated with the jail is prohibited.
.PP
Other privileged activities are permitted as long as they are limited to the
scope of the jail:
.IP "" 5n
\(bu Signalling any process within the jail is permitted.
.IP
\(bu Changing the ownership and mode of any file within the jail is permitted, as
long as the file flags permit this.
.IP
\(bu Deleting any file within the jail is permitted, as long as the file flags
permit this.
.IP
\(bu Binding reserved TCP and UDP port numbers on the jail's IP address is
permitted. (Attempts to bind TCP and UDP ports using INADDR_ANY will be
redirected to the jail's IP address.)
.IP
\(bu Functions that operate on the uid/gid space are all permitted, since uids
and gids act as labels for file system objects or processes,
which are partitioned off by other mechanisms.
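.PP
Taken together, the two lists above amount to a small decision procedure.
The following C program is a minimal user-space sketch of that procedure,
included for illustration only: it is not the FreeBSD kernel implementation,
and every name in it (\fCstruct jail_model\fP, \fCjail_priv_allowed()\fP,
\fCjail_bind_address()\fP, \fCjail_can_signal()\fP and the \fCPRIV_*\fP
constants) is hypothetical.
It assumes a jail can be reduced to an identifier plus a single IPv4 address
held in host byte order, and it ignores the file flags that, per the list
above, must also permit ownership changes and deletions.

```c
/*
 * Hypothetical user-space model of the jail privilege rules listed above.
 * None of these names exist in FreeBSD; the real checks live in the kernel.
 */
#include <assert.h>
#include <netinet/in.h>   /* in_addr_t, INADDR_ANY */
#include <stdbool.h>
#include <stddef.h>

struct jail_model {           /* a jail: an id and its single IPv4 address */
    int       id;
    in_addr_t ip;             /* host byte order, for simplicity */
};

struct proc_model {           /* a process: uid plus owning jail */
    unsigned int             uid;
    const struct jail_model *jail;   /* NULL means "not in a jail" */
};

enum priv_op {                /* privileged operations from the two lists */
    PRIV_KERNEL_MODIFY,       /* direct kernel access, loading modules */
    PRIV_NET_CONFIG,          /* interfaces, addresses, routing table */
    PRIV_MOUNT,               /* mounting and unmounting file systems */
    PRIV_MKNOD,               /* creating device nodes */
    PRIV_RAW_SOCKET,          /* raw, divert or routing sockets */
    PRIV_SYSCTL_WRITE,        /* most kernel runtime parameters */
    PRIV_SECURE_FLAGS,        /* securelevel-related file flags */
    PRIV_CHOWN,               /* ownership and mode of files in the jail */
    PRIV_UNLINK,              /* deleting files in the jail */
    PRIV_BIND_RESERVED,       /* reserved TCP/UDP ports on the jail IP */
    PRIV_SETUID               /* operations on the uid/gid space */
};

/* The traditional test: "is the numeric uid equal to 0?" */
static bool is_superuser(const struct proc_model *p)
{
    return p->uid == 0;
}

/* Jailed root keeps only the jail-scoped privileges (file flags ignored). */
bool jail_priv_allowed(const struct proc_model *p, enum priv_op op)
{
    assert(p != NULL);
    if (!is_superuser(p))
        return false;                 /* no special privileges at all */
    if (p->jail == NULL)
        return true;                  /* unjailed root: everything granted */
    switch (op) {
    case PRIV_CHOWN:
    case PRIV_UNLINK:
    case PRIV_BIND_RESERVED:
    case PRIV_SETUID:
        return true;                  /* effect stays inside the jail */
    default:
        return false;                 /* would reach outside the jail */
    }
}

/* Choose the address a bind actually uses; false means "refused". */
bool jail_bind_address(const struct proc_model *p, in_addr_t requested,
                       in_addr_t *effective)
{
    assert(p != NULL && effective != NULL);
    if (p->jail == NULL || requested == p->jail->ip) {
        *effective = requested;       /* no rewriting needed */
        return true;
    }
    if (requested == INADDR_ANY) {
        *effective = p->jail->ip;     /* "all addresses" becomes the jail IP */
        return true;
    }
    return false;                     /* any other address is off limits */
}

/* A jailed process may only signal processes in its own jail. */
bool jail_can_signal(const struct proc_model *from, const struct proc_model *to)
{
    assert(from != NULL && to != NULL);
    if (from->jail != NULL && from->jail != to->jail)
        return false;                 /* target lies outside the jail */
    return is_superuser(from) || from->uid == to->uid;  /* simplified UNIX rule */
}
```

In this model, jailed root may bind a reserved port on its own address but
may not mount a file system, and a bind to \fCINADDR_ANY\fP resolves to the
jail's address.
Layering the jail test on top of the unchanged uid 0 test reflects the
approach described in this paper: the traditional root model is retained,
and the jail only narrows its scope.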
.PP
These restrictions on root access limit the scope of root processes, enabling
most applications to run unhindered, but preventing calls that might allow an
application to reach beyond the jail and influence other processes or
system-wide configuration.
.PP
.so implementation.ms
.so mgt.ms
.so future.ms
.NH
Conclusion
.PP
The jail facility provides FreeBSD with a conceptually simple security
partitioning mechanism, allowing the delegation of administrative rights
within virtual machine partitions.
.PP
The implementation relies on
restricting access within the jail environment to a well-defined subset
of the overall host environment. This includes limiting interaction
between processes, as well as access to files, network resources, and privileged
operations. Administrative overhead is reduced by avoiding
fine-grained access control mechanisms, and by maintaining a consistent
administrative interface across partitions and the host environment.
.PP
The jail facility has already seen widespread deployment, in particular as
a vehicle for delivering ``virtual private server'' services.
.PP
The jail code is included in the base system as part of FreeBSD 4.0-RELEASE,
and fully documented in the jail(2) and jail(8) man-pages.
.bp
.SH
Notes & References
.IP \s-2[BIBA]\s+2 .5i
K. J. Biba, Integrity Considerations for Secure
Computer Systems, USAF Electronic Systems Division, 1977.
.IP \s-2[CHROOT]\s+2 .5i
Dr. Marshall Kirk McKusick, private communication:
``According to the SCCS logs, the chroot call was added by Bill Joy
on March 18, 1982 approximately 1.5 years before 4.2BSD was released.
That was well before we had ftp servers of any sort (ftp did not
show up in the source tree until January 1983). My best guess as
to its purpose was to allow Bill to chroot into the /4.2BSD build
directory and build a system using only the files, include files,
etc contained in that tree.
That was the only use of chroot that
I remember from the early days.''
.IP \s-2[LOTTERY1]\s+2 .5i
David Petrou and John Milford. Proportional-Share Scheduling:
Implementation and Evaluation in a Widely-Deployed Operating System,
December 1997.
.nf
\s-2\fChttp://www.cs.cmu.edu/~dpetrou/papers/freebsd_lottery_writeup98.ps\fP\s+2
\s-2\fChttp://www.cs.cmu.edu/~dpetrou/code/freebsd_lottery_code.tar.gz\fP\s+2
.IP \s-2[LOTTERY2]\s+2 .5i
Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management, Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI '94), pages 1-11, Monterey, California, November 1994.
.nf
\s-2\fChttp://www.research.digital.com/SRC/personal/caw/papers.html\fP\s+2
.IP \s-2[POSIX1e]\s+2 .5i
Draft Standard for Information Technology \(em
Portable Operating System Interface (POSIX) \(em
Part 1: System Application Program Interface (API) \(em Amendment:
Protection, Audit and Control Interfaces [C Language].
IEEE Std 1003.1e, Draft 17, editor Casey Schaufler.
.IP \s-2[ROOT]\s+2 .5i
Historically, other names have been used at times; Zilog, for instance,
called the super-user account ``zeus''.
.IP \s-2[UAS]\s+2 .5i
One such niche product is the ``UAS'' system to maintain and audit
RACF configurations on MVS systems.
.nf
\s-2\fChttp://www.entactinfo.com/products/uas/\fP\s+2
.IP \s-2[UF]\s+2 .5i
Quote from the User-Friendly cartoon by Illiad.
.nf
\s-2\fChttp://www.userfriendly.org/cartoons/archives/98nov/19981111.html\fP\s+2