path: root/sys/kern/kern_jail.c
Commit message (author, date, files changed, lines -removed/+added)
* jail: allow adjustment of host time (Mariusz Zaborski, 2024-06-28, 1 file, -0/+28)
  Add a special permission to the jail to adjust and to set the host time. This can be useful if we want to compartmentalize the NTP daemon from the rest of the system.

  Reviewed by: olce, imp
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D45545
* Abstract UIO allocation and deallocation. (Alfredo Mazzinghi, 2024-02-10, 1 file, -3/+3)
  Introduce the allocuio() and freeuio() functions to allocate and deallocate struct uio. This hides the actual allocator interface, so it is easier to modify the sub-allocation layout of struct uio and the corresponding iovec array.

  Obtained from: CheriBSD
  Reviewed by: kib, markj
  MFC after: 2 weeks
  Sponsored by: CHaOS, EPSRC grant EP/V000292/1
  Differential Revision: https://reviews.freebsd.org/D43711
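The idea of hiding the sub-allocation layout behind an allocator pair can be sketched in userspace C. This is only an illustration under assumptions: `struct demo_uio`, `demo_allocuio()` and `demo_freeuio()` are hypothetical stand-ins, not the kernel's struct uio or the real allocuio()/freeuio() signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical, simplified stand-in for the kernel's struct uio. */
struct demo_uio {
	struct iovec *uio_iov;	/* points at the array placed after the header */
	int uio_iovcnt;		/* number of entries in that array */
};

/* Allocate the header and its iovec array as one block. */
struct demo_uio *
demo_allocuio(unsigned int iovcnt)
{
	struct demo_uio *uio;

	uio = calloc(1, sizeof(*uio) + iovcnt * sizeof(struct iovec));
	if (uio == NULL)
		return (NULL);
	/* Callers never compute this offset themselves. */
	uio->uio_iov = (struct iovec *)(uio + 1);
	uio->uio_iovcnt = (int)iovcnt;
	return (uio);
}

/* Release header and array together; the layout stays private. */
void
demo_freeuio(struct demo_uio *uio)
{
	free(uio);
}
```

Because callers only see the two functions, the header and array could later be split into separate allocations without touching any call site.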
* jail: expose children.max and children.cur via sysctl (Jamie Gritton, 2024-01-26, 1 file, -0/+29)
  Submitted by: Igor Ostapenko <igor.ostapenko_pm.me>
  Differential Revision: <https://reviews.freebsd.org/D43565>
* jail: add security.jail.mlock_allowed (Baptiste Daroussin, 2024-01-05, 1 file, -0/+4)
  When the parameter allow.mlock was added, a way for jails to check whether the parameter was set was not added. This change covers it.

  MFC After: 3 days
  Reviewed by: jamie@
  Differential Revision: https://reviews.freebsd.org/D43314
* jail: Ignore errors from copyout() while copying the error string (Mark Johnston, 2023-12-26, 1 file, -2/+2)
  Reviewed by: zlei, jamie
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D43142
* jail: Don't allow jail_set(2) to resurrect dying jails. (Jamie Gritton, 2023-11-30, 1 file, -114/+145)
  Currently, a prison in "dying" state (removed but still holding resources) can be brought back to alive state via "jail -d", or the JAIL_DYING flag to jail_set(2). This seemed like a good idea at the time.

  Its main use was to improve support for specifying the jid when creating a jail, which also seemed like a good idea at the time. But resurrecting a jail that was partway through the process of shutting down is trouble waiting to happen.

  This patch deprecates that flag, leaving it as a no-op for creating jails (but still useful for looking at dying jails). It still allows creating a new jail with the same jid as a dying one, but will renumber the old one in that case. That's imperfect, but allows for current behavior.

  Reviewed by: bz
  Differential Revision: https://reviews.freebsd.org/D28150
* cr_canseejailproc(): New privilege, no direct check for UID 0 (Olivier Certner, 2023-09-28, 1 file, -0/+1)
  Use priv_check_cred() with a new privilege (PRIV_SEEJAILPROC) instead of explicitly testing for UID 0 (the former has been the rule for almost 20 years).

  As a consequence, cr_canseejailproc() now abides by the 'security.bsd.suser_enabled' sysctl and MAC policies.

  Update the MAC policies Biba and LOMAC, and prison_priv_check() so that they don't deny this privilege. This preserves the existing behavior (the 'root' user is not restricted, even when jailed, unless 'security.bsd.suser_enabled' is not 0) and is consistent with what is done for the related policies/privileges (PRIV_SEEOTHERGIDS, PRIV_SEEOTHERUIDS).

  Reviewed by: emaste (earlier version), mhorne
  MFC after: 2 weeks
  Sponsored by: Kumacom SAS
  Differential Revision: https://reviews.freebsd.org/D40626
* jail: Add the ability to access system-level filesystem extended attributes (Shawn Webb, 2023-09-01, 1 file, -0/+14)
  Prior to this commit, privileged accounts in a jail could not access the filesystem extended attributes in the system namespace. To control access to the system namespace on a per-jail basis, add a new configuration parameter allow.extattr, which is off by default.

  Reported by: zirias
  Tested by: zirias
  Obtained from: HardenedBSD
  Reviewed by: kevans, jamie
  Differential revision: https://reviews.freebsd.org/D41643
  MFC after: 1 week
  Relnotes: yes
* sys: Remove $FreeBSD$: one-line .c pattern (Warner Losh, 2023-08-16, 1 file, -2/+0)
  Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
* spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD (Warner Losh, 2023-05-12, 1 file, -1/+1)
  The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.

  Discussed with: pfg
  MFC After: 3 days
  Sponsored by: Netflix
* netlink: allow netlink sockets in non-vnet jails. (Alexander V. Chernikov, 2023-03-26, 1 file, -0/+1)
  This change allows opening Netlink sockets in non-vnet jails, even for unprivileged processes. The security model largely follows the existing one. To be more specific:

  * By default, every `NETLINK_ROUTE` command is **NOT** allowed in a non-VNET jail UNLESS the `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command handler.
  * All notifications are **disabled** for non-vnet jails (requests to subscribe to the notifications are ignored). This will change to a more fine-grained model once the first netlink provider requiring this gets committed.
  * Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including** interfaces w/o any addresses attached to the jail). The value of this is questionable, but it follows the existing approach.
  * Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the current approach: currently we list static ARP/ND entries belonging to the addresses attached to the jail.
  * Listing interface addresses is **allowed**, but the addresses are filtered to match only the ones attached to the jail.
  * Listing routes is **allowed**, but the routes are filtered to provide only host routes matching the addresses attached to the jail.
  * By default, every `NETLINK_GENERIC` command is **allowed** in a non-VNET jail (as sub-families may be unrelated to networking at all). It is up to the family author to implement the restriction if necessary.

  Differential Revision: https://reviews.freebsd.org/D39206
  MFC after: 1 month
* jail: convert several functions from int to bool (Mina Galić, 2023-03-15, 1 file, -16/+21)
  These functions exclusively return (0) and (1), so convert them to bool.

  We also convert some networking-related jail functions from int to bool, some of which were returning an error that was never used.

  Differential Revision: https://reviews.freebsd.org/D29659
  Reviewed by: imp, jamie (earlier version)
  Pull Request: https://github.com/freebsd/freebsd-src/pull/663
* kern_jail.c: Remove #ifdefs for VNET_NFSD (Rick Macklem, 2023-03-02, 1 file, -9/+2)
  The consensus was that VNET_NFSD was not needed. This patch removes it from kern_jail.c. With this patch, support for the "allow.nfsd" jail parameter is enabled in the kernel for kernels built with "options VIMAGE".

  Reviewed by: markj
  MFC after: 3 months
  Differential Revision: https://reviews.freebsd.org/D38808
* jail: Improve readability (Zhenlei Huang, 2023-02-28, 1 file, -10/+12)
  No functional change intended.

  Reviewed by: melifaro
  Differential Revision: https://reviews.freebsd.org/D37890
* jail: Use flexible array member within struct prison_ip (Zhenlei Huang, 2023-02-28, 1 file, -44/+58)
  The current implementation utilizes an off-by-one struct prison_ip to access the IPv[46] addresses. It is error prone, hence the regression fixes 21ad3e27fabc and ddbf879d79d4.

  Use a flexible array member so that the compiler will catch such errors and the code will also be easier to review.

  No functional change intended.

  Reviewed by: melifaro, glebius
  Differential Revision: https://reviews.freebsd.org/D37874
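A flexible array member gives the trailing addresses a name and a type, so they are no longer reached by pointer arithmetic past the header. The sketch below shows the general C99 pattern only; `struct demo_ip_list` and its field names are hypothetical and do not reproduce the kernel's struct prison_ip.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of an address list; not the kernel's layout. */
struct demo_ip_list {
	unsigned int ips;	/* number of addresses that follow */
	uint32_t pr_ip[];	/* flexible array member (C99) */
};

/* Allocate a list and copy n addresses into the trailing array. */
struct demo_ip_list *
demo_ip_alloc(const uint32_t *addrs, unsigned int n)
{
	struct demo_ip_list *l;

	l = malloc(sizeof(*l) + n * sizeof(l->pr_ip[0]));
	if (l == NULL)
		return (NULL);
	l->ips = n;
	if (n > 0)
		memcpy(l->pr_ip, addrs, n * sizeof(l->pr_ip[0]));
	return (l);
}
```

Access is then written as `l->pr_ip[i]`, which the compiler type-checks, instead of casting `l + 1` and indexing from there.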
* vfs_export: Add mnt_exjail to control exports done in prisons (Rick Macklem, 2023-02-21, 1 file, -0/+1)
  If there are multiple instances of mountd(8) (in different prisons), there will be confusion if they manipulate the exports of the same file system. This patch adds mnt_exjail to "struct mount" so that the credentials (and, therefore, the prison) that did the exports for that file system can be recorded. If another prison has already exported the file system, vfs_export() will fail with an error. If mnt_exjail == NULL, the file system has not been exported. mnt_exjail is checked by the NFS server, so that exports done from within a different prison will not be used.

  The patch also implements vfs_exjail_destroy(), which is called from prison_cleanup() to release all the mnt_exjail credential references, so that the prison can be removed. Mainly to avoid doing a scan of the mountlist for the case where there were no exports done from within the prison, a count of how many file systems have been exported from within the prison is kept in pr_exportcnt.

  Reviewed by: markj
  Discussed with: jamie
  Differential Revision: https://reviews.freebsd.org/D38371
  MFC after: 3 months
* jail: Fix redoing ip restricting (Zhenlei Huang, 2023-02-21, 1 file, -2/+4)
  `prison_ip_restrict()` is called in the loop FOREACH_PRISON_DESCENDANT_LOCKED. Under low memory, it is still possible that in subsequent rounds `prison_ip_restrict()` succeeds and `redo_ip[46]` flips over from true to false, thus leaving some prisons' IPv[46] addresses unrestricted.

  Reviewed by: jamie
  Fixes: 8bce8d28abe6 jail: Avoid multipurpose return value of function prison_ip_restrict()
  Differential Revision: https://reviews.freebsd.org/D38697
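The failure mode described here is a per-iteration assignment overwriting a "redo" flag, so a later success hides an earlier failure. A minimal sketch of the sticky-flag pattern follows; `demo_restrict_pass()` and its `step_ok` input are hypothetical and merely model the outcome of each call in the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * step_ok[i] models whether restricting descendant i succeeded.
 * Returns true if any step failed and the whole pass must be redone.
 */
bool
demo_restrict_pass(const bool *step_ok, size_t n)
{
	bool redo = false;
	size_t i;

	for (i = 0; i < n; i++) {
		/*
		 * Overwriting the flag (redo = !step_ok[i]) would let a
		 * later success hide an earlier failure; only ever set it.
		 */
		if (!step_ok[i])
			redo = true;
	}
	return (redo);
}
```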
* jail: Use atomic(9) instead of CK atomics (Mark Johnston, 2023-02-07, 1 file, -2/+2)
  There's no reason to use one over the other here, so let's prefer the interface that's used elsewhere in the kernel.

  No functional change intended.

  Reviewed by: mjg
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D38360
* Revert "vfs_export: Add checks for correct prison when updating exports" (Rick Macklem, 2023-02-04, 1 file, -32/+0)
  This reverts commit 7926a01ed7ae7cefd81ef4cc2142c35b84d81913.

  A new patch in D38371 is being considered for doing this.
* vfs_export: Add checks for correct prison when updating exports (Rick Macklem, 2023-02-03, 1 file, -0/+32)
  mountd(8) basically does the following:
      getmntinfo()
      for each mount
          delete_exports
  using nmount(2) to do the creation/deletion of individual exports.

  For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo() returns all mount points, including ones being used within other prisons. This can cause confusion if the same file system is specified in the exports(5) file for multiple prisons.

  This patch adds a permanent identifier to each prison and marks which prison did the exports in a field of the mount structure called mnt_exjail. This field can then be compared to the permanent identifier for the prison that the thread's credentials are in. Also required was a new function called prison_isalive_permid(), which returns whether the prison is alive, so that the check can be ignored for prisons that have been removed.

  This prepares the system to allow mountd(8) to run in multiple prisons, including prison0. Future commits will complete the modifications to allow mountd(8) to run in vnet prisons. Until then, these changes should not affect semantics.

  Reviewed by: markj
  MFC after: 3 months
  Differential Revision: https://reviews.freebsd.org/D38144
* prison_check_nfsd: Add check for enforce_statfs != 0 (Rick Macklem, 2023-02-02, 1 file, -0/+4)
  Since mountd(8) will not be able to do exports when running in a vnet prison if enforce_statfs is set to 0, add a check for this to prison_check_nfsd().

  Reviewed by: jamie, markj
  MFC after: 2 months
  Differential Revision: https://reviews.freebsd.org/D38189
* jail: Avoid multipurpose return value of function prison_ip_restrict() (Zhenlei Huang, 2023-01-13, 1 file, -42/+29)
  Currently the function prison_ip_restrict() returns true if the replacement buffer was used, or if no buffer was provided, allocation failed, and the call should be redone.

  The logic is confusing and can cause a possibly infinite loop since eb8dcdeac22d.

  Reviewed by: jamie, glebius
  Approved by: kp (mentor)
  Differential Revision: https://reviews.freebsd.org/D37918
* jail: Fix regression panic from eb8dcdeac22d (Zhenlei Huang, 2023-01-13, 1 file, -16/+39)
  And a possibly infinite loop calling prison_ip_restrict() in kern_jail_set() [2].

  [1] It is possible that prisons do not have any IPv4 or IPv6 addresses.
  [2] If prison_ip_restrict() is not provided with a prison_ip and it allocates a prison_ip successfully, then it should return false to indicate that prison_ip_restrict() need not be redone later.

  Reviewed by: glebius
  Approved by: kp (mentor)
  Fixes: eb8dcdeac22d jail: network epoch protection for IP address lists
  Differential Revision: https://reviews.freebsd.org/D37906
* jail: Correctly access IPv[46] addresses of prison_ip (Zhenlei Huang, 2023-01-13, 1 file, -3/+3)
  * Fix wrong IPv[46] addresses inherited from parent jail
  * Properly restrict the child jail's IPv[46] addresses

  Reviewed by: melifaro, glebius
  Approved by: kp (mentor)
  Fixes: eb8dcdeac22d jail: network epoch protection for IP address lists
  Differential Revision: https://reviews.freebsd.org/D37871
  Differential Revision: https://reviews.freebsd.org/D37872
* jail: Fix output of IPv[46] addresses of DDB `show prison` (Zhenlei Huang, 2022-12-21, 1 file, -2/+2)
  Reviewed by: melifaro, jamie
  Approved by: kp (mentor)
  Fixes: eb8dcdeac22d jail: network epoch protection for IP address lists
  Differential Revision: https://reviews.freebsd.org/D37732
* kern_jail.c: Allow mountd/nfsd to optionally run in a jail (Rick Macklem, 2022-12-17, 1 file, -1/+46)
  This patch adds "allow.nfsd" to the jail code based on a new kernel build option VNET_NFSD. This will not work until future patches fix nmount(2) to allow mountd to run in a vnet prison and the NFS server code is patched so that global variables are in a vnet.

  The jail(8) man page will be patched in a future commit.

  Reviewed by: jamie
  MFC after: 4 months
  Differential Revision: https://reviews.freebsd.org/D37637
* Import the WireGuard driver from zx2c4.com. (John Baldwin, 2022-10-28, 1 file, -0/+1)
  This commit brings back the driver from FreeBSD commit f187d6dfbf633665ba6740fe22742aec60ce02a2 plus subsequent fixes from upstream.

  Relative to upstream this commit includes a few other small fixes such as additional INET and INET6 #ifdef's, #include cleanups, and updates for recent API changes in main.

  Reviewed by: pauamma, gbe, kevans, emaste
  Obtained from: git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D36909
* if_me: Use dedicated network privilege (Zhenlei Huang, 2022-10-15, 1 file, -0/+1)
  Separate if_me privileges from if_gif.

  Reviewed by: kp
  Differential Revision: https://reviews.freebsd.org/D36691
* kern_jail: Fix a typo in a source code comment (Gordon Bergling, 2022-09-15, 1 file, -1/+1)
  - s/paramter/parameter/

  MFC after: 3 days
* jail: add process linkage (Mateusz Guzik, 2022-09-05, 1 file, -18/+85)
  It allows iteration over the processes belonging to a given jail instead of having to walk the entire allproc list.

  Note the iteration can miss processes, which remains bug-compatible with the previous code.

  Reviewed by: jamie (previous version), markj (previous version)
  Differential Revision: https://reviews.freebsd.org/D34522
* kern: Correct some typos in source code comments (Gordon Bergling, 2022-09-04, 1 file, -1/+1)
  - s/occured/occurred/
  - s/the the/the/

  MFC after: 3 days
* jail: Remove a prison's shared memory when it dies (Jamie Gritton, 2022-06-29, 1 file, -0/+2)
  Add shm_remove_prison(), which removes all POSIX shared memory segments belonging to a prison. Call it from prison_cleanup() so a prison won't be stuck in a dying state due to the resources still held.

  PR: 257555
  Reported by: grembo
* jail: add prison_cleanup() to release resources held by a dying jail (Jamie Gritton, 2022-06-29, 1 file, -4/+16)
  Currently, when a jail starts dying, either by losing its last user reference or by being explicitly killed, osd_jail_call(...PR_METHOD_REMOVE...) is called. Encapsulate this into a function prison_cleanup() that can then do other cleanup.
* ovpn: Introduce OpenVPN DCO support (Kristof Provost, 2022-06-28, 1 file, -0/+1)
  OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing (i.e. tunneling and cryptography) into the kernel, rather than using tap devices. This avoids significant copying and context switching overhead between kernel and user space and improves OpenVPN throughput.

  In my test setup throughput improved from around 660Mbit/s to around 2Gbit/s.

  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Differential Revision: https://reviews.freebsd.org/D34340
* jail: Remove a double word in a source code comment (Gordon Bergling, 2022-04-09, 1 file, -1/+1)
  - s/a a/a/

  MFC after: 3 days
* vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) (Mateusz Guzik, 2022-03-24, 1 file, -1/+1)
* jail: network epoch protection for IP address lists (Gleb Smirnoff, 2021-12-26, 1 file, -268/+502)
  Now struct prison has two pointers (IPv4 and IPv6) of struct prison_ip type. Each points to an epoch context, an address count, and a variable-size array of addresses. These structures are freed with network epoch deferred free and are not edited in place; instead a new structure is allocated and set.

  While here, the change also generalizes a lot (but not enough) of IPv4 and IPv6 processing. E.g. address family agnostic helpers for kern_jail_set() are provided, which reduce v4-v6 copy-paste.

  The fast-path prison_check_ip[46]_locked() is also generalized into prison_ip_check(), which can be executed with network epoch protection only.

  Reviewed by: jamie
  Differential revision: https://reviews.freebsd.org/D33339
* Fix buffer overread in preloaded hostuuid parsing (Jessica Clarke, 2021-12-22, 1 file, -4/+27)
  Commit b6be9566d236 stopped prison0_init writing outside of the preloaded hostuuid's bounds. However, the preloaded data will not (normally) have a NUL in it, and so validate_uuid will walk off the end of the buffer in its call to sscanf. Previously if there was any whitespace in the string we'd at least know there's a NUL one past the end due to the off-by-one error, but now no such byte is guaranteed.

  Fix this by copying to a temporary buffer and explicitly adding a NUL.

  Whilst here, change the strlcpy call to use a far less suspicious argument for dstsize; in practice it's fine, but it's an unusual pattern and not necessary.

  Found by: CHERI
  Reviewed by: emaste, kevans, jhb
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D33616
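The fix described above, copying a length-delimited buffer into a temporary and terminating it before handing it to sscanf, can be sketched as follows. `demo_uuid_ok()` is a hypothetical helper with a deliberately loose format check; it is not the kernel's validate_uuid().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_UUID_LEN 36	/* 8-4-4-4-12 hex digits plus four dashes */

/*
 * Return 1 if buf[0..len) parses as a UUID-shaped string, 0 otherwise.
 * buf is NOT assumed to be NUL-terminated.
 */
int
demo_uuid_ok(const char *buf, size_t len)
{
	char tmp[DEMO_UUID_LEN + 1];
	unsigned int a, b, c, d;
	unsigned long long e;
	int n = 0;

	if (len != DEMO_UUID_LEN)
		return (0);
	/* Copy first, then terminate: sscanf must never see buf itself. */
	memcpy(tmp, buf, len);
	tmp[len] = '\0';
	if (sscanf(tmp, "%8x-%4x-%4x-%4x-%12llx%n",
	    &a, &b, &c, &d, &e, &n) != 5)
		return (0);
	/* Require that the whole string was consumed. */
	return (n == DEMO_UUID_LEN);
}
```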
* vfs: remove the unused thread argument from NDINIT* (Mateusz Guzik, 2021-11-25, 1 file, -2/+1)
  See b4a58fbf640409a1 ("vfs: remove cn_thread")

  Bump __FreeBSD_version to 1400043.
* jail(8): Fix a few common typos in source code comments (Gordon Bergling, 2021-10-27, 1 file, -1/+1)
  - s/phyiscal/physical/

  MFC after: 3 days
* jail(9): Fix a typo in a comment (Gordon Bergling, 2021-09-26, 1 file, -1/+1)
  - s/erorr/error/

  MFC after: 3 days
* kern: ether_gen_addr: randomize on default hostuuid, too (Kyle Evans, 2021-06-02, 1 file, -1/+0)
  Currently, this will still hash the default (all zero) hostuuid and potentially arrive at a MAC address that has a high chance of collision if another interface of the same name appears in the same broadcast domain on another host without a hostuuid, e.g., some virtual machine setups.

  Instead of using the default hostuuid, just treat it as a failure and generate a random LA unicast MAC address.

  Reviewed by: bz, gbe, imp, kbowling, kp
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D29788
* Fix buffer overflow in preloaded hostuuid cleaning (Colin Percival, 2021-05-18, 1 file, -1/+1)
  When a module of type "hostuuid" is provided by the loader, prison0_init strips any trailing whitespace and ASCII control characters by (a) adjusting the buffer length, and (b) zeroing out the characters in question, before storing it as the system's hostuuid.

  The buffer length adjustment was correct, but the zeroing overwrote one byte higher in memory than intended -- in the typical case, zeroing one byte past the end of the hostuuid buffer. Due to the layout of buffers passed by the boot loader to the kernel, this will be the first byte of a subsequent buffer.

  This was *probably* harmless; prison0_init runs after preloaded kernel modules have been linked and after the preloaded /boot/entropy cache has been processed, so in both cases having the first byte overwritten will not cause problems. We cannot however rule out the possibility that other objects which are preloaded by the loader could suffer from having the first byte overwritten.

  Since the zeroing does not in fact serve any purpose, remove it and trim trailing whitespace and ASCII control characters by adjusting the buffer length alone.

  Fixes: c3188289 Preload hostuuid for early-boot use
  Reviewed by: kevans, markj
  MFC after: 3 days
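Trimming by length alone means the buffer is never written, so there is no index that can land one byte past the end. A minimal sketch, assuming a hypothetical helper name and treating every byte at or below 0x20 (space and the C0 control characters) as trailing junk:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of buf with trailing bytes <= 0x20 dropped.
 * The buffer itself is read-only here; nothing is zeroed.
 */
size_t
demo_trimmed_len(const char *buf, size_t len)
{
	while (len > 0 && (unsigned char)buf[len - 1] <= 0x20)
		len--;
	return (len);
}
```

The caller then uses the returned length for every later copy or print of the data.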
* Fix 'hostuuid: preload data malformed' warning (Colin Percival, 2021-05-18, 1 file, -2/+2)
  If the preloaded hostuuid value is invalid and verbose booting is enabled, a warning is printed. This printf had two bugs:

  1. It was missing a trailing \n character.
  2. The malformed UUID is printed with %s even though it is not known to be NUL-terminated.

  This commit adds the missing \n and uses %.*s with the (already known) length of the preloaded UUID to ensure that we don't read past the end of the buffer.

  Reported by: kevans
  Fixes: c3188289 Preload hostuuid for early-boot use
  MFC after: 3 days
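The `%.*s` conversion takes an int precision argument ahead of the string and reads at most that many bytes, which is what makes it safe for data that is not NUL-terminated. The sketch below formats into a caller-supplied buffer so the result can be inspected; the function name and the exact message text are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a warning that quotes exactly len bytes of uuid.
 * uuid need not be NUL-terminated.
 */
int
demo_format_warning(char *out, size_t outsz, const char *uuid, size_t len)
{
	return (snprintf(out, outsz,
	    "hostuuid: preload data malformed: '%.*s'\n", (int)len, uuid));
}
```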
* base: remove if_wg(4) and associated utilities, manpage (Kyle Evans, 2021-03-17, 1 file, -1/+0)
  After lengthy discussion, we've decided that the if_wg(4) driver and related work is not yet ready to live in the tree. This driver has larger security implications than many, and thus will be held to more scrutiny than other drivers.

  Please also see the related message sent to the freebsd-hackers@ and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on 2021/03/16, with the subject line "Removing WireGuard Support From Base" for additional context.
* if_wg: import latest fixup work from the wireguard-freebsd project (Kyle Evans, 2021-03-15, 1 file, -0/+1)
  This is the culmination of about a week of work from three developers to fix a number of functional and security issues. This patch consists of work done by the following folks:

  - Jason A. Donenfeld <Jason@zx2c4.com>
  - Matt Dunwoodie <ncon@noconroy.net>
  - Kyle Evans <kevans@FreeBSD.org>

  Notable changes include:

  - Packets are now correctly staged for processing once the handshake has completed, resulting in less packet loss in the interim.
  - Various race conditions have been resolved, particularly w.r.t. socket and packet lifetime (panics)
  - Various tests have been added to assure correct functionality and tooling conformance
  - Many security issues have been addressed
  - if_wg now maintains jail-friendly semantics: sockets are created in the interface's home vnet so that it can act as the sole network connection for a jail
  - if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
  - if_wg now exports via ioctl a format that is future proof and complete. It is additionally supported by the upstream wireguard-tools (which we plan to merge in to base soon)
  - if_wg now conforms to the WireGuard protocol and is more closely aligned with security auditing guidelines

  Note that the driver has been rebased away from using iflib. iflib poses a number of challenges for a cloned device trying to operate in a vnet that are non-trivial to solve and adds complexity to the implementation for little gain.

  The crypto implementation that was previously added to the tree was a super complex integration of what previously appeared in an old out of tree Linux module, which has been reduced to crypto.c containing simple boring reference implementations. This is part of a near-to-mid term goal to work with FreeBSD kernel crypto folks and take advantage of or improve accelerated crypto already offered elsewhere.

  There's additional test suite effort underway out-of-tree taking advantage of the aforementioned jail-friendly semantics to test a number of real-world topologies, based on netns.sh.

  Also note that this is still a work in progress; work going further will be much smaller in nature.

  MFC after: 1 month (maybe)
* jail: Add safety around prison_deref() flags. (Jamie Gritton, 2021-02-26, 1 file, -2/+8)
  do_jail_attach() now only uses the PD_XXX flags that refer to lock status, so make sure that something else like PD_KILL doesn't slip through.

  Add a KASSERT() in prison_deref() to catch any further PD_KILL misuse.
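Masking a flag word down to the bits a callee is allowed to see, then asserting the result, is one way to express this kind of safety. The sketch below is a userspace approximation only: the flag names and values are hypothetical, `assert()` stands in for the kernel's KASSERT(), and the check is placed in the masking helper rather than where the real commit put it.

```c
#include <assert.h>

/* Hypothetical flag values, chosen for illustration only. */
#define DEMO_PD_DEREF		0x01u	/* operation: drop a reference */
#define DEMO_PD_KILL		0x02u	/* operation: kill the prison */
#define DEMO_PD_LOCKED		0x10u	/* lock status: prison mutex held */
#define DEMO_PD_LIST_XLOCKED	0x20u	/* lock status: list lock held */
#define DEMO_PD_LOCK_FLAGS	(DEMO_PD_LOCKED | DEMO_PD_LIST_XLOCKED)

/* Keep only the lock-status bits before handing flags to an attach path. */
unsigned int
demo_attach_flags(unsigned int flags)
{
	unsigned int out = flags & DEMO_PD_LOCK_FLAGS;

	/* Stand-in for a KASSERT(): a kill request must never get through. */
	assert((out & DEMO_PD_KILL) == 0);
	return (out);
}
```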
* jail: Fix locking on an early jail_set error. (Jamie Gritton, 2021-02-26, 1 file, -1/+1)
  I had locked allprison_lock without immediately setting PD_LIST_LOCKED.
* jail: re-commit 811e27fa3c44 with fixes (Jamie Gritton, 2021-02-25, 1 file, -94/+168)
  Make sure PD_KILL isn't passed to do_jail_attach, where it might end up trying to kill the caller's prison (even prison0).

  Fix the child jail loop in prison_deref_kill, which was doing the post-order part during the pre-order part. That's not a system-killer, but it made jails not always die correctly.
* jail: back out 811e27fa3c44 until it doesn't break Jenkins (Jamie Gritton, 2021-02-25, 1 file, -165/+93)
  Reported by: arichardson