src - FreeBSD source tree

	Commit message	Author	Age	Files	Lines
*	session: avoid proctree lock on proc exit when possible	Mateusz Guzik	2016-01-20	1	-53/+1
	We can get away with the common case with only proc lock held. Reviewed by: kib Notes: svn path=/head/; revision=294472
*	Fix style issues around existing SDT probes.	Mark Johnston	2015-12-16	1	-2/+2
	- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes.
	- Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework.
	MFC after: 1 week Notes: svn path=/head/; revision=292384
*	Enforce the maxproc limitation before allocating struct proc, initial	Konstantin Belousov	2015-10-08	1	-3/+1
	struct thread and kernel stack for the thread. Otherwise, a load similar to a fork bomb would exhaust KVA and possibly kmem, mostly due to the struct proc being type-stable. The nprocs counter is changed from being protected by allproc_lock sx to be an atomic variable. Note that ddb/db_ps.c:db_ps() use of nprocs was unsafe before, and is still unsafe, but it seems that the only possible undesired consequence is the harmless warning printed when allproc linked list length does not match nprocs. Diagnosed by: Svatopluk Kraus <onwahe@gmail.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=289026
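The idea of charging the process count first, and only then allocating, can be sketched in portable C11. This is a minimal illustration under stated assumptions, not the kernel code: `nprocs_sketch`, `maxproc_sketch`, `proc_slot_reserve()` and `proc_slot_release()` are hypothetical names, and the real check in fork also involves privilege and per-user limits that are omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's nprocs counter and maxproc limit. */
static atomic_int nprocs_sketch = 0;
static const int maxproc_sketch = 2;

/*
 * Reserve a process slot before any expensive allocation is attempted.
 * The counter is bumped atomically; if the limit was already reached,
 * the increment is undone and the caller must fail the fork early.
 */
static bool
proc_slot_reserve(void)
{
	int old = atomic_fetch_add(&nprocs_sketch, 1);

	if (old >= maxproc_sketch) {
		atomic_fetch_sub(&nprocs_sketch, 1);
		return (false);
	}
	return (true);
}

/* Give the slot back, either on a later fork failure or on process exit. */
static void
proc_slot_release(void)
{
	int old = atomic_fetch_sub(&nprocs_sketch, 1);

	assert(old > 0);
	(void)old;
}
```

A caller would reserve a slot, attempt the allocations, and call `proc_slot_release()` on any later failure, so a burst of forks is rejected before it consumes memory.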
*	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE	Andriy Gapon	2015-09-28	1	-1/+1
	SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days Notes: svn path=/head/; revision=288336
*	When the wait*(2) syscalls wait for any process (P_ALL), they should	Mariusz Zaborski	2015-08-12	1	-0/+4
	ignore processes created with the pdfork(2) syscall. PR: 201054 Approved by: pjd (mentor) Discussed with: emaste, rwatson Notes: svn path=/head/; revision=286698
*	The si_status field of the siginfo_t, provided by the waitid(2) and	Konstantin Belousov	2015-07-18	1	-24/+28
	SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains the complete status and is copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=285670
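The split described above can be modelled in a few lines of userland C. This is a sketch only: `struct exit_info_sketch` and the `*_sketch` functions are hypothetical names, the packing follows the traditional wait status layout (exit code in bits 8 to 15, signal number in the low bits), and the masking of the exit code to 8 bits when rebuilding the old combined value is an assumption about how KW_EXITCODE() behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two fields that replace the combined p_xstat. */
struct exit_info_sketch {
	int xexit;	/* full value passed to _exit(2) */
	int xsig;	/* terminating signal, 0 for a normal exit */
};

/* Traditional wait status packing: exit code in bits 8..15, signal below. */
static int
w_exitcode_sketch(int ret, int sig)
{
	return ((ret << 8) | sig);
}

/* Rebuild the old combined status; only the low 8 bits of the code fit. */
static int
kw_exitcode_sketch(const struct exit_info_sketch *e)
{
	assert(e != NULL);
	return (w_exitcode_sketch(e->xexit & 0xff, e->xsig));
}

/* Value reported in si_status: the signal if any, else the full exit value. */
static int
si_status_sketch(const struct exit_info_sketch *e)
{
	assert(e != NULL);
	return (e->xsig != 0 ? e->xsig : e->xexit);
}
```

With an exit value of 0x12345, the legacy status keeps only 0x45 while `si_status_sketch()` returns all of 0x12345, which is the distinction the commit draws between wait status and si_status.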
*	Add an initial NUMA affinity/policy configuration for threads and processes.	Adrian Chadd	2015-07-11	1	-0/+6
	This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11.
	* Introduce a simple set of VM policy and iterator types.
	* tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled.
	* add syscalls to control changing thread and process defaults.
	* add a global NUMA VM domain policy.
	* implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy.
	* processes inherit policies from their parent processes, threads inherit policies from their parent threads.
	* add a simple tool (numactl) to query and modify default thread/process policies.
	* add documentation for the new syscalls, for numa and for numactl.
	* re-enable first touch NUMA again by default, as now policies can be set in a variety of methods.
	This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution. The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'. This is only relevant if MAXMEMDOM is set to something other than 1. Ie, if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you.
	Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this. Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with. Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware. Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development!
	Tested:
	* MIPS (regression testing; non-NUMA)
	* i386 (regression testing; non-NUMA GENERIC)
	* amd64 (regression testing; non-NUMA GENERIC)
	* westmere, 2 socket (thankyou norse!)
	* sandy bridge, 2 socket (thankyou dell!)
	* ivy bridge, 2 socket (thankyou norse!)
	* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
	* haswell, 2 socket (thankyou norse!)
	* haswell v3, 2 socket (thankyou dell)
	* haswell v3, 2x18 core (thankyou scott long / netflix!)
	* Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.)
	* I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly.
	Verified:
	* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test.
	Review: This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy). This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated.
	Notes:
	* The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools.
	* This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate.
	Sponsored by: Norse Corp, Inc.; Dell Notes: svn path=/head/; revision=285387
*	Don't clobber td->td_retval[0] in proc_reap().	Ed Schouten	2015-07-09	1	-5/+8
	While writing tests for CloudABI, I noticed that close() on process descriptors returns the process ID of the child process. This is interesting, as close() is only allowed to return 0 or -1. It turns out that we clobber td->td_retval[0] in proc_reap(), so that wait*() properly returns the process ID. Change proc_reap() to leave td->td_retval[0] alone. Set the return value in kern_wait6() instead, by keeping track of the PID before we (potentially) reap the process. Differential Revision: https://reviews.freebsd.org/D3032 Reviewed by: kib Notes: svn path=/head/; revision=285312
*	Remove several write-only variables, all reported by the gcc 4.9	Konstantin Belousov	2015-05-29	1	-2/+0
	buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=283735
*	Currently, softupdate code detects overstepping on the workitems	Konstantin Belousov	2015-05-27	1	-0/+6
	limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker's tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitly checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nfsd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=283600
*	Do not allow a process to reap an orphan (a child currently being	John Baldwin	2015-05-26	1	-12/+15
	traced by another process such as a debugger). The parent process does need to check for matching orphan pids to avoid returning ECHILD if an orphan has exited, but it should not return the exited status for the child until after the debugger has detached from the orphan process either explicitly or implicitly via wait(). Add two tests for this case: one where the debugger is the direct child (thus the parent has a non-empty children list) and one where the debugger is not a direct child (so the only "child" of the parent is the orphan). Differential Revision: https://reviews.freebsd.org/D2644 Reviewed by: kib MFC after: 2 weeks Notes: svn path=/head/; revision=283562
*	Add KTR tracing for some MI ptrace events.	John Baldwin	2015-05-25	1	-0/+9
	Differential Revision: https://reviews.freebsd.org/D2643 Reviewed by: kib Notes: svn path=/head/; revision=283546
*	Only reparent a traced process to its old parent if the tracing process is	John Baldwin	2015-05-22	1	-2/+4
	not the old parent. Otherwise, proc_reap() will leave the zombie in place resulting in the process' status being returned twice to its parent. Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by this change. Differential Revision: https://reviews.freebsd.org/D2594 Reviewed by: kib MFC after: 2 weeks Notes: svn path=/head/; revision=283282
*	Add kern.racct.enable tunable and RACCT_DISABLED config option.	Edward Tomasz Napierala	2015-04-29	1	-3/+5
	The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). Differential Revision: https://reviews.freebsd.org/D2369 Reviewed by: kib@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=282213
*	proc: get rid of proc lock + unlock pair in proc_reap	Mateusz Guzik	2015-03-16	1	-4/+5
	A comment in the code stated we PROC_LOCK and as a side effect guarantee all writers released process lock. But at that point such lock was already taken while we were removing the process from all lists, so it should be already unreachable. Notes: svn path=/head/; revision=280131
*	cred: add proc_set_cred helper	Mateusz Guzik	2015-03-16	1	-1/+1
	The goal here is to provide one place altering process credentials. This eases debugging and opens up possibilities to do additional work when such an action is performed. Notes: svn path=/head/; revision=280130
*	The umtx_lock mutex is used by top-half of the kernel, but is	Konstantin Belousov	2015-02-28	1	-0/+2
	currently a spin lock. Apparently, the only reason for this is that umtx_thread_exit() is called under the process spinlock, which put the requirement on the umtx_lock. Note that the witness static order list is wrong for the umtx_lock, umtx_lock is explicitly before any thread lock, so it is also before sleepq locks. Change umtx_lock to be the sleepable mutex. For the reason above, the calls to umtx_thread_exit() are moved from thread_exit() earlier in each caller, when the process spin lock is not yet taken. Discussed with: jhb Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Notes: svn path=/head/; revision=279390
*	Add a facility for non-init process to declare itself the reaper of	Konstantin Belousov	2014-12-15	1	-6/+33
	the orphaned descendants. Base of the API is modelled after the same feature from the DragonFlyBSD. Requested by: bapt Reviewed by: jilles (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Notes: svn path=/head/; revision=275800
*	Add facility to stop all userspace processes. The supposed use of the	Konstantin Belousov	2014-12-13	1	-1/+1
	feature is to quiesce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=275745
*	When process is exiting, check for suspension regardless of	Konstantin Belousov	2014-12-08	1	-9/+16
	multithreaded status of the process. The stopped state must be cleared before P_WEXIT is set. A stop signal delivered just before first PROC_LOCK() block in exit1(9) would put the process into pending stop with P_WEXIT set or assertion triggered. Also recheck for the suspension after failed thread_single(9) call, since process lock could be dropped. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=275615
*	The process spin lock currently has the following distinct uses:	Konstantin Belousov	2014-11-26	1	-3/+5
	- Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock.
	- Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK().
	- Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK().
	- Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK().
	The split is done mostly for code clarity, and should not affect scalability.
	Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=275121
*	Avoid unnecessary ppeers_lock acquisition in exit1.	Mateusz Guzik	2014-10-05	1	-10/+12
	MFC after: 1 week Notes: svn path=/head/; revision=272560
*	Fix up proc_realparent to always return correct process.	Mateusz Guzik	2014-09-03	1	-2/+6
	Prior to the change it would always return initproc for non-traced processes. This fixes ps apparently always returning 1 as ppid. Pointy hat: mjg Reported by: many MFC after: 1 week Notes: svn path=/head/; revision=270993
*	Properly reparent traced processes when the tracer dies.	Mateusz Guzik	2014-08-24	1	-11/+22
	Previously they were unconditionally reparented to init. In effect it was possible that tracee was never returned to original parent. Reviewed by: kib MFC after: 1 week Notes: svn path=/head/; revision=270443
*	Correct the order of arguments passed to LIST_INSERT_AFTER().	Mark Johnston	2014-08-15	1	-2/+2
	Reviewed by: kib X-MFC-With: r269656 Notes: svn path=/head/; revision=270024
*	Correct the problems with the ptrace(2) making the debuggee an orphan.	Konstantin Belousov	2014-08-07	1	-8/+45
	One problem is inferior(9) looping due to the process tree becoming a graph instead of tree if the parent is traced by child. Another issue is due to the use of p_oppid to restore the original parent/child relationship, because real parent could have already exited and its pid reused (noted by mjg). Add the function proc_realparent(9), which calculates the parent for given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head element of the p_orphan list and then steps back to its container to find the parent process. If the parent has already exited, the init(8) is returned. Move the P_ORPHAN and the new helper flag from the p_flag* to new p_treeflag field of struct proc, which is protected by proctree lock instead of proc lock, since the orphans relationship is managed under the proctree_lock already. The remaining uses of p_oppid in ptrace(PT_DETACH) and process reaping are replaced by proc_realparent(9). Phabric: D417 Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=269656
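The list trick described above, finding the owner of a list from one of its elements, can be sketched with the LIST macros from <sys/queue.h>. This is an illustration under assumptions rather than the kernel function: `struct sk_proc`, `SK_FIRST_ORPHAN`, `sk_add_orphan()` and `sk_realparent()` are hypothetical names, removal from the list and the fallback to init(8) are left out, and the walk relies on `le_prev` pointing at the previous element's `le_next` field (or at the list head's `lh_first`).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

#define SK_FIRST_ORPHAN	0x1	/* hypothetical analogue of P_TREE_FIRST_ORPHAN */

struct sk_proc {
	int pid;
	int treeflag;
	LIST_ENTRY(sk_proc) orphan;	/* linkage on the real parent's list */
	LIST_HEAD(, sk_proc) orphans;	/* orphans owned by this process */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define SK_CONTAINEROF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* New orphans go after the first one, so the flagged element stays first. */
static void
sk_add_orphan(struct sk_proc *parent, struct sk_proc *child)
{
	struct sk_proc *first = parent->orphans.lh_first;

	if (first == NULL) {
		child->treeflag |= SK_FIRST_ORPHAN;
		LIST_INSERT_HEAD(&parent->orphans, child, orphan);
	} else {
		LIST_INSERT_AFTER(first, child, orphan);
	}
}

/* Walk back to the flagged head element, then step to the list owner. */
static struct sk_proc *
sk_realparent(struct sk_proc *child)
{
	struct sk_proc *p = child;

	assert(p->orphan.le_prev != NULL);	/* must be on an orphan list */
	while ((p->treeflag & SK_FIRST_ORPHAN) == 0)
		p = SK_CONTAINEROF(p->orphan.le_prev, struct sk_proc,
		    orphan.le_next);
	return (SK_CONTAINEROF(p->orphan.le_prev, struct sk_proc,
	    orphans.lh_first));
}
```

The flag is what makes the walk terminate: without it, an element cannot tell whether its `le_prev` points into a sibling or into the list head embedded in the parent.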
*	Eliminate plim and vtmp local vars in exit1.	Mateusz Guzik	2014-07-10	1	-6/+3
	No functional changes. MFC after: 1 week Notes: svn path=/head/; revision=268514
*	Update kernel inclusions of capability.h to use capsicum.h instead; some	Robert Watson	2014-03-16	1	-1/+1
	further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks Notes: svn path=/head/; revision=263233
*	proc exit: don't take PROC_LOCK while freeing rlimits	Mateusz Guzik	2013-12-15	1	-2/+0
	Code wishing to check rlimits of some process should check whether it is exiting first, which current consumers do. MFC after: 2 weeks Notes: svn path=/head/; revision=259407
*	Make process descriptors standard part of the kernel. rwhod(8) already	Pawel Jakub Dawidek	2013-11-30	1	-7/+0
	requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days Notes: svn path=/head/; revision=258768
*	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE	Andriy Gapon	2013-11-26	1	-1/+1
	In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks Notes: svn path=/head/; revision=258622
*	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging	Attilio Rao	2013-11-25	1	-1/+0
	option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking.
	- As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0].
	[0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1].
	Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip Notes: svn path=/head/; revision=258541
*	Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.	Jilles Tjoelker	2013-11-17	1	-4/+7
	Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002 Reported by: Dmitrijs Ledkovs Tested by: Dmitrijs Ledkovs Notes: svn path=/head/; revision=258281
*	Specify SDT probe argument types in the probe definition itself rather than	Mark Johnston	2013-08-15	1	-2/+1
	using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks Notes: svn path=/head/; revision=254350
*	Remove cr_prison NULL check from proc_to_reap.	Mateusz Guzik	2013-07-22	1	-2/+1
	Userspace processes always have a prison. MFC after: 2 weeks Notes: svn path=/head/; revision=253538
*	Merge Capsicum overhaul:	Pawel Jakub Dawidek	2013-03-02	1	-1/+1
	- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
	- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
	- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
	- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
	- If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall.
	- If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
	- To support ioctl and fcntl white-listing the filedesc structure was heavily modified.
	- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
	- Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.
	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).
	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).
	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).
	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.
	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.
	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this call.
	Removed CAP_MAPEXEC.
	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.
	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNOD to CAP_MKNODAT.
	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).
	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).
	Added convenient defines:
	#define CAP_PREAD (CAP_SEEK | CAP_READ)
	#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
	#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
	#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
	#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
	#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
	#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define CAP_RECV CAP_READ
	#define CAP_SEND CAP_WRITE
	#define CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		CAP_SETSOCKOPT | CAP_SHUTDOWN)
	Added defines for backward API compatibility:
	#define CAP_MAPEXEC CAP_MMAP_X
	#define CAP_DELETE CAP_UNLINKAT
	#define CAP_MKDIR CAP_MKDIRAT
	#define CAP_RMDIR CAP_UNLINKAT
	#define CAP_MKFIFO CAP_MKFIFOAT
	#define CAP_MKNOD CAP_MKNODAT
	#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
	Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib Notes: svn path=/head/; revision=247602
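How composite rights such as CAP_PREAD interact with a permission check can be sketched as plain bitmask arithmetic. This is an illustration only: the `SK_` names and their bit values are assumptions chosen for the example, not the values used by the kernel headers, and `rights_allow()` is a hypothetical helper standing in for whatever check the kernel performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define SK_CAP_READ	0x1ULL
#define SK_CAP_WRITE	0x2ULL
#define SK_CAP_SEEK	0x4ULL

/* Composite rights in the style of the defines listed above. */
#define SK_CAP_PREAD	(SK_CAP_SEEK | SK_CAP_READ)
#define SK_CAP_PWRITE	(SK_CAP_SEEK | SK_CAP_WRITE)

static_assert((SK_CAP_PREAD & SK_CAP_READ) == SK_CAP_READ,
    "a composite right must include its components");

/* An operation is allowed only if every needed bit is present. */
static bool
rights_allow(uint64_t have, uint64_t need)
{
	return ((have & need) == need);
}
```

Under the new behaviour described above, a descriptor limited to the read right alone passes the check for read(2) but not for pread(2), which additionally needs the seek right.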
*	When vforked child is traced, the debugging events are not generated	Konstantin Belousov	2013-02-07	1	-1/+1
	until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and has not yet called exec. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks Notes: svn path=/head/; revision=246484
*	The case of pid == WAIT_MYPGRP for the kern_wait() is already handled	Konstantin Belousov	2013-01-30	1	-7/+7
	in kern_wait6(), which is called by kern_wait(). Remove the redundant check, introduced in r243136, and add a comment noting this, to make the code less confusing. The blank lines are added to properly delineate the scope of the preceding comments. Noted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 week Notes: svn path=/head/; revision=246118
*	Protect the p->p_pgrp dereference with the process lock.	Konstantin Belousov	2013-01-06	1	-0/+2
	MFC after: 3 days Notes: svn path=/head/; revision=245104
*	Restore the proper handling of the pid 0 for waitpid(2).	Konstantin Belousov	2012-11-16	1	-4/+9
	Fix the style around. Reported and reviewed by: bde (previous version) MFC after: 28 days Notes: svn path=/head/; revision=243136
*	Style fixes for r242958.	Konstantin Belousov	2012-11-16	1	-8/+6
	Reported and reviewed by: bde MFC after: 28 days Notes: svn path=/head/; revision=243133
*	Add the wait6(2) system call. It takes POSIX waitid()-like process	Konstantin Belousov	2012-11-13	1	-37/+268
	designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month Notes: svn path=/head/; revision=242958
*	Remove the support for using non-mpsafe filesystem modules.	Konstantin Belousov	2012-10-22	1	-3/+0
	In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho Notes: svn path=/head/; revision=241896
*	Ignore stop and continue signals sent to an exiting process. Stop signals	John Baldwin	2012-09-13	1	-0/+8
	set p_xstat to the signal that triggered the stop, but p_xstat is also used to hold the exit status of an exiting process. Without this change, a stop signal that arrived after a process was marked P_WEXIT but before it was marked a zombie would overwrite the exit status with the stop signal number. Reviewed by: kib MFC after: 1 week Notes: svn path=/head/; revision=240467
*	A few whitespace and comment fixes.	John Baldwin	2012-09-07	1	-3/+3
	Notes: svn path=/head/; revision=240204
*	When process exits, not only the children shall be reparented to	Konstantin Belousov	2012-04-02	1	-0/+16
	init, but also the orphans shall be removed from the orphan list, because the list header is destroyed. Reported and tested by: pho MFC after: 3 days Notes: svn path=/head/; revision=233809
*	Add helper function to remove the process from the orphans list and	Konstantin Belousov	2012-04-02	1	-8/+14
	use it instead of inlined code. Tested by: pho MFC after: 3 days Notes: svn path=/head/; revision=233808
*	Add an assert for proctree_lock to proc_to_reap().	Jaakko Heinonen	2012-03-14	1	-0/+2
	Discussed with: kib MFC after: 1 week Notes: svn path=/head/; revision=232975
*	Lock the process around manipulations with p_flag.	Konstantin Belousov	2012-03-13	1	-0/+2
	Reported and reviewed by: jh MFC after: 3 days Notes: svn path=/head/; revision=232947
*	Restore the return statement erroneously removed in the r232048.	Konstantin Belousov	2012-02-24	1	-0/+1
	Submitted by: cognet Pointy hat to: kib (reuse the one I already got today) MFC after: 13 days Notes: svn path=/head/; revision=232104