path: root/sys/kern/kern_fork.c
Commit message | Author | Age | Files | Lines
* Add BSM record conversion for a number of syscalls: | Christian S.J. Peron | 2020-05-16 | 1 | -0/+2
  - thr_kill(2) and thr_exit(2) generally (no argument auditing here).
  - A set of syscalls for the process descriptor family, specifically:
    pdfork(2), pdgetpid(2) and pdkill(2).

  For these syscalls, audit the file descriptor. In the case of pdfork(2)
  a pointer to an integer (file descriptor) is passed in as an argument.
  We audit the post-initialized file descriptor (not the random garbage
  that would have been passed in). We will also audit the child process
  which was created from the fork operation (similar to what is done for
  the fork(2) syscall). For pdkill(2) we audit the signal value and fd,
  and finally for pdgetpid(2) just the file descriptor.

  Following is a sample of the produced audit trails:

    header,111,11,pdfork(2),0,Sat May 16 03:07:50 2020, + 394 msec
    argument,0,0x39d,child PID
    argument,2,0x2,flags
    argument,1,0x8,fd
    subject,root,root,0,root,0,924,0,0,0.0.0.0
    return,success,925

    header,79,11,pdgetpid(2),0,Sat May 16 03:07:50 2020, + 394 msec
    argument,1,0x8,fd
    subject,root,root,0,root,0,924,0,0,0.0.0.0
    return,success,0
    trailer,79

    header,135,11,pdkill(2),0,Sat May 16 03:07:50 2020, + 395 msec
    argument,1,0x8,fd
    argument,2,0xf,signal
    process_ex,root,root,0,root,0,925,0,0,0.0.0.0
    subject,root,root,0,root,0,924,0,0,0.0.0.0
    return,success,0
    trailer,135

  MFC after: 1 week

  Notes: svn path=/head/; revision=361103
* Retire procfs-based process debugging. | John Baldwin | 2020-04-01 | 1 | -11/+0
  Modern debuggers and process tracers use ptrace() rather than procfs
  for debugging. ptrace() has a superset of functionality available via
  procfs and new debugging features are only added to ptrace(). While
  the two debugging services share some fields in struct proc, they each
  use dedicated fields and separate code. This results in extra
  complexity to support a feature that hasn't been enabled in the
  default install for several years.

  PR: 244939 (exp-run)
  Reviewed by: kib, mjg (earlier version)
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D23837

  Notes: svn path=/head/; revision=359530
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) | Pawel Biernacki | 2020-02-26 | 1 | -2/+4
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that
  are still not MPSAFE (or already are but aren’t properly marked). Use
  it in preparation for a general review of all nodes.

  This is a non-functional change that adds annotations to SYSCTL_NODE
  and SYSCTL_PROC nodes using one of the soon-to-be-required flags.

  Mark all obvious cases as MPSAFE. All entries that haven't been marked
  as MPSAFE before are by default marked as NEEDGIANT.

  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718

  Notes: svn path=/head/; revision=358333
* Add a way to manage thread signal mask using shared word, instead of syscall. | Konstantin Belousov | 2020-02-09 | 1 | -1/+2
  A new syscall sigfastblock(2) is added which registers a uint32_t
  variable as containing the count of blocks for signal delivery. Its
  content is read by the kernel on each syscall entry and on AST
  processing; a non-zero count of blocks is interpreted the same as a
  signal mask blocking all signals.

  The biggest downside of the feature that I see is that memory
  corruption that affects the registered fast sigblock location would
  cause quite strange application misbehavior. For instance, the process
  would be immune to ^C (but killable by SIGKILL).

  With consumers (rtld and libthr added), benchmarks do not show a
  slow-down of the syscalls in micro-measurements, and macro benchmarks
  like buildworld do not demonstrate a difference. Part of the reason is
  that buildworld time is dominated by compiler, and clang already links
  to libthr. On the other hand, small utilities typically used by shell
  scripts have the total number of syscalls cut by half.

  The syscall is not exported from the stable libc version namespace on
  purpose. It is intended to be used only by our C runtime
  implementation internals.

  Tested by: pho
  Discussed with: cem, emaste, jilles
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D12773

  Notes: svn path=/head/; revision=357693
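The counting semantics described in this commit message can be illustrated with a small userland model. This is only a sketch: the struct and function names below are hypothetical, the word is an ordinary struct member rather than memory registered with the kernel, and any flag bits or atomicity requirements of the real interface are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the uint32_t word registered with the kernel. */
struct fast_block {
    uint32_t count;    /* number of nested "block signals" requests */
};

/* Enter a section in which signal delivery must be deferred. */
static void
fast_block_enter(struct fast_block *fb)
{
    fb->count++;
}

/* Leave such a section; sections may nest, so only decrement. */
static void
fast_block_leave(struct fast_block *fb)
{
    assert(fb->count > 0);
    fb->count--;
}

/*
 * Model of the check made on syscall entry and AST processing:
 * any non-zero count is treated like a mask blocking all signals.
 */
static bool
signals_blocked(const struct fast_block *fb)
{
    return (fb->count != 0);
}
```

Because blocking and unblocking are plain memory updates in this model, nested sections cost no system calls, which matches the motivation given in the message.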
* schedlock 1/4 | Jeff Roberson | 2019-12-15 | 1 | -1/+0
| | | | | | | | | | | | | | | Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626 Notes: svn path=/head/; revision=355779
* rfork(2): add RFSPAWN flag | Kyle Evans | 2019-09-25 | 1 | -1/+14
| | | | | | | | | | | | | | When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets signal handlers in the child during creation to avoid a point of corruption of parent state from the child. This flag will be used by posix_spawn(3) to handle potential signal issues. Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058 Notes: svn path=/head/; revision=352711
* Add procctl(PROC_STACKGAP_CTL) | Konstantin Belousov | 2019-09-03 | 1 | -1/+2
  It allows a process to request, retroactively, that the stack gap not
  be applied to its stacks. It is also possible to control the gaps in
  the process after exec.

  PR: 239894
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D21352

  Notes: svn path=/head/; revision=351773
* fork: rework locking around do_fork | Mateusz Guzik | 2019-08-17 | 1 | -25/+23
  - move allproc lock into the func, it is of no use prior to it
  - the code would lock p1 and p2 while holding allproc to partially
    construct it after it gets added to the list. instead we can do the
    work prior to adding anything.
  - protect lastpid with procid_lock

  As a side effect we do less work with allproc held.

  Sponsored by: The FreeBSD Foundation

  Notes: svn path=/head/; revision=351175
* fork: bump process count before checking for permission to cross the limit | Mateusz Guzik | 2019-08-17 | 1 | -13/+10
| | | | | | | | | | | | The limit is almost never reached. Do the check only on failure to see if we can override it. No change in user-visible behavior. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=351174
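The ordering described here (bump first, consult the permission check only when the bump fails) can be sketched as follows. The helper names, the plain int counter and the convention that a limit of 0 means "no limit" are assumptions made for this illustration; the kernel's actual counters and privilege check are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical privilege state; priv_checks counts how often it is asked. */
static bool is_privileged;
static int priv_checks;

static bool
caller_may_override(void)
{
    priv_checks++;
    return (is_privileged);
}

/* Try to take one slot; a limit of 0 means "no limit" in this sketch. */
static bool
bump_count(int *count, int limit)
{
    assert(count != NULL);
    if (limit != 0 && *count + 1 > limit)
        return (false);
    (*count)++;
    return (true);
}

/* Common case: one bump and no privilege check at all. */
static bool
account_new_process(int *count, int limit)
{
    if (!bump_count(count, limit)) {
        /* Only on failure do we ask whether the limit may be crossed. */
        if (!caller_may_override())
            return (false);
        (void)bump_count(count, 0);
    }
    return (true);
}
```

Since the limit is almost never reached, the privilege check drops out of the common path while the outcome for each caller stays the same.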
* fork: stop skipping < 100 ids on wrap around | Mateusz Guzik | 2019-08-17 | 1 | -11/+4
  Code doing this is commented with a claim that these IDs are occupied
  by daemons, but that's demonstrably false. To an extent the range is
  used by init and kernel processes (and on sufficiently big machines it
  indeed is fully populated).

  On a sample 40-way box the highest id in the range is 63. On a
  different one it is 23. Just use the range.

  Sponsored by: The FreeBSD Foundation

  Notes: svn path=/head/; revision=351173
* Inherit P2_PROTMAX_{ENABLE,DISABLE} across fork(). | Mark Johnston | 2019-07-10 | 1 | -1/+2
| | | | | | | | | | | | Thus, when using proccontrol(1) to disable implicit application of PROT_MAX within a process, child processes will inherit this setting. Discussed with: kib MFC with: r349609 Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=349892
* Extract eventhandler declarations to sys/_eventhandler.h | Conrad Meyer | 2019-05-20 | 1 | -2/+0
  This allows replacing "sys/eventhandler.h" includes with
  "sys/_eventhandler.h" in other header files (e.g.,
  sys/{bus,conf,cpu}.h) and reduces header pollution substantially.

  EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of
  .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

  As a side effect of reduced header pollution, many .c files and
  headers no longer contain needed definitions. The remainder of the
  patch addresses adding appropriate includes to fix those files.

  LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as
  required by sys/mutex.h since r326106 (but silently protected by
  header pollution prior to this change).

  No functional change (intended). Of course, any out of tree modules
  that relied on header pollution for sys/eventhandler.h, sys/lock.h, or
  sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been
  bumped.

  Notes: svn path=/head/; revision=347984
* Simplify the test against maxproc in fork1(). | Mark Johnston | 2019-05-07 | 1 | -11/+13
| | | | | | | | | | | | Previously nprocs_new would be tested against maxprocs twice when nprocs_new < maxprocs - 10. Eliminate the unnecessary comparison. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> GitHub PR: https://github.com/freebsd/freebsd/pull/397 MFC after: 1 week Notes: svn path=/head/; revision=347227
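One way to picture the simplification is to compare a flat condition with a nested one. The two predicates below are a reconstruction from the description only: the exact form of the original expression, the function names and the boolean standing in for the privilege check are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Flat form: when nprocs_new is below maxproc - 10, the first
 * comparison fails and the second comparison is still evaluated.
 */
static bool
limit_hit_flat(int nprocs_new, int maxproc, bool privileged)
{
    assert(maxproc > 10);
    return ((nprocs_new >= maxproc - 10 && !privileged) ||
        nprocs_new >= maxproc);
}

/*
 * Nested form: the comparison against maxproc is reached only once we
 * are already within the last ten slots.
 */
static bool
limit_hit_nested(int nprocs_new, int maxproc, bool privileged)
{
    assert(maxproc > 10);
    if (nprocs_new >= maxproc - 10) {
        if (!privileged || nprocs_new >= maxproc)
            return (true);
    }
    return (false);
}
```

The two forms agree for every input because reaching maxproc implies being within ten of it, so the nested version only removes a redundant comparison in the common case.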
* Annotate nprocs with __exclusive_cache_line | Mateusz Guzik | 2019-05-04 | 1 | -1/+1
| | | | | | | Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=347131
* Implement Address Space Layout Randomization (ASLR) | Konstantin Belousov | 2019-02-10 | 1 | -1/+2
  With this change, randomization can be enabled for all non-fixed
  mappings. It means that the base address for the mapping is selected
  with a guaranteed amount of entropy (bits). If the mapping was
  requested to be superpage aligned, the randomization honours the
  superpage attributes.

  Although the value of ASLR is diminishing over time as exploit authors
  work out simple ASLR bypass techniques, it eliminates the trivial
  exploitation of certain vulnerabilities, at least in theory. This
  implementation is relatively small and happens at the correct
  architectural level. Also, it is not expected to introduce regressions
  in existing cases when turned off (default for now), or cause any
  significant maintenance burden.

  The randomization is done on a best-effort basis - that is, the
  allocator falls back to a first fit strategy if fragmentation prevents
  entropy injection. It is trivial to implement a strong mode where
  failure to guarantee the requested amount of entropy results in
  mapping request failure, but I do not consider that to be usable.

  I have not fine-tuned the amount of entropy injected right now. It is
  only a quantitative change that will not change the implementation.
  The current amount is controlled by aslr_pages_rnd.

  To not spoil coalescing optimizations, to reduce the page table
  fragmentation inherent to ASLR, and to keep the transient superpage
  promotion for the malloced memory, locality clustering is implemented
  for anonymous private mappings, which are automatically grouped until
  fragmentation kicks in. The initial location for the anon group range
  is, of course, randomized. This is controlled by vm.cluster_anon,
  enabled by default.

  The default mode keeps the sbrk area unpopulated by other mappings,
  but this can be turned off, which gives much more breathing bits on
  architectures with small address space, such as i386. This is tied
  with the question of following an application's hint about the
  mmap(2) base address. Testing shows that ignoring the hint does not
  affect the function of common applications, but I would expect more
  demanding code could break. By default sbrk is preserved and mmap
  hints are satisfied, which can be changed by using the
  kern.elf{32,64}.aslr.honor_sbrk sysctl.

  ASLR is enabled on per-ABI basis, and currently it is only allowed on
  FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support
  for additional architectures will be added after further testing.

  Both per-process and per-image controls are implemented:
  - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
  - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it
    possible to force ASLR off for the given binary. (A tool to edit
    the feature control note is in development.)

  Global controls are:
  - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by
    mmap(2);
  - kern.elf{32,64}.aslr.pie_enable - for PIE image activation
    mappings;
  - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for
    mmap(2);
  - vm.cluster_anon - enables anon mapping clustering.

  PR: 208580 (exp runs)
  Exp-runs done by: antoine
  Reviewed by: markj (previous version)
  Discussed with: emaste
  Tested by: pho
  MFC after: 1 month
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D5603

  Notes: svn path=/head/; revision=343964
* Re-wrap long line after r341827. | Konstantin Belousov | 2019-01-17 | 1 | -1/+3
| | | | | | | | Sponsored by: The FreeBSD Foundation MFC after: 3 days Notes: svn path=/head/; revision=343107
* Microoptimize corner case of ID bitmap handling. | Mateusz Guzik | 2018-12-19 | 1 | -8/+7
| | | | | | | | | Prior to the change we would avoidably test more possibly used IDs. While here update the comment: there is no pidchecked variable anymore. Notes: svn path=/head/; revision=342237
* Deinline vfork handling out of the syscall return path. | Mateusz Guzik | 2018-12-19 | 1 | -0/+45
| | | | | | | | | | vfork is rarely called (comparatively to other syscalls) and it avoidably pollutes the fast path. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=342236
* Remove unused argument to priv_check_cred. | Mateusz Guzik | 2018-12-11 | 1 | -3/+2
  Patch mostly generated with coccinelle:

    @@
    expression E1,E2;
    @@

    - priv_check_cred(E1,E2,0)
    + priv_check_cred(E1,E2)

  Sponsored by: The FreeBSD Foundation

  Notes: svn path=/head/; revision=341827
* Fix a corner case in ID bitmap management. | Mateusz Guzik | 2018-12-08 | 1 | -1/+3
  If all IDs from trypid to pid_max were used as pids, the code would
  enter a loop which would be infinite if none of the IDs could become
  free (e.g. they all belong to processes which have not transitioned
  to zombie).

  Fixes: r341684 ("Manage process-related IDs with bitmaps")
  Sponsored by: The FreeBSD Foundation

  Notes: svn path=/head/; revision=341723
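The failure mode described above is a search that keeps restarting without ever making progress. A scan that wraps at most once cannot loop forever, as in the sketch below. The constants, the bool array and the -1 return for an exhausted range are assumptions of this example; the commit message does not say how the kernel code itself terminates the search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, tiny ID range so the behaviour is easy to trace. */
#define ID_MIN 2
#define ID_MAX 16

/*
 * Return a free ID at or after 'start', wrapping to ID_MIN at most
 * once. Returns -1 if every ID in [ID_MIN, ID_MAX) is in use.
 */
static int
find_free_id(const bool used[ID_MAX], int start)
{
    bool wrapped = false;
    int id = start;

    assert(used != NULL);
    for (;;) {
        if (id >= ID_MAX) {
            if (wrapped)
                return (-1);
            wrapped = true;
            id = ID_MIN;
        }
        /* After wrapping, reaching 'start' again means a full lap. */
        if (wrapped && id >= start)
            return (-1);
        if (!used[id])
            return (id);
        id++;
    }
}
```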
* proc: postpone proc unlock until after reporting with kqueue | Mateusz Guzik | 2018-12-08 | 1 | -5/+5
| | | | | | | | | | | | kqueue would always relock immediately afterwards. While here drop the NULL check for list itself. The list is always allocated. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=341722
* Manage process-related IDs with bitmaps | Mateusz Guzik | 2018-12-07 | 1 | -76/+24
| | | | | | | | | | | | | | | | | | | | | | | Currently unique pid allocation on fork often requires a full walk of process, group, session lists to make sure it is not used by anything. This has a side effect of requiring proctree to be held along with allproc, which adds more contention in poudriere -j 128. The patch below implements trivial bitmaps which gets rid of the problem. Dedicated lock is introduced to manage IDs. While here a bug was discovered: all processes would inherit reap id from the first process spawned by init. This had a side effect of keeping the ID used and when allocation rolls over to the beginning it keeps being skipped. The patch is loosely based on initial work by mjoras@. Reviewed by: kib Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=341684
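The idea of replacing list walks with bitmaps can be sketched in portable C. Everything named here (the id_space struct, its three maps, the 64-entry ID range) is hypothetical and chosen for readability; the kernel's own data structures, sizes and locking are not shown.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define ID_MAX    64    /* hypothetical, tiny ID space */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS    ((ID_MAX + WORD_BITS - 1) / WORD_BITS)

struct id_bitmap {
    unsigned long words[NWORDS];
};

/* One bitmap per kind of user: process, process group, session. */
struct id_space {
    struct id_bitmap pids;
    struct id_bitmap pgrps;
    struct id_bitmap sessions;
};

static bool
id_in_use(const struct id_bitmap *bm, int id)
{
    return (((bm->words[id / WORD_BITS] >> (id % WORD_BITS)) & 1UL) != 0);
}

static void
id_mark(struct id_bitmap *bm, int id)
{
    assert(id >= 0 && id < ID_MAX);
    bm->words[id / WORD_BITS] |= 1UL << (id % WORD_BITS);
}

static void
id_release(struct id_bitmap *bm, int id)
{
    assert(id >= 0 && id < ID_MAX);
    bm->words[id / WORD_BITS] &= ~(1UL << (id % WORD_BITS));
}

/*
 * Allocate the first ID at or after 'start' that is unused as a pid,
 * a process group ID and a session ID. Three bit tests replace a walk
 * of the process, group and session lists. Returns -1 if none is free.
 */
static int
id_alloc(struct id_space *s, int start)
{
    assert(start >= 0);
    for (int id = start; id < ID_MAX; id++) {
        if (!id_in_use(&s->pids, id) && !id_in_use(&s->pgrps, id) &&
            !id_in_use(&s->sessions, id)) {
            id_mark(&s->pids, id);
            return (id);
        }
    }
    return (-1);
}
```

In this model an ID stays unavailable for as long as any one of the maps still has its bit set, which is also how a stale bit (such as the inherited reap ID mentioned in the message) would keep an ID from being reused.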
* proc: create a dedicated lock for zombproc to lighten the load on allproc_lock | Mateusz Guzik | 2018-11-29 | 1 | -0/+8
| | | | | | | | | | | | waitpid always takes proctree to evaluate the list, but only takes allproc if it can reap. With this patch allproc is no longer taken, which helps during poudriere -j 128. Discussed with: kib Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=341176
* Revert "fork: fix use-after-free with vfork" | Mateusz Guzik | 2018-11-23 | 1 | -1/+0
  This unreliably breaks libc handling of vfork where forking succeeded,
  but execve did not.

  vfork code in libc performs waitpid with WNOHANG in case of failed
  exec. With the fix exit codepath was waking up the parent before the
  child fully transitioned to a zombie. Woken up parent would waitpid,
  which could find a not-yet-zombie child and fail to reap it due to the
  WNOHANG flag.

  While removing the flag fixes the problem, it is not an option due to
  older releases which would still suffer from the kernel change.

  Revert the fix until a solution can be worked out.

  Note that while use-after-free which gets back due to the revert is a
  real bug, its side-effects are limited due to the fact that struct
  proc memory is never released by UMA.

  Notes: svn path=/head/; revision=340793
* fork: remove avoidable proc lock/unlock pair | Mateusz Guzik | 2018-11-22 | 1 | -13/+5
| | | | | | | | | | We don't have to access the process after making it runnable, so there is no need to hold it either. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=340785
* fork: fix use-after-free with vfork | Mateusz Guzik | 2018-11-22 | 1 | -0/+1
  The pointer to the child is stored without any reference held. Then it
  is blindly used to wait until P_PPWAIT is cleared. However, if the
  child is autoreaped it could have exited and been freed before the
  parent started waiting.

  Use the existing hold mechanism to mitigate the problem. Most common
  case of doing exec remains unchanged. The corner case of doing exit
  performs wake up before waiting for holds to clear.

  Reviewed by: kib
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D18295

  Notes: svn path=/head/; revision=340784
* proc: implement pid hash locks and an iterator | Mateusz Guzik | 2018-11-21 | 1 | -0/+2
| | | | | | | | | | | | | | | forks, exits and waits are frequently stalled during poudriere -j 128 runs due to killpg and process list exports performed for each package. Both uses take the allproc lock. The latter case can be modified to iterate over the hash with finer grained locking instead. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17817 Notes: svn path=/head/; revision=340742
* proc: always store parent pid in p_oppid | Mateusz Guzik | 2018-11-16 | 1 | -1/+2
| | | | | | | | | | | | Doing so removes the dependency on proctree lock from sysctl process list export which further reduces contention during poudriere -j 128 runs. Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17825 Notes: svn path=/head/; revision=340482
* fork: avoid endless wait with PTRACE_FORK and RFSTOPPED. | Konstantin Belousov | 2018-06-21 | 1 | -32/+39
| | | | | | | | | | | | | | | | | An RFSTOPPED thread can't clean TDB_STOPATFORK, which is done in the fork_return() in its context, so parent is stuck forever. Triggered when trying to ptrace linux process. Instead of waiting for the new thread to clear TDB_STOPATFORK, tag it as traced and reparent to the debugger in do_fork(), and let it only notify the debugger when run. Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com> Reviewed by: jhb MFC after: 1 week X-MFC-Note: keep p_dbgwait placeholder intact Differential revision: https://reviews.freebsd.org/D15857 Notes: svn path=/head/; revision=335504
* Reduce contention on the proctree lock during heavy package build. | Mateusz Guzik | 2018-02-20 | 1 | -5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a proctree -> allproc ordering established. Most of the time it is either xlock -> xlock or slock -> slock. On fork however there is a slock -> xlock pair which results in pathological wait times due to threads keeping proctree held for reading and all waiting on allproc. Switch this to xlock -> xlock. Longer term fix would get rid of proctree in this place to begin with. Right now it is necessary to walk the session/process group lists to determine which id is free. The walk can be avoided e.g. with bitmaps. The exit path used to have one place which dealt with allproc and then with proctree. Move the allproc acquire into the section protected by proctree. This reduces contention against threads waiting on proctree in the fork codepath - the fork proctree holder does not have to wait for allproc as often. Finally, move tidhash manipulation outside of the area protected by either of these locks. The removal from the hash was already unprotected. There is no legitimate reason to look up thread ids for a process still under construction. This results in about 50% wait time reduction during -j 128 package build. Notes: svn path=/head/; revision=329615
* Postpone sx_sunlock(&proctree_lock) on fork until after allproc is dropped. | Mateusz Guzik | 2018-02-17 | 1 | -3/+2
| | | | | | | | There is a significant contention on the lock during -j 128 package build. This change drops total wait time on this lock by 60%. Notes: svn path=/head/; revision=329420
* Remove trailing whitespace. | Bjoern A. Zeeb | 2018-01-14 | 1 | -4/+4
| | | | | | | No functional change. Notes: svn path=/head/; revision=327968
* Implement 'domainset', a cpuset based NUMA policy mechanism. This allows | Jeff Roberson | 2018-01-12 | 1 | -9/+0
  userspace to control NUMA policy administratively and
  programmatically.

  Implement domainset based iterators in the page layer.

  Remove the now legacy numa_* syscalls.

  Cleanup some header pollution created by having seq.h in proc.h.

  Reviewed by: markj, kib
  Discussed with: alc
  Tested by: pho
  Sponsored by: Netflix, Dell/EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D13403

  Notes: svn path=/head/; revision=327895
* sys: further adoption of SPDX licensing ID tags. | Pedro F. Giffuni | 2017-11-20 | 1 | -0/+2
  Mainly focus on files that use BSD 3-Clause license.

  The Software Package Data Exchange (SPDX) group provides a
  specification to make it easier for automated tools to detect and
  summarize well known open source licenses. We are gradually adopting
  the specification, noting that the tags are considered only advisory
  and do not, in any way, supersede or replace the license texts.

  Special thanks to Wind River for providing access to "The Duke of
  Highlander" tool: an older (2014) run over FreeBSD tree was useful as
  a starting point.

  Notes: svn path=/head/; revision=326023
* Introduce EVENTHANDLER_LIST and some users. | Matt Joras | 2017-11-09 | 1 | -1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a facility to EVENTHANDLER(9) for explicitly defining a reference to an event handler list. This is useful since previously all invokers of events had to do a locked traversal of the global list of event handler lists in order to find the appropriate event handler list. By keeping a pointer to the appropriate list an invoker can avoid this traversal completely. The pointer is initialized with SYSINIT(9) during the eventhandler stage. Users registering interest in events do not need to know if the event is backed by such a list, since the list is added to the global list of lists. As with lists that are not pre-defined it is safe to register for the events before the list has been created. This converts the process_* and thread_* events to using the new facility, as these are events whose locked traversals end up showing up significantly in ports build workflows (and presumably other workflows with many short lived threads/procs). It may be advantageous to convert other events to using the new facility. The el_flags field is now unused, but leave it be so that this revision can be MFC'd. Reviewed by: bdrewery, markj, mjg Approved by: rstone (mentor) In collaboration with: ian MFC after: 4 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12814 Notes: svn path=/head/; revision=325621
* If the user tries to set kern.randompid to 1 (which is meaningless), set | Dag-Erling Smørgrav | 2017-09-10 | 1 | -8/+14
| | | | | | | | | | | it to a random value between 100 and 1123, rather than 0 as before. Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5336 Notes: svn path=/head/; revision=323390
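The special case for a requested value of 1 can be sketched as a pure function. The function name and the explicit 'rnd' argument (standing in for the kernel's random source) are assumptions of this illustration, and the handling of every other input is not described in the message, so other values are simply passed through here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a requested kern.randompid value to the value that is stored.
 * A request of 1 becomes a random value in [100, 1123]; 1024 distinct
 * values starting at 100 give exactly that range.
 */
static int
normalize_randompid(int requested, uint32_t rnd)
{
    assert(requested >= 0);
    if (requested == 1)
        return (100 + (int)(rnd % 1024));
    /* Assumption of this sketch: other values are stored unchanged. */
    return (requested);
}
```

Passing the random number in as an argument keeps the sketch deterministic, which makes the boundaries easy to check: a draw of 0 yields 100 and a draw of 1023 yields 1123.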
* Move struct syscall_args syscall arguments parameters container into | Konstantin Belousov | 2017-06-12 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | struct thread. For all architectures, the syscall trap handlers have to allocate the structure on the stack. The structure takes 88 bytes on 64bit arches which is not negligible. Also, it cannot be easily found by other code, which e.g. caused duplication of some members of the structure to struct thread already. The change removes td_dbg_sc_code and td_dbg_sc_nargs which were directly copied from syscall_args. The structure is put into the copied on fork part of the struct thread to make the syscall arguments information correct in the child after fork. This move will also allow several more uses shortly. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080 Notes: svn path=/head/; revision=319873
* - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter | Gleb Smirnoff | 2017-04-17 | 1 | -8/+8
    in place. To do per-cpu stats, convert all fields that previously
    were maintained in the vmmeters that sit in pcpus to counter(9).
  - Since some vmmeter stats may be touched at very early stages of
    boot, before we have set up UMA and we can do counter_u64_alloc(),
    provide an early counter mechanism:
    o Leave one spare uint64_t in struct pcpu, named
      pc_early_dummy_counter.
    o Point counter(9) fields of vmmeter to
      pcpu[0].pc_early_dummy_counter, so that at early stages of boot,
      before counters are allocated we already point to a counter that
      can be safely written to.
    o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

  Further related changes:
  - Don't include vmmeter.h into pcpu.h.
  - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to
    64-bit, to match kernel representation.
  - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an
    exclusion.

  This is based on benno@'s 4-year old patch:
  https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

  Reviewed by: kib, gallatin, marius, lidl
  Differential Revision: https://reviews.freebsd.org/D10156

  Notes: svn path=/head/; revision=317061
* Defer ptracestop() signals that cannot be delivered immediately | Eric Badger | 2017-02-20 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | | | | | | | When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC In collaboration with: jhb Differential Revision: https://reviews.freebsd.org/D9260 Notes: svn path=/head/; revision=313992
* vfs: add vrefact, to be used when the vnode has to be already active | Mateusz Guzik | 2016-12-12 | 1 | -1/+1
| | | | | | | | | | | | | This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version) Notes: svn path=/head/; revision=309893
* Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap. | Konstantin Belousov | 2016-09-21 | 1 | -1/+1
| | | | | | | | | | | | | | | Both can be used to cause processes in capability mode to receive SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from syscalls. Idea by: emaste Reviewed by: oshogbo (previous version), emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7965 Notes: svn path=/head/; revision=306081
* Renumber license clauses in sys/kern to avoid skipping #3 | Ed Maste | 2016-09-15 | 1 | -1/+1
| | | | Notes: svn path=/head/; revision=305832
* Don't set P2_PTRACE_FSTP in a process that invokes ptrace(PT_TRACE_ME). | Mark Johnston | 2016-08-19 | 1 | -1/+1
| | | | | | | | | | | | | Such processes are stopped synchronously by a direct call to ptracestop(SIGTRAP) upon exec. P2_PTRACE_FSTP causes the exec()ing thread to suspend itself while waiting for a SIGSTOP that never arrives. Reviewed by: kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D7576 Notes: svn path=/head/; revision=304487
* Remove mention of the Giant from the fork_return() description. | Konstantin Belousov | 2016-08-03 | 1 | -3/+3
| | | | | | | | | | | Making emphasis on this lock in the core function comment is confusing for the modern kernel. Sponsored by: The FreeBSD Foundation MFC after: 3 days Notes: svn path=/head/; revision=303702
* When a debugger attaches to the process, SIGSTOP is sent to the | Konstantin Belousov | 2016-07-28 | 1 | -4/+2
  target. Due to the way issignal() selects the next signal to deliver
  and report, if another signal is simultaneously or already pending,
  that signal might be reported by the next waitpid(2) call. This causes
  minor annoyance for debuggers, which must be prepared to take any
  signal as the first event, then filter SIGSTOP later. More
  importantly, for tools like gcore(1), which attach and then detach
  without processing events, SIGSTOP might leak to be delivered after
  PT_DETACH. This results in the process being unintentionally stopped
  after detach, which is fatal for automatic tools.

  The solution is to force SIGSTOP to be the first signal reported after
  the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate
  that the attaching ritual was not yet finished, and issignal() prefers
  SIGSTOP in that condition. Also, the thread which handles
  P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first
  waitpid(2). All that ensures that SIGSTOP is consumed first.

  Additionally, if P2_PTRACE_FSTP is still set on detach, which means
  that waitpid(2) was not called at all, SIGSTOP is removed from the
  queue, ensuring that the process is resumed on detach.

  In issignal(), when acting on STOPing signals, remove the signal from
  queue before suspending. Otherwise parallel attach could result in
  ptracestop() acting on that STOP as if it was the STOP signal from the
  attach. Then SIGSTOP from attach leaks again.

  As a minor refactoring, some bits of the common attach code are moved
  to new helper proc_set_traced().

  Reported by: markj
  Reviewed by: jhb, markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D7256

  Notes: svn path=/head/; revision=303423
* Add PTRACE_VFORK to trace vfork events. | John Baldwin | 2016-07-18 | 1 | -0/+1
| | | | | | | | | | | | | | | | First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when the new child was created via vfork() rather than fork(). Second, a new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK event mask. This new stop is reported after the vfork parent resumes due to the child calling exit or exec. Debuggers can use this stop to reinsert breakpoints in the vfork parent process before it resumes. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7045 Notes: svn path=/head/; revision=303001
* Add a mask of optional ptrace() events. | John Baldwin | 2016-07-15 | 1 | -5/+4
  ptrace() now stores a mask of optional events in p_ptevents. Currently
  this mask is a single integer, but it can be expanded into an array of
  integers in the future.

  Two new ptrace requests can be used to manipulate the event mask:
  PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
  sets the current event mask.

  The current set of events include:
  - PTRACE_EXEC: trace calls to execve().
  - PTRACE_SCE: trace system call entries.
  - PTRACE_SCX: trace system call exits.
  - PTRACE_FORK: trace forks and auto-attach to new child processes.
  - PTRACE_LWP: trace LWP events.

  The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been
  replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces
  P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.

  The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
  compatibility but now simply toggle corresponding flags in the event
  mask.

  While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
  modify the event mask and continue the traced process.

  Reviewed by: kib
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D7044

  Notes: svn path=/head/; revision=302902
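The event mask described above behaves like an ordinary bit set, with the compatibility request reduced to toggling a single bit. The model below is a sketch: the EV_* names and their values, the traced_proc struct and the helper functions are hypothetical and are not the constants or interfaces from the FreeBSD headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event bits; the values are illustrative only. */
#define EV_EXEC 0x01u
#define EV_SCE  0x02u
#define EV_SCX  0x04u
#define EV_FORK 0x08u
#define EV_LWP  0x10u

/* Stand-in for the per-process state holding the mask. */
struct traced_proc {
    uint32_t ptevents;
};

/* Model of a "get event mask" request. */
static uint32_t
get_event_mask(const struct traced_proc *p)
{
    assert(p != NULL);
    return (p->ptevents);
}

/* Model of a "set event mask" request: replaces the whole mask. */
static void
set_event_mask(struct traced_proc *p, uint32_t mask)
{
    assert(p != NULL);
    p->ptevents = mask;
}

static bool
event_enabled(const struct traced_proc *p, uint32_t ev)
{
    assert(p != NULL);
    return ((p->ptevents & ev) != 0);
}

/* Model of the compatibility request: toggles one bit, leaves the rest. */
static void
follow_fork(struct traced_proc *p, bool enable)
{
    assert(p != NULL);
    if (enable)
        p->ptevents |= EV_FORK;
    else
        p->ptevents &= ~EV_FORK;
}
```

Keeping all optional events in one word means a debugger can read, adjust and write back the full set in two requests, while older single-purpose requests continue to work on top of the same state.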
* When filt_proc() removes event from the knlist due to the process | Konstantin Belousov | 2016-06-27 | 1 | -2/+2
  exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
  - knote kn_knlist pointer is reset
  - INFLUX knote is removed from the process knlist.

  And, there are two consequences:
  - KN_LIST_UNLOCK() on such knote is nop
  - there is nothing which would block exit1() from processing past the
    knlist_destroy() (and knlist_destroy() resets knlist lock pointers).

  Both consequences result either in leaked process lock, or
  dereferencing NULL function pointers for locking.

  Handle this by stopping embedding the process knlist into struct proc.
  Instead, the knlist is allocated together with struct proc, but marked
  as autodestroy on the zombie reap, by knlist_detach() function. The
  knlist is freed when last kevent is removed from the list, in
  particular, at the zombie reap time if the list is empty. As a result,
  the knlist_remove_inevent() is no longer needed and removed.

  Other changes:

  In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
  from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
  notifications. The flags leak resulted in excessive
  NOTE_EXEC/NOTE_FORK reports.

  Fix immediate note activation in filt_procattach(). Condition should
  be either the immediate NOTE_CHILD activation, or immediate NOTE_EXIT
  report for the exiting process.

  In knote_fork(), do not perform racy check for KN_INFLUX before kq
  lock is taken. Besides being racy, it did not account for notes just
  added by scan (KN_SCAN).

  Some minor and incomplete style fixes.

  Analyzed and tested by: Eric Badger <eric@badgerio.us>
  Reviewed by: jhb
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Approved by: re (gjb)
  Differential revision: https://reviews.freebsd.org/D6859

  Notes: svn path=/head/; revision=302235
* Update comments for the MD functions managing contexts for new | Konstantin Belousov | 2016-06-16 | 1 | -1/+1
  threads, to make it less confusing and using modern kernel terms.

  Rename the functions to reflect current use of the functions, instead
  of the historic KSE conventions:
    cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
    cpu_set_upcall       -> cpu_copy_thread (for forks)
    cpu_set_upcall_kse   -> cpu_set_upcall (for new threads creation)

  Reviewed by: jhb (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (hrs)
  Differential revision: https://reviews.freebsd.org/D6731

  Notes: svn path=/head/; revision=301961
* Introduce the PD_CLOEXEC for pdfork(2). | Mariusz Zaborski | 2016-06-08 | 1 | -2/+6
| | | | | | | Reviewed by: mjg Notes: svn path=/head/; revision=301573