aboutsummaryrefslogtreecommitdiff
path: root/sys/vm
Commit message (Collapse)AuthorAgeFilesLines
* Exploit r288122 to address a cosmetic issue. Pages belonging to eitherAlan Cox2015-10-061-1/+1
| | | | | | | | | | | | | | the kernel or kmem object can't be paged out. Since they can't be paged out, they are never enqueued in a paging queue. Nonetheless, passing PQ_INACTIVE to vm_page_unwire() in kmem_unback() creates the appearance that these pages are being enqueued in the inactive queue. As of r288122, we can avoid giving this false impression by passing PQ_NONE. Submitted by: kmacy Differential Revision: https://reviews.freebsd.org/D1674 Notes: svn path=/head/; revision=288912
* Mark swap_pager_putpages static at its definition. It was alreadyWarner Losh2015-10-051-3/+1
| | | | | | | | | | static at its declaration. Remove needless swapdev_strategy forward declaration. MFC After: 3 days Notes: svn path=/head/; revision=288901
* Reduce the scope of a variable to the only file where it is used.Alan Cox2015-10-033-2/+2
| | | | Notes: svn path=/head/; revision=288627
* As a step towards the elimination of PG_CACHED pages, rework the handlingMark Johnston2015-09-304-15/+28
| | | | | | | | | | | | | | | | | | | | | | | of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to the head of the inactive queue instead of being cached. This affects the implementation of POSIX_FADV_NOREUSE as well, since it works by applying POSIX_FADV_DONTNEED to file ranges after they have been read or written. At that point the corresponding buffers may still be dirty, so the previous implementation would coalesce successive ranges and apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the dirty buffers would eventually be cached. To preserve this behaviour in an efficient manner, this change adds a new buf flag, B_NOREUSE, which causes the pages backing a VMIO buf to be placed at the head of the inactive queue when the buf is released. POSIX_FADV_NOREUSE then works by setting this flag in bufs that underlie the specified range. Reviewed by: alc, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3726 Notes: svn path=/head/; revision=288431
* The conversion of kmem_alloc_attr() from operating on a vm map to a vmemAlan Cox2015-09-261-28/+13
| | | | | | | | | | | | | | | | | | | arena in r254025 introduced a bug in the case when an allocation is only partially successful. Specifically, the vm object lock was not being acquired before freeing the allocated pages. To address this bug, replace the existing code by a call to kmem_unback(). Change the type of a variable in kmem_alloc_attr() so that an allocation of two or more gigabytes won't fail. Replace the error handling code in kmem_back() by a call to kmem_unback(). Reviewed by: kib (an earlier version) MFC after: 1 week Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=288281
* Exploit r288122 to address a cosmetic issue. Since the pages allocatedAlan Cox2015-09-261-1/+1
| | | | | | | | | | | | | | by noobj_alloc() don't belong to a vm object, they can't be paged out. Since they can't be paged out, they are never enqueued in a paging queue. Nonetheless, passing PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages are being enqueued in the inactive queue. As of r288122, we can avoid giving this false impression by passing PQ_NONE. Submitted by: kmacy Differential Revision: https://reviews.freebsd.org/D1674 Notes: svn path=/head/; revision=288274
* Change vm_page_unwire() such that it (1) accepts PQ_NONE as the specifiedAlan Cox2015-09-222-17/+21
| | | | | | | | | | | | | | | | | queue and (2) returns a Boolean indicating whether the page's wire count transitioned to zero. Exploit this change in vfs_vmio_release() to avoid pointlessly enqueueing a page that is about to be freed. (An earlier version of this change was developed by attilio@ and kmacy@. Any errors in this version are my own.) Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=288122
* Correct a non-fatal error in vm_pageout_worker(). vm_pageout_worker()Alan Cox2015-09-201-6/+11
| | | | | | | | | | | | | | | | | | | should not assume that vm_pages_needed will remain set while it sleeps. Other threads can clear vm_pages_needed by performing a sufficient number of vm_page_free() calls, e.g., process termination. The effect of this error was that vm_pageout_worker() would free and/or launder pages when, in fact, there was no shortage of free pages. Rewrite a nearby comment to describe all of the possible cases and not just the most common case. The problem being that the comment made the most common case seem like the only case. Reviewed by: kib MFC after: 1 week Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=288025
* Eliminate (many) unnecessary calls to pmap_remove_all(). Pages from objectsAlan Cox2015-09-171-1/+2
| | | | | | | | | | | | with a reference count of zero can't possibly be mapped, so there is never a need for vm_page_set_invalid() to call pmap_remove_all() on them. Reviewed by: kib MFC after: 1 week Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287944
* Remove the v_cache_min and v_cache_max sysctls. They are unused and haveMark Johnston2015-09-113-9/+2
| | | | | | | | | | no effect. Reviewed by: alc Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287640
* Remove a check which caused spurious SIGSEGV on usermode access to theKonstantin Belousov2015-09-091-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mapped address without valid pte installed, when parallel wiring of the entry happen. The entry must be copy on write. If entry is COW but was already copied, and parallel wiring set MAP_ENTRY_IN_TRANSITION, vm_fault() would sleep waiting for the MAP_ENTRY_IN_TRANSITION flag to clear. After that, the fault handler is restarted and vm_map_lookup() or vm_map_lookup_locked() trip over the check. Note that this is race, if the address is accessed after the wiring is done, the entry does not fault at all. There is no reason in the current kernel to disallow write access to the COW wired entry if the entry permissions allow it. Initially this was done in r24666, since that kernel did not supported proper copy-on-write for wired text, which was fixed in r199869. The r251901 revision re-introduced the r24666 fix for the current VM. Note that write access must clear MAP_ENTRY_NEEDS_COPY entry flag by performing COW. In reverse, when MAP_ENTRY_NEEDS_COPY is set in vmspace_fork(), the MAP_ENTRY_USER_WIRED flag is cleared. Put the assert stating the invariant, instead of returning the error. Reported and debugging help by: peter Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=287591
* The swap pager is compatible with direct dispatch. It does its ownWarner Losh2015-09-081-11/+42
| | | | | | | | | | | | | | | | | | | | locking and doesn't sleep. Flag the consumer we create as such. In addition, decrement the in flight index when we have an out of memory error after having incremented it previously. This would have prevented swapoff from working if the swap pager ever hit a resource shortage trying to swap out something (the swap in path always waits for a bio, so won't have this issue). Simplify the close logic by abandoning the use of private and initializing the index to 1 and dropping that reference when we previously set private. Also, set sw_id only while sw_dev_mtx is held. This should only affect swapping to a vnode, as opposed to a geom whose close always sets it to NULL with sw_dev_mtx held. Differential Review: https://reviews.freebsd.org/D3547 Notes: svn path=/head/; revision=287567
* To simplify upcoming changes to the inactive queue scan, change the codeAlan Cox2015-09-081-18/+8
| | | | | | | | | | | so that there is only one place where pages are freed and only one place where pages are moved to the tail of the queue. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287550
* Eliminate pointless requeueing of pages from terminated objects. TheseAlan Cox2015-09-051-23/+25
| | | | | | | | | | | | | | | pages will have left the inactive queue before the page daemon performs its next scan. Also, ignore references to pages from terminated objects. This allows the clean pages to be freed a little sooner. Move some comments to their proper place, i.e., next to the code that they describe, and update other nearby comments. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287488
* Don't trash memory from UMA_ZONE_NOFREE zones.Mateusz Guzik2015-09-021-2/+2
| | | | | | | | | | | | Objects obtained from such zones are supposed to retain type stability, which was violated by aforementioned trashing. This is a follow-up to r284861. Discussed with: kib Notes: svn path=/head/; revision=287415
* Handle held pages earlier in the inactive queue scan.Alan Cox2015-09-011-31/+33
| | | | | | | | Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287344
* Remove weighted page handling from vm_page_advise().Mark Johnston2015-08-281-54/+25
| | | | | | | | | | | | | | | | | | | | | This was added in r51337 as part of the implementation of madvise(MADV_DONTNEED). Its objective was to ensure that the page daemon would eventually reclaim other unreferenced pages (i.e., unreferenced pages not touched by madvise()) from the active queue. Now that the pagedaemon performs steady scanning of the active page queue, this weighted handling is unnecessary. Instead, always "cache" clean pages by moving them to the head of the inactive page queue. This simplifies the implementation of vm_page_advise() and eliminates the fragmentation that resulted from the distribution of pages among multiple queues. Suggested by: alc Reviewed by: alc Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3401 Notes: svn path=/head/; revision=287235
* In vm_pageout_scan(), simplify the logic for determining if a page can beAlan Cox2015-08-271-15/+15
| | | | | | | | | | | paged out and apply some nearby style fixes. In collaboration with: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation, EMC / Isilon Storage Division Notes: svn path=/head/; revision=287219
* Testing whether a page is dirty does not require the page lock. Moreover,Alan Cox2015-08-251-6/+10
| | | | | | | | | | | it may involve a pmap operation that iterates over the page's PV list, so unnecessarily holding the page lock is undesirable. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287121
* Make the UMA harvesting go away completely if not wanted. Default to "not ↵Mark Murray2015-08-221-4/+4
| | | | | | | | | | | | | | | | | | | wanted". Provide and document the RANDOM_ENABLE_UMA option. Change RANDOM_FAST to RANDOM_UMA to clarify the harvesting. Remove RANDOM_DEBUG option, replace with SDT probes. These will be of use to folks measuring the harvesting effect when deciding whether to use RANDOM_ENABLE_UMA. Requested by: scottl and others. Approved by: so (/dev/random blanket) Differential Revision: https://reviews.freebsd.org/D3197 Notes: svn path=/head/; revision=287023
* Eliminate pointless assignments to rtvals[] in swap_pager_putpages().Alan Cox2015-08-211-12/+12
| | | | | | | | Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=287002
* Prevent ticks rollover from preventing vm_lowmem eventRyan Stone2015-08-201-3/+4
| | | | | | | | | | | | | | | | | | | Currently vm_pageout_scan() uses a ticks-based scheme to rate-limit the number of times that the vm_lowmem event will happen. However if no events happen for long enough for ticks to roll over, this leaves us in a long window in which vm_lowmem events will not happen. Replace the use of ticks with time_t to prevent rollover from ever being an issue. Reviewed by: ian MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3439 Notes: svn path=/head/; revision=286970
* Add the kernel support for minidumps on arm64.Andrew Turner2015-08-201-7/+7
| | | | | | | | | Obtained from: ABT Systems Ltd Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3318 Notes: svn path=/head/; revision=286958
* As another piece of PG_CACHE page elimination, remove an LRU-defeating callAlan Cox2015-08-161-5/+0
| | | | | | | | | | | | | to vm_page_try_to_cache() from vm_pageout_flush(). Other changes, most recently r286814, have made this call unnecessary. Reviewed by: kib Discussed with: jeff Tested by: pho Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=286828
* Make kstack_pages a tunable on arm, x86, and powepc. On i386, theKonstantin Belousov2015-08-101-5/+5
| | | | | | | | | | | | | | | | | | | | | | initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 week Notes: svn path=/head/; revision=286584
* Avoid sign extension of value passed to kva_alloc from uma_zone_reserve_kvaZbigniew Bodek2015-08-101-2/+2
| | | | | | | | | | | | | | | | | | | | | Fixes "panic: vm_radix_reserve_kva: unable to reserve KVA" caused by sign extention of "pages * UMA_SLAB_SIZE" value passed to kva_alloc() which takes unsigned long argument. In the erroneus case that triggered this bug, the number of pages to allocate in uma_zone_reserve_kva() was 0x8ebe6, that gave the total number of bytes to allocate equal to 0x8ebe6000 (int). This was then sign extended in kva_alloc() to 0xffffffff8ebe6000 (unsigned long). Reviewed by: alc, kib Submitted by: Zbigniew Bodek <zbb@semihalf.com> Obtained from: Semihalf Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3346 Notes: svn path=/head/; revision=286583
* Introduce a sysctl for reporting the number of fully populated reservations.Alan Cox2015-08-061-0/+32
| | | | Notes: svn path=/head/; revision=286390
* Properly sort the function declarations added in r286296Jason A. Harmening2015-08-051-2/+2
| | | | | | | | Submitted by: alc Approved by: kib (mentor) Notes: svn path=/head/; revision=286313
* Add two new pmap functions:Jason A. Harmening2015-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | vm_offset_t pmap_quick_enter_page(vm_page_t m) void pmap_quick_remove_page(vm_offset_t kva) These will create and destroy a temporary, CPU-local KVA mapping of a specified page. Guarantees: --Will not sleep and will not fail. --Safe to call under a non-sleepable lock or from an ithread Restrictions: --Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms --Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page. --MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm. NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing. Reviewed by: kib Approved by: kib (mentor) Differential Revision: http://reviews.freebsd.org/D3013 Notes: svn path=/head/; revision=286296
* Refinements to r281079's sequential access optimization: Prefetched pages,Alan Cox2015-08-031-2/+12
| | | | | | | | | | | | | | | | | | | which constitute the majority of the pages that are processed by vm_fault_dontneed(), are already near the tail of the inactive queue. Only the pages at faulting virtual addresses are actually moved by vm_page_advise(..., MADV_DONTNEED). However, vm_page_advise(..., MADV_DONTNEED) is simultaneously too aggressive and passive for the moved pages. It makes most of these pages too easily reclaimable, and at the same time it leaves enough pages in the active queue to trigger pageouts by the page daemon. Instead, with this change, the pages at faulting virtual addresses are moved to the tail of the inactive queue, where they are relatively close to the pages prefetched by the same page fault. Discussed with: jeff Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=286255
* Do not pretend that vm_fault(9) supports unwiring the address. RenameKonstantin Belousov2015-07-303-26/+21
| | | | | | | | | | | | | | | | | the VM_FAULT_CHANGE_WIRING flag to VM_FAULT_WIRE. Assert that the flag is only passed when faulting on the wired map entry. Remove the vm_page_unwire() call, which should be never reachable. Since VM_FAULT_WIRE flag implies wired map entry, the TRYPAGER() macro is reduced to the testing of the fs.object having a default pager. Inline the check. Suggested and reviewed by: alc Tested by: pho (previous version) MFC after: 1 week Notes: svn path=/head/; revision=286086
* - Make 'struct buf *buf' private to vfs_bio.c. Having a global variableJeff Roberson2015-07-291-0/+2
| | | | | | | | | | | | | | | | | 'buf' is inconvenient and has lead me to some irritating to discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=285993
* Revert r173708's modifications to vm_object_page_remove().Konstantin Belousov2015-07-252-24/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assume that a vnode is mapped shared and mlocked(), and then the vnode is truncated, or truncated and then again extended past the mapping point EOF. Truncation removes the pages past the truncation point, and if pages are later created at this range, they are not properly mapped into the mlocked region, and their wiring count is wrong. The revert leaves the invalidated but wired pages on the object queue, which means that the pages are found by vm_object_unwire() when the mapped range is munlock()ed, and reused by the buffer cache when the vnode is extended again. The changes in r173708 were required since then vm_map_unwire() looked at the page tables to find the page to unwire. This is no longer needed with the vm_object_unwire() introduction, which follows the objects shadow chain. Also eliminate OBJPR_NOTWIRED flag for vm_object_page_remove(), which is now redundand, we do not remove wired pages. Reported by: trasz, Dmitry Sivachenko <trtrmitya@gmail.com> Suggested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=285878
* Refactor unmapped buffer address handling.Jeff Roberson2015-07-233-29/+18
| | | | | | | | | | | | | | | | | | | | - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=285819
* Add an initial NUMA affinity/policy configuration for threads and processes.Adrian Chadd2015-07-114-11/+596
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11. * Introduce a simple set of VM policy and iterator types. * tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled. * add syscalls to control changing thread and process defaults. * add a global NUMA VM domain policy. * implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy. * processes inherit policies from their parent processes, threads inherit policies from their parent threads. * add a simple tool (numactl) to query and modify default thread/process policities. * add documentation for the new syscalls, for numa and for numactl. * re-enable first touch NUMA again by default, as now policies can be set in a variety of methods. This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution. The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'. This is only relevant if MAXMEMDOM is set to something other than 1. Ie, if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you. Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this. Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with. Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware. Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development! Tested: * MIPS (regression testing; non-NUMA) * i386 (regression testing; non-NUMA GENERIC) * amd64 (regression testing; non-NUMA GENERIC) * westmere, 2 socket (thankyou norse!) * sandy bridge, 2 socket (thankyou dell!) * ivy bridge, 2 socket (thankyou norse!) * westmere-EX, 4 socket / 1TB RAM (thankyou norse!) * haswell, 2 socket (thankyou norse!) * haswell v3, 2 socket (thankyou dell) * haswell v3, 2x18 core (thankyou scott long / netflix!) * Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.) * I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly. Verified: * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test. Review: This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy). This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated. Notes: * The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools. * This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate. Sponsored by: Norse Corp, Inc.; Dell Notes: svn path=/head/; revision=285387
* The intention of r254304 was to scan the active queue continuously.Alan Cox2015-07-082-15/+20
| | | | | | | | | | | | | | | | | | | | | However, I've observed the active queue scan stopping when there are frequent free page shortages and the inactive queue is steadily refilled by other mechanisms, such as the sequential access heuristic in vm_fault() or madvise(2). To remedy this problem, record the time of the last active queue scan, and always scan a number of pages proportional to the time since the last scan, regardless of whether that last scan was a timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass > 0") scan. Also, on a timeout-triggered scan, allow a full scan of the active queue when the system is short of inactive pages. Reviewed by: kib MFC after: 6 weeks Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=285282
* Add a local variable initialization needed in the OBJT_DEFAULT case.Mark Johnston2015-07-051-0/+1
| | | | | | | | Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D2992 Notes: svn path=/head/; revision=285180
* vm: don't lock proc around accesses to vm_{t,d}addr and RLIMIT_DATA in sys_mmapMateusz Guzik2015-07-021-4/+2
| | | | | | | vm_{t,d}addr are constant and we can use thread's copy of resource limits Notes: svn path=/head/; revision=285054
* Account for the main process stack being one page below the highestKonstantin Belousov2015-07-021-1/+2
| | | | | | | | | | | | | | | | user address when ABI uses shared page. Note that the change is no-op for correctness, since shared page does not fault. The mapping for the shared page is installed at the address space creation, the page is unmanaged and its pte/pv entry cannot be reclaimed. Submitted by: Oliver Pinter Review: https://reviews.freebsd.org/D2954 MFC after: 1 week Notes: svn path=/head/; revision=285046
* Huge cleanup of random(4) code.Mark Murray2015-06-301-32/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.*. * live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij) Notes: svn path=/head/; revision=284959
* If INVARIANTS is specified, add ctor/dtor to junk memory if they areJohn-Mark Gurney2015-06-252-0/+19
| | | | | | | | | | unspecified... Submitted by: Suresh Gumpula at Netapp Differential Revision: https://reviews.freebsd.org/D2725 Notes: svn path=/head/; revision=284861
* Avoid pmap_is_modified() on pages that can't be mapped.Alan Cox2015-06-211-3/+5
| | | | | | | | MFC after: 1 week Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=284654
* o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async().Gleb Smirnoff2015-06-173-42/+74
| | | | | | | | | | | o Provide an extensive set of assertions for input array of pages. o Remove now duplicate assertions from different pagers. Sponsored by: Nginx, Inc. Sponsored by: Netflix Notes: svn path=/head/; revision=284529
* Invalid pages do not need neither update of the activation count norKonstantin Belousov2015-06-141-15/+15
| | | | | | | | | | | | | | | they coould be dirty. Move the handling if the invalid pages in the inactive scan earlier. Remove some code duplication in the scan by introducing the 'drop_page' label, which centralizes the object and the page unlock. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=284387
* As the next step in eliminating PG_CACHE pages, free rather than cacheAlan Cox2015-06-141-3/+3
| | | | | | | | | | | | | | | | pages in vm_pageout_scan(). The reactivation rate of cache pages created by vm_pageout_scan() is extremely low; typically no more than 0.5% to 2.25% of the pages are ever reactivated. At the same time, caching pages is more expensive than freeing them. For example, in a test with PostgreSQL, this change reduced the amount of time spent in the inactive queue scan by 1/6. Differential Revision: https://reviews.freebsd.org/D2805 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=284376
* Make KPI of vm_pager_get_pages() more strict: if a pager changes a pageGleb Smirnoff2015-06-123-25/+13
| | | | | | | | | | | | | | in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=284310
* Implement lockless resource limits.Mateusz Guzik2015-06-105-24/+16
| | | | | | | | | | | | | Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits. Notes: svn path=/head/; revision=284215
* Correct a type error in kmem_unback(). Previously, kmem_unback() did notAlan Cox2015-06-101-2/+1
| | | | | | | | | | | | correctly handle deallocation requests of two or more gigabytes in size. Eventually, this would lead to a panic elsewhere in the kernel, such as "vm_radix_insert: key <vm_pindex_t> is already present". Reported by: Ilias Marinos MFC after: 1 week Notes: svn path=/head/; revision=284207
* Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.Alan Cox2015-06-082-4/+0
| | | | | | | | | Differential Revision: https://reviews.freebsd.org/D2712 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=284147
* Add a new file operations hook for mmap operations. File type-specificJohn Baldwin2015-06-042-228/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio Notes: svn path=/head/; revision=283998