| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most kernel memory that is allocated after boot does not need to be
executable. There are a few exceptions. For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9). The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.
(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory. This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory. This change makes the behavior consistent.)
This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified. After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.
Allocations that do need executable memory have various choices. They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.
Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.
PR: 228927
Reviewed by: alc, kib, markj, jhb (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15691
Notes:
svn path=/head/; revision=335068
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per-cpu zone allocations are very rarely done compared to regular zones.
The intent is to avoid pessimizing the latter case with per-cpu specific
code.
In particular contrary to the claim in r334824, M_ZERO is sometimes being
used for such zones. But the zeroing method is completely different and
braching on it in the fast path for regular zones is a waste of time.
Notes:
svn path=/head/; revision=334858
|
|
|
|
|
|
|
|
|
| |
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just ignore drop the
flag in that place.
Notes:
svn path=/head/; revision=334830
|
|
|
|
|
|
|
|
|
|
|
| |
Nothing in the tree uses it and pcpu zones have a fundamentally different use
case than the regular zones - they are not supposed to be allocated and freed
all the time.
This reduces pollution in the allocation fast path.
Notes:
svn path=/head/; revision=334824
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
trashing freed memory and checking that allocated memory is properly
trashed, and also of keeping a bitset of freed items. Trashing/checking
creates a lot of CPU cache poisoning, while keeping debugging bitsets
consistent creates a lot of contention on UMA zone lock(s). The performance
difference between INVARIANTS kernel and normal one is mostly attributed
to UMA debugging, rather than to all KASSERT checks in the kernel.
Add loader tunable vm.debug.divisor that allows either to turn off UMA
debugging completely, or turn it on only for a fraction of allocations,
while still running all KASSERTs in kernel. That allows to run INVARIANTS
kernels in production environments without reducing load by orders of
magnitude, but still doing useful extra checks.
Default value is 1, meaning debug every allocation. Value of 0 would
disable UMA debugging completely. Values above 1 enable debugging only
for every N-th item. It isn't possible to strictly follow the number,
but still amount of debugging is reduced roughly by (N-1)/N percent.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15199
Notes:
svn path=/head/; revision=334819
|
|
|
|
|
|
|
|
|
|
|
| |
we need to set the domain bit from the vm_severe_domains bitset (instead
of clearing it).
Reviewed by: jeff, markj
Sponsored by: Netflix, Inc.
Notes:
svn path=/head/; revision=334783
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section. Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.
Avoid this problem and future interoperability issues by simply not
relying on _end. Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface. As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.
PR: 228574
Reviewed by: kib (previous version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D15663
Notes:
svn path=/head/; revision=334626
|
|
|
|
|
|
|
| |
Reported by: alc
Notes:
svn path=/head/; revision=334622
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vm_map_madvise(). Previously, vm_map_madvise() used a traditional Unix-
style "return (0);" to indicate success in the common case, but Mach-
style return values in the edge cases. Since KERN_SUCCESS equals zero,
the only problem with this inconsistency was stylistic. vm_map_madvise()
has exactly two callers in the entire source tree, and only one of them
cares about the return value. That caller, kern_madvise(), can be
simplified if vm_map_madvise() consistently uses Unix-style return
values.
Since vm_map_madvise() uses the variable modify_map as a Boolean, make it
one.
Eliminate a redundant error check from kern_madvise(). Add a comment
explaining where the check is performed.
Explicitly note that exec_release_args_kva() doesn't care about
vm_map_madvise()'s return value. Since MADV_FREE is passed as the
behavior, the return value will always be zero.
Reviewed by: kib, markj
MFC after: 7 days
Notes:
svn path=/head/; revision=334621
|
|
|
|
|
|
|
| |
Suggested by: mjg
Notes:
svn path=/head/; revision=334618
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It serves little purpose after r308474 and r329882. As a side
effect, the removal fixes a bug in r329882 which caused the
page daemon to periodically invoke lowmem handlers even in the
absence of memory pressure.
Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D15491
Notes:
svn path=/head/; revision=334508
|
|
|
|
|
|
|
|
|
| |
Reported by: mmacy
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Notes:
svn path=/head/; revision=334507
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the flag MAP_GUARD. Rather than enumerating the flags that are not
allowed, enumerate the flags that are allowed. The list of allowed flags
is much shorter and less likely to change. (As an aside, one of the
previously enumerated flags, MAP_PREFAULT, was not even a legal flag for
mmap(2). However, because of an earlier check within kern_mmap(), this
misuse of MAP_PREFAULT was harmless.)
Reviewed by: kib
MFC after: 10 days
Notes:
svn path=/head/; revision=334499
|
|
|
|
|
|
|
|
|
| |
PR: 228533
Submitted by: Jakub Piecuch <j.piecuch96@gmail.com>
MFC after: 1 week
Notes:
svn path=/head/; revision=334389
|
|
|
|
|
|
|
|
|
|
|
| |
we must use vm_page_xunbusy_maybelocked() rather than vm_page_xunbusy() to
unbusy the page.
Reviewed by: kib
X-MFC with: r334233
Notes:
svn path=/head/; revision=334287
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
that the map entry is wired if the caller passes the flag VM_FAULT_WIRE.
Eliminate the same assertion, but spelled differently, at the end of
vm_fault_hold() and vm_fault_populate(). Repeat the assertion only if the
map is unlocked and the map lookup must be repeated.
Reviewed by: kib
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D15582
Notes:
svn path=/head/; revision=334274
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
superpage mappings were already being created by automatic promotion in
vm_fault_populate(), this change reduces the cost of creating those
mappings. Essentially, one pmap_enter(..., psind=1) call takes the place
of 512 pmap_enter(..., psind=0) calls, and that one pmap_enter(...,
psind=1) call eliminates the allocation of a page table page.
Reviewed by: kib
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D15572
Notes:
svn path=/head/; revision=334233
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vadvise syscall (aka ovadvise) is undocumented and has always been
implmented as returning EINVAL. Put the syscall under COMPAT11 and
provide a userspace implementation.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15557
Notes:
svn path=/head/; revision=334223
|
|
|
|
|
|
|
|
| |
Reviewed by: kib
MFC after: 10 days
Notes:
svn path=/head/; revision=334180
|
|
|
|
|
|
|
|
|
| |
An old revision was committed by accident.
Differential Revision: https://reviews.freebsd.org/D15490
Notes:
svn path=/head/; revision=334179
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should have been done when they were removed from libc, but was
overlooked in the runup to 11.0. No users should exist.
Approved by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15539
Notes:
svn path=/head/; revision=334168
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scans are largely independent, so this helps make the code
marginally neater, and makes it easier to incorporate feedback from the
active queue scan into the page daemon control loop.
Improve some comments while here. No functional change intended.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D15490
Notes:
svn path=/head/; revision=334154
|
|
|
|
|
|
|
|
|
|
| |
While here, remove a superfluous comment.
Coverity CID: 1383559
MFC after: 3 days
Notes:
svn path=/head/; revision=334057
|
|
|
|
| |
Notes:
svn path=/head/; revision=333903
|
|
|
|
|
|
|
|
|
|
|
|
| |
Such pages are dequeued as they're encountered during the inactive queue
scan, so by the time we get to the active queue scan, they should have
already been subtracted from the inactive queue length.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D15479
Notes:
svn path=/head/; revision=333799
|
|
|
|
|
|
|
|
|
|
|
|
| |
The value of m->queue must be cached after comparing it with PQ_NONE,
since it may be concurrently changing.
Reported by: glebius
Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D15462
Notes:
svn path=/head/; revision=333703
|
|
|
|
|
|
|
|
|
|
|
| |
vm_object_reserve() == true is impossible on power. Make conditional
on VM_LEVEL_0_ORDER being defined.
Reviewed by: jeff
Approved by: sbruno
Notes:
svn path=/head/; revision=333701
|
|
|
|
|
|
|
|
|
|
|
| |
vm_page_queue(), added in r333256, generalizes vm_pageout_page_queued(),
so use it instead. No functional change intended.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D15402
Notes:
svn path=/head/; revision=333581
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current UMA internals are not suited for efficient operation in
multi-socket environments. In particular there is very common use of
MAXCPU arrays and other fields which are not always properly aligned and
are not local for target threads (apart from the first node of course).
Turns out the existing UMA_ALIGN macro can be used to mostly work around
the problem until the code get fixed. The current setting of 64 bytes
runs into trouble when adjacent cache line prefetcher gets to work.
An example 128-way benchmark doing a lot of malloc/frees has the following
instruction samples:
before:
kernel`lf_advlockasync+0x43b 32940
kernel`malloc+0xe5 42380
kernel`bzero+0x19 47798
kernel`spinlock_exit+0x26 60423
kernel`0xffffffff80 78238
0x0 136947
kernel`uma_zfree_arg+0x46 159594
kernel`uma_zalloc_arg+0x672 180556
kernel`uma_zfree_arg+0x2a 459923
kernel`uma_zalloc_arg+0x5ec 489910
after:
kernel`bzero+0xd 46115
kernel`lf_advlockasync+0x25f 46134
kernel`lf_advlockasync+0x38a 49078
kernel`fget_unlocked+0xd1 49942
kernel`lf_advlockasync+0x43b 55392
kernel`copyin+0x4a 56963
kernel`bzero+0x19 81983
kernel`spinlock_exit+0x26 91889
kernel`0xffffffff80 136357
0x0 239424
See the review for more details.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D15346
Notes:
svn path=/head/; revision=333484
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With r332974, when performing a synchronized access of a page's "queue"
field, one must first check whether the page is logically dequeued. If
so, then the page lock does not prevent the page from being removed
from its page queue. Intoduce vm_page_queue(), which returns the page's
logical queue index. In some cases, direct access to the "queue" field
is still required, but such accesses should be confined to sys/vm.
Reported and tested by: pho
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15280
Notes:
svn path=/head/; revision=333256
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the vm_fault_prefault() call from vm_fault_soft_fast(), extend the
scope of the object rlock to avoid re-taking it inside
vm_fault_prefault(). It causes pmap_enter_quick() sometimes called
with shadow object lock as well as the page lock, but this looks
innocent.
Noted and measured by: mjg
Reviewed by: alc, markj (as part of the larger patch)
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15122
Notes:
svn path=/head/; revision=333091
|
|
|
|
|
|
|
|
|
|
|
| |
Cached counters are typically zero at this point so it performs
avoidable atomics. Everything reading them also reads the cached
ones, thus there is really no point.
Reviewed by: jeff
Notes:
svn path=/head/; revision=333052
|
|
|
|
|
|
|
|
|
| |
While here whack unused locking keys for the struct.
Discussed with: jeff
Notes:
svn path=/head/; revision=333051
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently both the page lock and a page queue lock must be held in
order to enqueue, dequeue or requeue a page in a given page queue.
The queue locks are a scalability bottleneck in many workloads. This
change reduces page queue lock contention by batching queue operations.
To detangle the page and page queue locks, per-CPU batch queues are
used to reference pages with pending queue operations. The requested
operation is encoded in the page's aflags field with the page lock
held, after which the page is enqueued for a deferred batch operation.
Page queue scans are similarly optimized to minimize the amount of
work performed with a page queue lock held.
Reviewed by: kib, jeff (previous versions)
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14893
Notes:
svn path=/head/; revision=332974
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the creation of zones which don't do any caching in front of
the keg. If the zone is a cache zone, this means that UMA will not
attempt any memory allocations when allocating an item from the backend.
This is intended for use after a panic by netdump, but likely has other
applications.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15184
Notes:
svn path=/head/; revision=332968
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They were previously initialized by the corresponding page daemon
threads, but for vmd_inacthead this may be too late if
vm_page_deactivate_noreuse() is called during boot.
Reported and tested by: cperciva
Reviewed by: alc, kib
MFC after: 1 week
Notes:
svn path=/head/; revision=332771
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pages allocated from a given reservation may belong to different
objects. It is therefore possible for vm_page_ps_test() to be called
with the base page's object unlocked. Check for this case before
asserting that the object lock is held.
Reported by: jhb
Reviewed by: kib
MFC after: 1 week
Notes:
svn path=/head/; revision=332658
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SKZ63 Processor May Hang When Executing Code In an HLE Transaction
Region
Problem: Under certain conditions, if the processor acquires an HLE
(Hardware Lock Elision) lock via the XACQUIRE instruction in the Host
Physical Address range between 40000000H and 403FFFFFH, it may hang
with an internal timeout error (MCACOD 0400H) logged into
IA32_MCi_STATUS.
Move the pages from the range into the blacklist. Add a tunable to
not waste 4M if local DoS is not the issue.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15001
Notes:
svn path=/head/; revision=332182
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
Notes:
svn path=/head/; revision=332122
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The division added in r331732 meant that we wouldn't attempt a
background laundering until at least v_free_target - v_free_min clean
pages had been freed by the page daemon since the last laundering. If
the inactive queue is depleted but not completely empty (e.g., because
it contains busy pages), it can thus take a long time to meet this
threshold. Restore the pre-r331732 behaviour of using a non-zero
background laundering threshold if at least one inactive queue scan has
elapsed since the last attempt at background laundering.
Submitted by: tijl (original version)
Notes:
svn path=/head/; revision=331879
|
|
|
|
| |
Notes:
svn path=/head/; revision=331873
|
|
|
|
|
|
|
|
|
|
|
| |
single slab, but with alignment adjustment it won't. Again, when
there is only one item in a slab alignment can be ignored. See
previous revision of this file for more info.
PR: 227116
Notes:
svn path=/head/; revision=331872
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and zone has a large alignment. With alignment taken into
account uk_rsize will be greater than space in a slab. However,
since we have only one item per slab, it is always naturally
aligned.
Code that will panic before this change with 4k page:
z = uma_zcreate("test", 3984, NULL, NULL, NULL, NULL, 31, 0);
uma_zalloc(z, M_WAITOK);
A practical scenario to hit the panic is a machine with 56 CPUs
and 2 NUMA domains, which yields in zone size of 3984.
PR: 227116
MFC after: 2 weeks
Notes:
svn path=/head/; revision=331871
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
per-cpu alloc and free of pages. The cache is filled with as few trips
to the phys allocator as possible by the use of a new
vm_phys_alloc_npages() function which allocates as many as N pages.
This code was originally by markj with the import function rewritten by
me.
Reviewed by: markj, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14905
Notes:
svn path=/head/; revision=331863
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a cache of fully populated buckets. This will be used in a follow-on
commit.
The flag idea was originally from markj.
Reviewed by: markj, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Notes:
svn path=/head/; revision=331862
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are out of tree consumers of vm_map_min() and vm_map_max(), and
I believe there are consumers of vm_map_pmap(), although the later is
arguably less in the need of KBI-stable interface. For the consumers
benefit, make modules using this KPI not depended on the struct vm_map
layout.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14902
Notes:
svn path=/head/; revision=331760
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than using the number of inactive queue scans as a metric for
how many clean pages are being freed by the page daemon, have the
page daemon keep a running counter of the number of pages it has freed,
and have the laundry thread use that when computing the background
laundering threshold.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14884
Notes:
svn path=/head/; revision=331732
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.
Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user. This
also eliminates the need for the complicated interrupt binding code.
Add a sysctl API for viewing and manipulating domainsets. Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both. This probably belongs in a dedicated subr file.
Attempt to improve the include situation.
Reviewed by: kib
Discussed with: jhb (cpuset parts)
Tested by: pho (before review feedback)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14839
Notes:
svn path=/head/; revision=331723
|
|
|
|
|
|
|
|
|
|
| |
rather than requiring a half-dozen. Many non-vm files may want to know
the number of valid domains.
Sponsored by: Netflix, Dell/EMC Isilon
Notes:
svn path=/head/; revision=331605
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
should be honored.
We must not sleep or acquire any MI VM locks if TDP_NOFAULTING is
specified. On the other hand, there were some callers in the tree
which set TDP_NOFAULTING for larger scope than needed, I fixed the
code which I wrote, but I suspect that linuxkpi and out of tree drm
drivers might abuse this still.
So only enable the mode for vm_fault_quick_hold_pages() where
vm_fault_hold() is not called when specifically asked by user. I
decided to use vm_prot_t flag to not change KPI. Since number of
flags in vm_prot_t is limited, I reused the same flag which was
already consumed for vm_map_lookup().
Reported and tested by: pho (as part of the larger patch)
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14825
Notes:
svn path=/head/; revision=331557
|