author: John Baldwin <jhb@FreeBSD.org>, 2011-11-04 04:02:50 +0000
committer: John Baldwin <jhb@FreeBSD.org>, 2011-11-04 04:02:50 +0000
commit: 936c09ac0f32aa16d60492cebe7254b0631105eb
tree: 159ae25b13b965df34d0e93885cca08178c0b2a2 (/lib)
parent: dccc45e4c095172c99b9b30d4e7dfcc4ac44c295
Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types. The first type provides hints about data access
patterns; these hints are used in the file read and write routines to
modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These
modes are thus filesystem independent. Note that to ease implementation
(and since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hint tells the OS that data will or will not be
used. These hints are implemented via a new VOP_ADVISE(). A default
implementation is provided which does nothing for the WILLNEED request
and attempts to move any clean pages to the cache page queue for the
DONTNEED request. This latter case required two other changes.

First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode. This is used
to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.

The second change adds a new vm_object_page_cache() method. This method
is somewhat similar to vm_object_page_remove() except that instead of
freeing each page in the specified range, it attempts to move clean
pages to the cache queue if possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by: jilles, kib, arch@
Approved by: re (kib)
MFC after: 1 month
Notes: svn path=/head/; revision=227070
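
The two kinds of advice described in the commit message can be combined by a program that reads a file once. The following is a minimal sketch, not code from this commit: the helper name stream_once() and the 64 KB buffer size are assumptions made for illustration. It gives an access-pattern hint (POSIX_FADV_SEQUENTIAL) before reading and a data-usage hint (POSIX_FADV_DONTNEED) afterwards, and it ignores advice failures because the interface is advisory only.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the commit): read a regular file once,
 * front to back. Returns the number of bytes read, or -1 if the file
 * cannot be opened or read.
 */
static long long
stream_once(const char *path)
{
	char buf[65536];
	long long total = 0;
	ssize_t n;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);

	/* Access-pattern hint; len == 0 covers offset to end of file. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	while ((n = read(fd, buf, sizeof(buf))) != 0) {
		if (n < 0) {
			if (errno == EINTR)
				continue;
			(void)close(fd);
			return (-1);
		}
		total += n;
	}

	/* Data-usage hint: the cached pages are no longer needed. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	(void)close(fd);
	return (total);
}
```

A caller would simply invoke stream_once("/path/to/file") and check for -1. Because the commit allows only a single non-normal range per file descriptor, the sketch sets one access-pattern hint for the whole file rather than several per-region hints.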
Diffstat (limited to 'lib')
-rw-r--r-- lib/libc/sys/Makefile.inc      3
-rw-r--r-- lib/libc/sys/Symbol.map        4
-rw-r--r-- lib/libc/sys/madvise.2         3
-rw-r--r-- lib/libc/sys/posix_fadvise.2 139
4 files changed, 147 insertions, 2 deletions
diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index fe5061d97f3c..6da6a0078441 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -96,7 +96,8 @@ MAN+= abort2.2 accept.2 access.2 acct.2 adjtime.2 \
mq_setattr.2 \
msgctl.2 msgget.2 msgrcv.2 msgsnd.2 \
msync.2 munmap.2 nanosleep.2 nfssvc.2 ntp_adjtime.2 open.2 \
- pathconf.2 pdfork.2 pipe.2 poll.2 posix_fallocate.2 posix_openpt.2 profil.2 \
+ pathconf.2 pdfork.2 pipe.2 poll.2 posix_fadvise.2 posix_fallocate.2 \
+ posix_openpt.2 profil.2 \
pselect.2 ptrace.2 quotactl.2 \
read.2 readlink.2 reboot.2 recv.2 rename.2 revoke.2 rfork.2 rmdir.2 \
rtprio.2
diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map
index 095751a441cd..d0c0c940f1d9 100644
--- a/lib/libc/sys/Symbol.map
+++ b/lib/libc/sys/Symbol.map
@@ -378,6 +378,10 @@ FBSD_1.2 {
setloginclass;
};
+FBSD_1.3 {
+ posix_fadvise;
+};
+
FBSDprivate_1.0 {
___acl_aclcheck_fd;
__sys___acl_aclcheck_fd;
diff --git a/lib/libc/sys/madvise.2 b/lib/libc/sys/madvise.2
index 48f0e5af8e7c..b5ea6b2b9352 100644
--- a/lib/libc/sys/madvise.2
+++ b/lib/libc/sys/madvise.2
@@ -169,7 +169,8 @@ was specified and the process does not have superuser privileges.
.Xr mincore 2 ,
.Xr mprotect 2 ,
.Xr msync 2 ,
-.Xr munmap 2
+.Xr munmap 2 ,
+.Xr posix_fadvise 2
.Sh STANDARDS
The
.Fn posix_madvise
diff --git a/lib/libc/sys/posix_fadvise.2 b/lib/libc/sys/posix_fadvise.2
new file mode 100644
index 000000000000..bdf321fe2191
--- /dev/null
+++ b/lib/libc/sys/posix_fadvise.2
@@ -0,0 +1,139 @@
+.\" Copyright (c) 1991, 1993
+.\" The Regents of the University of California. All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" @(#)madvise.2 8.1 (Berkeley) 6/9/93
+.\" $FreeBSD$
+.\"
+.Dd October 26, 2011
+.Dt POSIX_FADVISE 2
+.Os
+.Sh NAME
+.Nm posix_fadvise
+.Nd give advice about use of file data
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In fcntl.h
+.Ft int
+.Fn posix_fadvise "int fd" "off_t offset" "off_t len" "int advice"
+.Sh DESCRIPTION
+The
+.Fn posix_fadvise
+system call
+allows a process to describe to the system its data access behavior for an
+open file descriptor
+.Fa fd .
+The advice covers the data starting at offset
+.Fa offset
+and continuing for
+.Fa len
+bytes.
+If
+.Fa len
+is zero,
+all data from
+.Fa offset
+to the end of the file is covered.
+.Pp
+The behavior is specified by the
+.Fa advice
+parameter and may be one of:
+.Bl -tag -width POSIX_FADV_SEQUENTIAL
+.It Dv POSIX_FADV_NORMAL
+Tells the system to revert to the default data access behavior.
+.It Dv POSIX_FADV_RANDOM
+Is a hint that file data will be accessed randomly,
+and prefetching is likely not advantageous.
+.It Dv POSIX_FADV_SEQUENTIAL
+Tells the system that file data will be accessed sequentially.
+This currently does nothing as the default behavior uses heuristics to
+detect sequential behavior.
+.It Dv POSIX_FADV_WILLNEED
+Tells the system that the specified data will be accessed in the near future.
+The system may initiate an asynchronous read of the data if it is not already
+present in memory.
+.It Dv POSIX_FADV_DONTNEED
+Tells the system that the specified data will not be accessed in the near
+future.
+The system may decrease the in-memory priority of clean data within the
+specified range and future access to this data may require a read operation.
+.It Dv POSIX_FADV_NOREUSE
+Tells the system that the specified data will only be accessed once and
+then not reused.
+Accesses to data within the specified range are treated as if the file
+descriptor has the
+.Dv O_DIRECT
+flag enabled.
+.El
+.Pp
+.Sh RETURN VALUES
+.Rv -std posix_fadvise
+.Sh ERRORS
+The
+.Fn posix_fadvise
+system call will fail if:
+.Bl -tag -width Er
+.It Bq Er EBADF
+The
+.Fa fd
+argument is not a valid file descriptor.
+.It Bq Er EINVAL
+The
+.Fa advice
+argument is not valid.
+.It Bq Er EINVAL
+The
+.Fa offset
+or
+.Fa len
+argument is negative,
+or
+.Fa offset
++
+.Fa len
+is greater than the maximum file size.
+.It Bq Er ENODEV
+The
+.Fa fd
+argument does not refer to a regular file.
+.It Bq Er ESPIPE
+The
+.Fa fd
+argument is associated with a pipe or FIFO.
+.El
+.Sh SEE ALSO
+.Xr madvise 2
+.Sh STANDARDS
+The
+.Fn posix_fadvise
+interface conforms to
+.St -p1003.1-2001 .
+.Sh HISTORY
+The
+.Fn posix_fadvise
+system call first appeared in
+.Fx 10.0 .
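
The ERRORS section of the manual page above can be exercised with a small wrapper. This is a hedged sketch rather than part of the commit, and the name fadvise_error() is hypothetical. The manual page's `.Rv -std` macro documents the convention of returning -1 and setting errno, whereas POSIX specifies that posix_fadvise() returns the error number directly. The wrapper therefore accepts either convention and always yields 0 or an errno-style value.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (not part of the commit): returns 0 on success or
 * an errno-style error number on failure.
 */
static int
fadvise_error(int fd, off_t offset, off_t len, int advice)
{
	int rc;

	rc = posix_fadvise(fd, offset, len, advice);
	if (rc == 0)
		return (0);
	/*
	 * Assumption: a return of -1 means the error is in errno (the
	 * convention the manual page documents); any other non-zero value
	 * is the error number itself (the convention POSIX specifies).
	 */
	return (rc == -1 ? errno : rc);
}
```

For example, calling fadvise_error() on the read end of a pipe is expected to report ESPIPE, and calling it with a descriptor of -1 is expected to report EBADF, matching the entries listed under ERRORS.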