path: root/sys/kern/uipc_usrreq.c
Commit message (author, date; files changed, lines removed/added)
* Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h.
  MFC after: 3 weeks
  Notes: svn path=/head/; revision=263233
  (Robert Watson, 2014-03-16; 1 file, -1/+1)
* Replace 4.4BSD Lite's unix domain socket backpressure hack with a cleaner mechanism, based on the new SB_STOP sockbuf flag. The old hack dynamically changed the sending sockbuf's high water mark whenever adding or removing data from the receiving sockbuf. It worked for stream sockets, but it never worked for SOCK_SEQPACKET sockets because of their atomic nature. If the sockbuf was partially full, it might return EMSGSIZE instead of blocking.

  The new solution is based on DragonFlyBSD's fix from commit 3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27. It adds an SB_STOP flag to sockbufs. Whenever uipc_send surpasses the socket's size limit, it sets SB_STOP on the sending sockbuf. sbspace() will then return 0 for that sockbuf, causing sosend_generic and friends to block. uipc_rcvd will likewise clear SB_STOP.

  There are two fringe benefits: uipc_{send,rcvd} no longer need to call chgsbsize() on every send and receive because they don't change the sockbuf's high water mark. Also, uipc_sense no longer needs to acquire the UIPC linkage lock, because it's simpler to compute the st_blksizes.

  There is one drawback: since sbspace() will only ever return 0 or the maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum size by at most one packet of size less than the max. I don't think that's a serious problem. In fact, I'm not even positive that FreeBSD guarantees a socket will always stay within its nominal size limit.

  sys/sys/sockbuf.h
    Add the SB_STOP flag and adjust sbspace().
  sys/sys/unpcb.h
    Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb.
  sys/kern/uipc_usrreq.c
    Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP backpressure mechanism. Remove obsolete unpcb fields from db_show_unpcb.
  tests/sys/kern/unix_seqpacket_test.c
    Clear expected failures from ATF.

  Obtained from: DragonFly BSD
  PR: kern/185812
  Reviewed by: silence from freebsd-net@ and rwatson@
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corporation
  Notes: svn path=/head/; revision=263116
  (Alan Somers, 2014-03-13; 1 file, -38/+24)
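The SB_STOP behaviour described in this entry can be sketched in userland C. This is a simplified model written for illustration, not the kernel code: the `model_*` names, the struct layout, and the flag value are all assumptions, and mbuf accounting and locking are ignored. It shows the sender's space dropping to 0 once the receiver reaches its limit, and the one-packet overshoot the message mentions.

```c
#include <assert.h>

#define MODEL_SB_STOP 0x1u  /* hypothetical flag value, not the kernel's */

/* Hypothetical, stripped-down stand-in for a sockbuf. */
struct model_sockbuf {
	unsigned int cc;     /* bytes currently queued */
	unsigned int hiwat;  /* nominal size limit */
	unsigned int flags;
};

/* Space available to a sender: 0 while SB_STOP is set. */
static unsigned int
model_sbspace(const struct model_sockbuf *sb)
{
	if (sb->flags & MODEL_SB_STOP)
		return (0);
	return (sb->cc < sb->hiwat ? sb->hiwat - sb->cc : 0);
}

/* Data goes straight into the peer's receive buffer; stop the sender
 * once the receive buffer has reached its limit. */
static void
model_send(struct model_sockbuf *snd, struct model_sockbuf *rcv,
    unsigned int len)
{
	rcv->cc += len;
	if (rcv->cc >= rcv->hiwat)
		snd->flags |= MODEL_SB_STOP;
}

/* The reader drained len bytes (assumed <= rcv->cc); restart the
 * sender once the receive buffer is back under its limit. */
static void
model_rcvd(struct model_sockbuf *snd, struct model_sockbuf *rcv,
    unsigned int len)
{
	rcv->cc -= len;
	if (rcv->cc < rcv->hiwat)
		snd->flags &= ~MODEL_SB_STOP;
}
```

In this model the sending buffer itself never holds data, so `model_sbspace()` on it returns either 0 or the full `hiwat`, which mirrors the "0 or the maximum" remark in the commit message.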
* Partial revert of change 262914. I mixed up subversion syntax with perforce syntax and committed some unrelated files. Only devd files should've been committed.
  Reported by: imp
  Pointy hat to: asomers
  MFC after: 3 weeks
  X-MFC-With: r262914
  Notes: svn path=/head/; revision=262915
  (Alan Somers, 2014-03-07; 1 file, -24/+38)
* sbin/devd/devd.8
  sbin/devd/devd.cc
    Add a -q flag to devd that will suppress syslog logging at LOG_NOTICE or below.
  Requested by: ian@ and imp@
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corporation
  Notes: svn path=/head/; revision=262914
  (Alan Somers, 2014-03-07; 1 file, -38/+24)
* Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets". It was caused by a check for the space available in a sockbuf, but it was checking the wrong sockbuf.

  sys/sys/sockbuf.h
  sys/kern/uipc_sockbuf.c
    Add sbappendaddr_nospacecheck_locked(), which is just like sbappendaddr_locked but doesn't validate the receiving socket's space. Factor out common code into sbappendaddr_locked_internal(). We shouldn't simply make sbappendaddr_locked check the space and then call sbappendaddr_nospacecheck_locked, because that would cause the O(n) function m_length to be called twice.
  sys/kern/uipc_usrreq.c
    Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets, because the receiving sockbuf's size limit is irrelevant.
  tests/sys/kern/unix_seqpacket_test.c
    Now that 185813 is fixed, pipe_128k_8k fails intermittently due to 185812. Make it fail every time by adding a usleep after starting the writer thread and before starting the reader thread in test_pipe. That gives the writer time to fill up its send buffer. Also, clear the expected failure message due to 185813. It actually said "185812", but that was a typo.

  PR: kern/185813
  Reviewed by: silence from freebsd-net@ and rwatson@
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corporation
  Notes: svn path=/head/; revision=262867
  (Alan Somers, 2014-03-06; 1 file, -3/+10)
* Provide pr_ctloutput method for AF_LOCAL/SOCK_SEQPACKET sockets. This makes setsockopt() work on them.
  Reported by: Yuri <yuri rawbw.com>
  Approved by: re (kib)
  Notes: svn path=/head/; revision=255478
  (Gleb Smirnoff, 2013-09-11; 1 file, -0/+1)
* Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.

  The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.

  The structure definition looks like this:

    struct cap_rights {
            uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
    };

  The initial CAP_RIGHTS_VERSION is 0.

  The top two bits in the first element of the cr_rights[] array contain the total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.

  The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain the array index. Only one bit is used and the bit position in this five-bit range defines the array index. This means there can be at most five array elements in the future.

  To define a new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, e.g.

    #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

  We still support aliases that combine a few rights, but the rights have to belong to the same array element, e.g.:

    #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
    #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
    #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

  There is a new API to manage the new cap_rights_t structure:

    cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
    void cap_rights_set(cap_rights_t *rights, ...);
    void cap_rights_clear(cap_rights_t *rights, ...);
    bool cap_rights_is_set(const cap_rights_t *rights, ...);
    bool cap_rights_is_valid(const cap_rights_t *rights);
    void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
    void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
    bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

  Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, e.g.:

    cap_rights_t rights;
    cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

  There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, e.g.:

    #define cap_rights_set(rights, ...) \
            __cap_rights_set((rights), __VA_ARGS__, 0ULL)
    void __cap_rights_set(cap_rights_t *rights, ...);

  Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:

    cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

  Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions.

  This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.

  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=255219
  (Pawel Jakub Dawidek, 2013-09-05; 1 file, -3/+5)
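The bit layout described in this entry (two size bits on top, then five one-hot index bits, leaving 57 bits for rights) can be modelled in a few lines of C. The sketch below is illustrative only: the `MODEL_` macro body and the shift of 57 are inferred from the description in the message rather than copied from the header, and `model_right_index()` is a hypothetical helper showing how mixing rights from different array elements can be detected.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: bits 57..61 hold a one-hot array index,
 * bits 0..56 hold the right itself. */
#define MODEL_CAPRIGHT(idx, bit)  ((1ULL << (57 + (idx))) | (bit))

#define MODEL_CAP_LOOKUP  MODEL_CAPRIGHT(0, 0x0000000000000400ULL)
#define MODEL_CAP_FCHMOD  MODEL_CAPRIGHT(0, 0x0000000000002000ULL)
#define MODEL_CAP_PDKILL  MODEL_CAPRIGHT(1, 0x0000000000000800ULL)

/*
 * Return the array element a right (or an OR of rights) belongs to,
 * or -1 if no index bit or more than one index bit is set, i.e. if
 * rights from different elements were combined.
 */
static int
model_right_index(uint64_t right)
{
	uint64_t idxbits = (right >> 57) & 0x1f;
	int idx = 0;

	if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
		return (-1);
	while ((idxbits & 1) == 0) {
		idxbits >>= 1;
		idx++;
	}
	return (idx);
}
```

With this model, an alias such as `MODEL_CAP_FCHMOD | MODEL_CAP_LOOKUP` still maps to element 0, while `MODEL_CAP_LOOKUP | MODEL_CAP_PDKILL` sets two index bits and is rejected.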
* Fix receiving fd over unix socket broken in r247740. If n fds were passed, it would receive the first one n times.
  Reported by: Shawn Webb <lattera@gmail.com>, koobs, gleb
  Tested by: koobs, gleb
  Reviewed by: pjd
  Notes: svn path=/head/; revision=252502
  (Mateusz Guzik, 2013-07-02; 1 file, -2/+2)
* Improve r250890, so that we stop processing a message with zero descriptors as early as possible, and assert that the number of descriptors is positive in unp_freerights().
  Reviewed by: mjg, pjd, jilles
  Notes: svn path=/head/; revision=251374
  (Gleb Smirnoff, 2013-06-04; 1 file, -8/+7)
* passing fd over unix socket: fix a corner case where the caller wants to pass no descriptors. Previously the kernel would leak memory and try to free a potentially arbitrary pointer.
  Reviewed by: pjd
  Notes: svn path=/head/; revision=250890
  (Mateusz Guzik, 2013-05-21; 1 file, -1/+8)
* Fix a bunch of typos.
  PR: misc/174625
  Submitted by: Jeremy Chadwick <jdc@koitsu.org>
  Notes: svn path=/head/; revision=250460
  (Eitan Adler, 2013-05-10; 1 file, -1/+1)
* Add fdallocn function and use it when passing fds over unix socket. This gets rid of "unp_externalize fdalloc failed" panic.
  Reviewed by: pjd
  MFC after: 1 week
  Notes: svn path=/head/; revision=249480
  (Mateusz Guzik, 2013-04-14; 1 file, -12/+9)
* Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.

  This change allows creating file descriptors with close-on-exec set in some situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file descriptors (SCM_RIGHTS) atomically close-on-exec.

  The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD. MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

  The SOCK_* flags are not passed to MAC because this may cause incorrect failures and can be done later via fcntl() anyway. On the other hand, audit is expected to cope with the new flags.

  For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags argument.

  Reviewed by: kib
  Notes: svn path=/head/; revision=248534
  (Jilles Tjoelker, 2013-03-19; 1 file, -2/+4)
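From userland, the socket() part of this change looks roughly as follows. This is a minimal sketch assuming a platform that defines SOCK_CLOEXEC and SOCK_NONBLOCK; `open_local_socket()` is a hypothetical helper name, not something from the commit.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a unix domain stream socket that is close-on-exec and
 * non-blocking from the moment it exists, with no window between
 * socket() and a later fcntl() call.  Returns the descriptor or -1.
 */
static int
open_local_socket(void)
{
	return (socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0));
}
```

A caller can confirm the result with `fcntl(s, F_GETFD)` (FD_CLOEXEC set) and `fcntl(s, F_GETFL)` (O_NONBLOCK set).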
* Fix memory leak when one process sends a descriptor over a UNIX domain socket, but the other process exits before receiving it.
  Notes: svn path=/head/; revision=248176
  (Pawel Jakub Dawidek, 2013-03-11; 1 file, -18/+18)
* For some reason, when I started to pass filedescent structures instead of pointers to the file structure, receiving descriptors stopped working when at least a few kilobytes of data are sent as well. In the kernel, the soreceive_generic() function doesn't see the control mbuf as the first mbuf and unp_externalize() is never called; the first 6(?) kilobytes of data are missing as well on the receiving end.

  This breaks for example tmux.

  I don't know yet why going from 8 bytes to sizeof(struct filedescent) per descriptor (or even to 16 bytes per descriptor) breaks things, but to work around it for now use 8 bytes per file descriptor at the cost of memory allocation.

  Reported by: flo, Diane Bruce, Jan Beich <jbeich@tormail.org>
  Simple testcase provided by: mjg
  Notes: svn path=/head/; revision=247740
  (Pawel Jakub Dawidek, 2013-03-03; 1 file, -20/+28)
* Plug memory leaks in file descriptors passing.
  Notes: svn path=/head/; revision=247736
  (Pawel Jakub Dawidek, 2013-03-03; 1 file, -1/+2)
* - Implement two new system calls:

      int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
      int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

    which allow binding and connecting, respectively, to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'.
  - Add manual pages for the new syscalls.
  - Make the new syscalls available for processes in capability mode sandbox.
  - Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present on the directory descriptor for the syscalls to work.
  - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor.
  - Update procstat(1) to recognize the new capability rights.
  - Document the new capability rights in cap_rights_limit(2).

  Sponsored by: The FreeBSD Foundation
  Discussed with: rwatson, jilles, kib, des
  Notes: svn path=/head/; revision=247667
  (Pawel Jakub Dawidek, 2013-03-02; 1 file, -5/+42)
* Merge Capsicum overhaul:

  - Capability is no longer a separate descriptor type. Now every descriptor has its own set of capability rights.
  - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
  - The new syscall cap_rights_limit(2), which limits capability rights of the given descriptor without creating a new one, should be used instead of cap_new(2).
  - The cap_getrights(2) syscall is renamed to cap_rights_get(2).
  - If the CAP_IOCTL capability right is present we can further reduce the allowed ioctls list with the new cap_ioctls_limit(2) syscall. The list of allowed ioctls can be retrieved with the cap_ioctls_get(2) syscall.
  - If the CAP_FCNTL capability right is present we can further reduce the fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
  - To support ioctl and fcntl white-listing the filedesc structure was heavily modified.
  - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
  - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:

    CAP_CREATE old behaviour:
    - Allow for openat(2)+O_CREAT.
    - Allow for linkat(2).
    - Allow for symlinkat(2).
    CAP_CREATE new behaviour:
    - Allow for openat(2)+O_CREAT.

    Added CAP_LINKAT:
    - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
    - Allow to be target for renameat(2).

    Added CAP_SYMLINKAT:
    - Allow for symlinkat(2).

    Removed CAP_DELETE. Old behaviour:
    - Allow for unlinkat(2) when removing non-directory object.
    - Allow to be source for renameat(2).

    Removed CAP_RMDIR. Old behaviour:
    - Allow for unlinkat(2) when removing directory.

    Added CAP_RENAMEAT:
    - Required for source directory for the renameat(2) syscall.

    Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
    - Allow for unlinkat(2) on any object.
    - Required if target of renameat(2) exists and will be removed by this call.

    Removed CAP_MAPEXEC.

    CAP_MMAP old behaviour:
    - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE.
    CAP_MMAP new behaviour:
    - Allow for mmap(2)+PROT_NONE.

    Added CAP_MMAP_R:
    - Allow for mmap(PROT_READ).
    Added CAP_MMAP_W:
    - Allow for mmap(PROT_WRITE).
    Added CAP_MMAP_X:
    - Allow for mmap(PROT_EXEC).
    Added CAP_MMAP_RW:
    - Allow for mmap(PROT_READ | PROT_WRITE).
    Added CAP_MMAP_RX:
    - Allow for mmap(PROT_READ | PROT_EXEC).
    Added CAP_MMAP_WX:
    - Allow for mmap(PROT_WRITE | PROT_EXEC).
    Added CAP_MMAP_RWX:
    - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

    Renamed CAP_MKDIR to CAP_MKDIRAT.
    Renamed CAP_MKFIFO to CAP_MKFIFOAT.
    Renamed CAP_MKNOD to CAP_MKNODAT.

    CAP_READ old behaviour:
    - Allow pread(2).
    - Disallow read(2), readv(2) (if there is no CAP_SEEK).
    CAP_READ new behaviour:
    - Allow read(2), readv(2).
    - Disallow pread(2) (CAP_SEEK was also required).

    CAP_WRITE old behaviour:
    - Allow pwrite(2).
    - Disallow write(2), writev(2) (if there is no CAP_SEEK).
    CAP_WRITE new behaviour:
    - Allow write(2), writev(2).
    - Disallow pwrite(2) (CAP_SEEK was also required).

    Added convenient defines:

    #define CAP_PREAD (CAP_SEEK | CAP_READ)
    #define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
    #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
    #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
    #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
    #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
    #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
    #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
    #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
    #define CAP_RECV CAP_READ
    #define CAP_SEND CAP_WRITE

    #define CAP_SOCK_CLIENT \
            (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
             CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
    #define CAP_SOCK_SERVER \
            (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
             CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
             CAP_SETSOCKOPT | CAP_SHUTDOWN)

    Added defines for backward API compatibility:

    #define CAP_MAPEXEC CAP_MMAP_X
    #define CAP_DELETE CAP_UNLINKAT
    #define CAP_MKDIR CAP_MKDIRAT
    #define CAP_RMDIR CAP_UNLINKAT
    #define CAP_MKFIFO CAP_MKFIFOAT
    #define CAP_MKNOD CAP_MKNODAT
    #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

  Sponsored by: The FreeBSD Foundation
  Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
  Many aspects discussed with: rwatson, benl, jonathan
  ABI compatibility discussed with: kib
  Notes: svn path=/head/; revision=247602
  (Pawel Jakub Dawidek, 2013-03-02; 1 file, -44/+45)
* Add support of passing SCM_BINTIME ancillary data object for PF_LOCAL sockets.
  PR: kern/175883
  Submitted by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
  Discussed with: glebius, phk
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=246826
  (Sergey Kandaurov, 2013-02-15; 1 file, -0/+13)
* Configure UMA warnings for the following zones:
  - unp_zone: kern.ipc.maxsockets limit reached
  - socket_zone: kern.ipc.maxsockets limit reached
  - zone_mbuf: kern.ipc.nmbufs limit reached
  - zone_clust: kern.ipc.nmbclusters limit reached
  - zone_jumbop: kern.ipc.nmbjumbop limit reached
  - zone_jumbo9: kern.ipc.nmbjumbo9 limit reached
  - zone_jumbo16: kern.ipc.nmbjumbo16 limit reached

  Note that those warnings are printed no more often than every five minutes and can be globally turned off by setting sysctl/tunable vm.zone_warnings to 0.

  Discussed on: arch
  Obtained from: WHEEL Systems
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=243999
  (Pawel Jakub Dawidek, 2012-12-07; 1 file, -0/+1)
* Schedule garbage collection run for the in-flight rights passed over the unix domain sockets to the next tick, coalescing the serial calls until the collection fires. The thought is that more work for the collector could arise in the near future, allowing it to clean more and not spend too much CPU on repeated collection when there is no garbage.

  Currently the collection task is fired immediately upon unix domain socket close if there are any rights in flight, which caused excessive CPU usage and too long blocking of the threads waiting for unp_list_lock and unp_link_rwlock in write mode.

  Robert noted that it would be nice if we could find some heuristic by which we decide whether to run GC a bit more quickly. E.g., if the number of UNIX domain sockets is close to its resource limit, but not quite.

  Reported and tested by: Markus Gebert <markus.gebert@hostpoint.ch>
  Reviewed by: rwatson
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=243342
  (Konstantin Belousov, 2012-11-20; 1 file, -3/+3)
* Update comment.
  Notes: svn path=/head/; revision=243152
  (Gleb Smirnoff, 2012-11-16; 1 file, -1/+2)
* Remove the support for using non-mpsafe filesystem modules.

  In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.

  The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.

  Conducted and reviewed by: attilio
  Tested by: pho
  Notes: svn path=/head/; revision=241896
  (Konstantin Belousov, 2012-10-22; 1 file, -24/+5)
* Fix up kernel sources to be ready for a 64-bit ino_t.
  Original code by: Gleb Kurtsou
  Notes: svn path=/head/; revision=241011
  (Matthew D Fleming, 2012-09-27; 1 file, -1/+1)
* Supply the pr_ctloutput method for local datagram sockets, so that setsockopt() and getsockopt() work on them.

  This makes 'tools/regression/sockets/unix_cmsg -t dgram' more successful.

  Notes: svn path=/head/; revision=240214
  (Gleb Smirnoff, 2012-09-07; 1 file, -0/+1)
* When checking if file descriptor number is valid, explicitly check for 'fd' being less than 0 instead of using cast-to-unsigned hack.

  Today's commit was brought to you by the letters 'B', 'D' and 'E' :)

  Notes: svn path=/head/; revision=237036
  (Pawel Jakub Dawidek, 2012-06-13; 1 file, -1/+1)
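The two styles of range check contrasted in this entry can be compared side by side. The sketch below is illustrative; the function names and the `nfiles` parameter are hypothetical and do not come from the kernel source. Both functions accept the same set of descriptors when `nfiles` is non-negative, but the second states the intent directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Cast-to-unsigned hack: a negative fd wraps to a huge unsigned value
 * and therefore fails the single comparison. */
static bool
fd_valid_cast(int fd, int nfiles)
{
	return ((unsigned int)fd < (unsigned int)nfiles);
}

/* Explicit form: the negative case is spelled out. */
static bool
fd_valid_explicit(int fd, int nfiles)
{
	return (fd >= 0 && fd < nfiles);
}
```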
* Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field.

  The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5).

  This change fixes the long-standing issue with nullfs(5) in which unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is that one can connect to both the lower and the upper paths regardless of which layer path one binds to.

  PR: kern/51583, kern/159663
  Suggested by: kib
  Reviewed by: arch
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=232317
  (Mikolaj Golub, 2012-02-29; 1 file, -8/+6)
* When detaching a unix domain socket, uipc_detach() checks the unp->unp_vnode pointer to detect if there is a vnode associated with (bound to) this socket and does the necessary cleanup if there is. The issue is that after a forced unmount this check may be too late, as the unp_vnode is reclaimed and the reference is stale.

  To fix this provide a helper function that is called on a socket vnode reclamation to do the necessary cleanup.

  Pointed by: kib
  Reviewed by: kib
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=232152
  (Mikolaj Golub, 2012-02-25; 1 file, -0/+39)
* unp_connect() may use a shared lock on the vnode to fetch the socket.
  Suggested by: jhb
  Reviewed by: jhb, kib, rwatson
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=231976
  (Mikolaj Golub, 2012-02-21; 1 file, -2/+2)
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

  The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.

  Notes: svn path=/head/; revision=227309
  (Ed Schouten, 2011-11-07; 1 file, -4/+5)
* Fix handling of corrupt compress(1)ed data. [11:04]
  Add missing length checks on unix socket addresses. [11:05]
  Approved by: so (cperciva)
  Approved by: re (kensmith)
  Security: FreeBSD-SA-11:04.compress
  Security: CVE-2011-2895 [11:04]
  Security: FreeBSD-SA-11:05.unix
  Notes: svn path=/head/; revision=225827
  (Bjoern A. Zeeb, 2011-09-28; 1 file, -0/+4)
* Prevent the hiwatermark for the unix domain socket from becoming effectively negative. Often seen as upstream fastcgi connection timeouts in nginx when using sendfile over unix domain sockets for communication.

  Sendfile(2) may send more bytes than currently allowed by the hiwatermark of the socket, e.g. because the so_snd sockbuf lock is dropped after the sbspace() call in the kern_sendfile() loop. In this case, the recalculated hiwatermark will overflow. Since lowatermark is renewed as half of the hiwatermark by sendfile code, and both are unsigned, the send buffer never reaches the free space requested by lowatermark, causing an indefinite wait in sendfile.

  Reviewed by: rwatson
  Approved by: re (bz)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=225040
  (Konstantin Belousov, 2011-08-20; 1 file, -2/+5)
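The "effectively negative" high water mark is ordinary unsigned wraparound. The following sketch contrasts an unguarded subtraction with a clamped one; it is a generic illustration of the failure mode described in the message, not the actual patch, and both function names are hypothetical.

```c
#include <assert.h>

/* Unguarded: if more bytes were sent than the mark allows, the
 * unsigned result wraps around to a very large value. */
static unsigned int
hiwat_unguarded(unsigned int hiwat, unsigned int in_flight)
{
	return (hiwat - in_flight);
}

/* Clamped: never let the recalculated mark go below zero. */
static unsigned int
hiwat_clamped(unsigned int hiwat, unsigned int in_flight)
{
	return (hiwat > in_flight ? hiwat - in_flight : 0);
}
```

A low water mark derived as half of the wrapped value would be larger than the free space the buffer can ever report, which is consistent with the indefinite wait described above.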
* Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

  VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix.

  The current expectations are documented at the beginning of uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by: jhb
  Tested by: zec
  Tested by: Mikolaj Golub (to.my.trociny gmail.com)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=218757
  (Bjoern A. Zeeb, 2011-02-16; 1 file, -2/+10)
* The unp_gc() function drops and reacquires the lock between the scan and collect phases. The unp_discard() function executes unp_externalize_fp(), which might make the socket eligible for gc-ing, and then, later, taskqueue will close the socket. Since unp_gc() dropped the list lock to do the malloc, close might happen after the mark step but before the collection step, causing collection to not find the socket and miss one array element.

  I believe that the race was there before r216158, but the stated revision made the window much wider by postponing the close to taskqueue sometimes.

  Only process as many array elements as sockets we find during the second phase of gc [1]. Take linkage lock and recheck the eligibility of the socket for gc, as well as call fhold() under the linkage lock.

  Reported and tested by: jmallett
  Submitted by: jmallett [1]
  Reviewed by: rwatson, jeff (possibly)
  MFC after: 1 week
  Notes: svn path=/head/; revision=218168
  (Konstantin Belousov, 2011-02-01; 1 file, -12/+16)
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string.
  Notes: svn path=/head/; revision=217555
  (Matthew D Fleming, 2011-01-18; 1 file, -9/+10)
* Trim whitespaces at the end of lines. Use the commit to record proper log message for r216150.

  MFC after: 1 week

  If a unix socket has a unix socket attached as the rights that has a unix socket attached as the rights that has a unix socket attached as the rights ... the kernel may overflow the stack on an attempt to close such a socket.

  Only close the rights file in the context of the current close if the file is not a unix domain socket. Otherwise, postpone the work to taskqueue, preventing unlimited recursion.

  The pass of the unix domain sockets over the SCM_RIGHTS message control is not widely used, and more, the close of the socket with still attached rights is mostly an application failure. The change should not affect the performance of typical users of SCM_RIGHTS.

  Reviewed by: jeff, rwatson
  Notes: svn path=/head/; revision=216158
  (Konstantin Belousov, 2010-12-03; 1 file, -2/+2)
* Reviewed by: jeff, rwatson
  MFC after: 1 week
  Notes: svn path=/head/; revision=216150
  (Konstantin Belousov, 2010-12-03; 1 file, -5/+74)
* Remove spurious '/*-' marks and fix some other style problems.
  Submitted by: bde@
  Notes: svn path=/head/; revision=210365
  (Edward Tomasz Napierala, 2010-07-22; 1 file, -1/+1)
* Revert r210225 - turns out I was wrong; the "/*-" is not a license-only thing; it's also used to indicate that the comment should not be automatically rewrapped.
  Explained by: cperciva@
  Notes: svn path=/head/; revision=210226
  (Edward Tomasz Napierala, 2010-07-18; 1 file, -1/+1)
* The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright occurrences from sys/sys/ and sys/kern/.
  Notes: svn path=/head/; revision=210225
  (Edward Tomasz Napierala, 2010-07-18; 1 file, -1/+1)
* Fix build on amd64, where sysctl arg1 is a pointer.
  Reported by: Mr Tinderbox
  MFC after: 3 months
  Notes: svn path=/head/; revision=197794
  (Robert Watson, 2009-10-05; 1 file, -1/+1)
* First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain sockets. This allows for reliable bi-directional datagram communication over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or SOCK_STREAM (bi-directional bytestream). Largely, this reuses existing UNIX domain socket code. This allows applications requiring record-oriented semantics to do so reliably via local IPC.

  Some implementation notes (also present in XXX comments):

  - Currently we lack an sbappend variant able to do datagrams and control data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR. Adding a new variant will solve this problem.

  - UNIX domain sockets on FreeBSD provide back-pressure/flow control notification for stream sockets by manipulating the send socket buffer's size during pru_send and pru_rcvd. This trick works less well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to manage blocking, but also to determine maximum datagram size. Fixing this requires rethinking how back-pressure is done for SOCK_SEQPACKET; in the meantime, it's possible to get EMSGSIZE when buffers fill, instead of blocking.

  Discussed with: benl
  Reviewed by: bz, rpaulo
  MFC after: 3 months
  Sponsored by: Google
  Notes: svn path=/head/; revision=197775
  (Robert Watson, 2009-10-05; 1 file, -16/+123)
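The record-oriented semantics mentioned in this entry are easiest to see from userland. The sketch below is a minimal illustration, assuming a system where socketpair() supports SOCK_SEQPACKET on AF_UNIX; `seqpacket_roundtrip()` is a hypothetical helper. It sends two records of different lengths and reads them back with a buffer large enough to hold both, so that any coalescing would be visible in the returned sizes.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send two records over a SOCK_SEQPACKET socketpair and report the
 * size returned by each recv().  Returns 0 on success, -1 on error.
 */
static int
seqpacket_roundtrip(ssize_t *first, ssize_t *second)
{
	int sv[2];
	char buf[64];
	int rc = -1;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
		return (-1);
	if (send(sv[0], "abc", 3, 0) == 3 &&
	    send(sv[0], "defgh", 5, 0) == 5) {
		/* Each recv() returns exactly one record, even though
		 * the buffer could hold both. */
		*first = recv(sv[1], buf, sizeof(buf), 0);
		*second = recv(sv[1], buf, sizeof(buf), 0);
		rc = 0;
	}
	close(sv[0]);
	close(sv[1]);
	return (rc);
}
```

With SOCK_STREAM in place of SOCK_SEQPACKET, the first recv() would typically return all eight bytes at once, which is the difference the commit message draws between the two socket types.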
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
  Reviewed by: bz
  Approved by: re (vimage blanket)
  Notes: svn path=/head/; revision=196019
  (Robert Watson, 2009-08-01; 1 file, -1/+2)
* Remove unnecessary/redundant includes.
  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=194707
  (Jamie Gritton, 2009-06-23; 1 file, -1/+0)
* Fix a deadlock in the getpeername() method for UNIX domain sockets. Instead of locking the local unp followed by the remote unp, use the same locking model as accept() and read lock the global link lock followed by the remote unp while fetching the remote sockaddr.
  Reported by: Mel Flynn mel.flynn of mailing.thruhere.net
  Reviewed by: rwatson
  MFC after: 1 week
  Notes: svn path=/head/; revision=194460
  (John Baldwin, 2009-06-18; 1 file, -4/+4)
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
  Discussed with: pjd
  Notes: svn path=/head/; revision=193511
  (Robert Watson, 2009-06-05; 1 file, -1/+0)
* Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies.

  Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero.

  This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies.

  Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls.

  Obtained from: TrustedBSD Project
  Notes: svn path=/head/; revision=193332
  (Robert Watson, 2009-06-02; 1 file, -2/+0)
* Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged.

  This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace.

  The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another.

  The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions.

  This change also introduces a DDB subcommand to show the list of all vnet instances.

  Approved by: julian (mentor)
  Notes: svn path=/head/; revision=191816
  (Marko Zec, 2009-05-05; 1 file, -0/+5)
* Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces.

  Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.

  Proposed by: jeff
  Reviewed by: jeff
  Discussed with: rmacklem, zach loafman @ isilon
  Notes: svn path=/head/; revision=190888
  (Robert Watson, 2009-04-10; 1 file, -3/+1)
* Decompose the global UNIX domain sockets rwlock into two different locks: a global list/counter/generation counter protected by a new mutex unp_list_lock, and a global linkage rwlock, unp_global_rwlock, which protects the connections between UNIX domain sockets.

  This eliminates conditional lock acquisition that was previously a property of the global lock being held over sonewconn() leading to a call to uipc_attach(), which also required the global lock, but couldn't rely on it as other paths existed to uipc_attach() that didn't hold it: now uipc_attach() uses only the list lock, which follows the linkage lock in the lock order. It may also reduce contention on the global lock for some workloads.

  Add global UNIX domain socket locks to hard-coded witness lock order.

  MFC after: 1 week
  Discussed with: kris
  Notes: svn path=/head/; revision=189544
  (Robert Watson, 2009-03-08; 1 file, -102/+96)