path: root/sys/rpc
Commit message | Author | Age | Files | Lines
* Do pass removing some write-only variables from the kernel. | Alexander Kabaev | 2017-12-25 | 1 | -6/+2
    This reduces noise when the kernel is compiled by newer GCC versions, such as the one used by external toolchain ports.

    Reviewed by: kib, andrew (sys/arm and sys/arm64), emaste (partial), erj (partial)
    Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
    Differential Revision: https://reviews.freebsd.org/D10385
    Notes: svn path=/head/; revision=327173
* sys: general adoption of SPDX licensing ID tags. | Pedro F. Giffuni | 2017-11-27 | 7 | -0/+14
    Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task.

    The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.

    No functional change intended.

    Notes: svn path=/head/; revision=326272
* sys: further adoption of SPDX licensing ID tags. | Pedro F. Giffuni | 2017-11-20 | 35 | -0/+71
    Mainly focus on files that use BSD 3-Clause license.

    The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.

    Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.

    Notes: svn path=/head/; revision=326023
* Listening sockets improvements. | Gleb Smirnoff | 2017-06-08 | 1 | -41/+50
    o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit.
      - Take out selinfo's from the socket buffers into the socket. The first reason is to support the braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is a future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep the same selinfos through the lifetime of a socket.
      - Remove struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into the listening part of the union.
      - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set().

    o Remove ACCEPT_LOCK() global.
      - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer.
      - Allow to acquire two socket locks, but the first one must belong to a listening socket.
      - Make soref()/sorele() use atomic(9). This allows in some situations to do soref() without owning the socket lock. There is room for improvement here: it is possible to make sorele() also lock optionally.
      - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information.

    o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc.

    o UNIX local sockets.
      - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock.
      - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parallelism in the UNIX sockets.
      - Now the call into uipc_attach() may happen with unp_link_lock held if we are accepting, or without unp_link_lock if we are just creating a socket.
      - Another problem in UNIX sockets is that uipc_close() basically did nothing for a listening socket. The vnode remained opened for connections. This is fixed by removing the vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()?

    Sponsored by: Netflix
    Differential Revision: https://reviews.freebsd.org/D9770
    Notes: svn path=/head/; revision=319722
* * limit size of buffers to RPC_MAXDATASIZE | Xin LI | 2017-06-01 | 3 | -7/+21
    * don't leak memory
    * be more picky about bad parameters

    From:
    https://raw.githubusercontent.com/guidovranken/rpcbomb/master/libtirpc_patch.txt
    https://github.com/guidovranken/rpcbomb/blob/master/rpcbind_patch.txt
    via NetBSD.

    Reviewed by: emaste, cem (earlier version)
    Differential Revision: https://reviews.freebsd.org/D10922
    MFC after: 3 days
    Notes: svn path=/head/; revision=319369
* Remove register keyword from sys/ and ANSIfy prototypes | Ed Maste | 2017-05-17 | 1 | -1/+1
    A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today.

    ANSIfy related prototypes while here.

    Reviewed by: cem, jhb
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D10193
    Notes: svn path=/head/; revision=318389
* Fix the client side krpc from doing TCP reconnects for ERESTART from sosend(). | Rick Macklem | 2017-05-07 | 1 | -2/+20
    When sosend() returns ERESTART in the client side krpc, it indicates that the RPC message hasn't yet been sent and that the send queue is full or locked while a signal is posted for the process. Without this patch, this would result in a RPC_CANTSEND reply from clnt_vc_call(), which would cause clnt_reconnect_call() to create a new TCP transport connection. For most NFS servers, this wasn't a serious problem, although it did imply retries of outstanding RPCs, which could possibly have missed the DRC. For an NFSv4.1 mount to AmazonEFS, this caused a serious problem, since AmazonEFS often didn't retain the NFSv4.1 session and would reply with NFS4ERR_BAD_SESSION. This implies to the client a crash/reboot which requires open/lock state recovery.

    Three options were considered to fix this:
    - Return the ERESTART all the way up to the system call boundary and then have the system call redone. This is fraught with risk, due to convoluted code paths, asynchronous I/O RPCs etc. cperciva@ worked on this, but it is still a work in progress and may not be feasible.
    - Set SB_NOINTR for the socket buffer. This fixes the problem, but makes the sosend() completely non interruptible, which kib@ considered inappropriate. It also would break forced dismount when a thread was blocked in sosend().
    - Modify the retry loop in clnt_vc_call(), so that it loops for this case for up to 15sec. Testing showed that the sosend() usually succeeded by the 2nd retry. The extreme case observed was 111 loop iterations, or about 100msec of delay.

    This third alternative is what is implemented in this patch, since the change is:
    - localized
    - straightforward
    - forced dismount is not broken by it.

    This patch has been tested by cperciva@ extensively against AmazonEFS.

    Reported by: cperciva
    Tested by: cperciva
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=317906
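The third option in the entry above, a bounded retry loop, can be sketched in plain C. This is a userspace illustration only, not the code in clnt_vc_call(): SKETCH_ERESTART, send_with_restart_retry(), struct fake_sock, fake_send() and fake_pause() are all hypothetical names, and only the 15-second budget is taken from the commit message.

```c
/* Hypothetical stand-ins; the kernel's ERESTART and sosend() are not used here. */
#define SKETCH_ERESTART        (-1)
#define SKETCH_RETRY_BUDGET_MS 15000	/* "up to 15sec" from the commit message */

typedef int  (*send_fn)(void *ctx);
typedef void (*pause_fn)(void *ctx, int ms);

/*
 * Call send() until it returns something other than SKETCH_ERESTART or the
 * time budget is used up. Returns the last result of send(); *tries is set
 * to the number of send attempts made. step_ms must be greater than zero.
 */
int
send_with_restart_retry(send_fn send, pause_fn pause, void *ctx,
    int step_ms, int *tries)
{
	int waited_ms = 0;
	int error;

	*tries = 0;
	for (;;) {
		error = send(ctx);
		(*tries)++;
		if (error != SKETCH_ERESTART ||
		    waited_ms >= SKETCH_RETRY_BUDGET_MS)
			return (error);
		pause(ctx, step_ms);
		waited_ms += step_ms;
	}
}

/* Fake transport for demonstration: fails with SKETCH_ERESTART a few times. */
struct fake_sock {
	int restarts_left;
	int paused_ms;
};

int
fake_send(void *ctx)
{
	struct fake_sock *s = ctx;

	if (s->restarts_left > 0) {
		s->restarts_left--;
		return (SKETCH_ERESTART);
	}
	return (0);
}

void
fake_pause(void *ctx, int ms)
{
	struct fake_sock *s = ctx;

	s->paused_ms += ms;	/* record the delay instead of sleeping */
}
```

The point of the design is that the caller only sees a failure once the budget is exhausted, so a transient "queue full while a signal is pending" condition no longer tears down the connection.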
* Fix a crash during unmount of an NFSv4.1 mount. | Rick Macklem | 2017-04-10 | 2 | -6/+1
    Larry Rosenman reported a crash on freebsd-current@ which was caused by a premature release of the krpc backchannel socket structure. I believe this was caused by a race between the SVC_RELEASE() in clnt_vc.c and the xprt_unregister() in the higher layer (clnt_rc.c), which tried to lock the mutex in the xprt structure and crashed. This patch fixes this by removing the xprt_unregister() in the clnt_vc layer and allowing this to always be done by the clnt_rc (higher reconnect layer).

    Reported by: ler@lerctr.org
    Tested by: ler@letctr.org
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=316694
* Renumber copyright clause 4 | Warner Losh | 2017-02-28 | 1 | -1/+1
    Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point.

    Submitted by: Jan Schaumann <jschauma@stevens.edu>
    Pull Request: https://github.com/freebsd/freebsd/pull/96
    Notes: svn path=/head/; revision=314436
* add svcpool_close to handle killed nfsd threads | Andriy Gapon | 2017-02-14 | 2 | -2/+46
    This patch adds a new function to the server krpc called svcpool_close(). It is similar to svcpool_destroy(), but does not free the data structures, so that the pool can be used again.

    This function is then used instead of the svcpool_destroy(), svcpool_create() pair when the nfsd threads are killed.

    PR: 204340
    Reported by: Panzura
    Approved by: rmacklem
    Obtained from: rmacklem
    MFC after: 1 week
    Notes: svn path=/head/; revision=313735
* Hide the boottime and boottimebin globals, provide the getboottime(9) | Konstantin Belousov | 2016-07-27 | 1 | -0/+4
    and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name.

    The issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface.

    Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible.

    Tested by: pho (as part of the bigger patch)
    Reviewed by: jhb (same)
    Discussed with: bde
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 month
    X-Differential revision: https://reviews.freebsd.org/D7302
    Notes: svn path=/head/; revision=303382
* Don't test for xpt not being NULL before calling svc_xprt_free(..) | Enji Cooper | 2016-07-11 | 1 | -3/+2
    svc_xprt_alloc(..) will always return initialized memory as it uses mem_alloc(..) under the covers, which uses malloc(.., M_WAITOK, ..).

    MFC after: 1 week
    Reported by: Coverity
    CID: 1007341
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=302553
* Convert `svc_xprt_alloc(..)` and `svc_xprt_free(..)`'s prototypes to | Enji Cooper | 2016-07-11 | 1 | -3/+2
    ANSI C style prototypes

    MFC after: 1 week
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=302552
* Deobfuscate cleanup path in clnt_vc_create(..) | Enji Cooper | 2016-07-11 | 1 | -6/+4
    Similar to r300836, r301800, and r302550, cl and ct will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`.

    MFC after: 1 week
    Reported by: Coverity
    CID: 1007342
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=302551
* Deobfuscate cleanup path in clnt_dg_create(..) | Enji Cooper | 2016-07-11 | 1 | -5/+3
    Similar to r300836 and r301800, cl and cu will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`.

    Deobfuscating the cleanup path fixes a leak where if cl was NULL and cu was not, cu would not be freed, and also removes a duplicate test for cl not being NULL.

    MFC after: 1 week
    Reported by: Coverity
    CID: 1007033, 1007344
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=302550
* Deobfuscate cleanup path in clnt_bck_create(..) | Enji Cooper | 2016-06-10 | 1 | -8/+3
    Similar to r300836, cl and ct will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`.

    Deobfuscating the cleanup path fixes a leak where if cl was NULL and ct was not, ct would not be freed, and also removes a duplicate test for cl not being NULL.

    Approved by: re (gjb)
    Differential Revision: https://reviews.freebsd.org/D6801
    MFC after: 1 week
    Reported by: Coverity
    CID: 1229999
    Reviewed by: cem
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=301800
* Fix the rpcb_getaddr() definition to match its declaration. | Kevin Lo | 2016-06-09 | 1 | -1/+1
    Submitted by: Sebastian Huber <sebastian dot huber at embedded-brains dot de>
    Notes: svn path=/head/; revision=301734
* Quell false positives in svc_vc_create and svc_vc_create_conn with cd and xprt | Enji Cooper | 2016-05-27 | 1 | -13/+11
    Both cd and xprt will be non-NULL after their respective malloc(9) wrappers are called (mem_alloc and svc_xprt_alloc, which calls mem_alloc) as mem_alloc always gets called with M_WAITOK|M_ZERO today. Thus, testing for them being non-NULL is incorrect -- it misleads Coverity and it misleads the reader.

    Remove some unnecessary NULL initializations as a follow up to help solidify the fact that these pointers will be initialized properly in sys/rpc/.. with the interfaces the way they are currently.

    Differential Revision: https://reviews.freebsd.org/D6572
    MFC after: 2 weeks
    Reported by: Coverity
    CID: 1007338, 1007339, 1007340
    Reviewed by: markj, truckman
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=300836
* Remove unnecessary memset(.., 0, ..)'s | Enji Cooper | 2016-05-24 | 1 | -2/+0
    The mem_alloc macro calls calloc (userspace) / malloc(.., M_WAITOK|M_ZERO) under the covers, so zeroing out memory is already handled by the underlying calls.

    MFC after: 1 week
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=300625
* sys/rpc: minor spelling fixes. | Pedro F. Giffuni | 2016-05-06 | 6 | -11/+11
    No functional change.

    Notes: svn path=/head/; revision=299150
* sys: Make use of our rounddown() macro when sys/param.h is available. | Pedro F. Giffuni | 2016-04-30 | 1 | -2/+2
    No functional change.

    Notes: svn path=/head/; revision=298848
* kgssapi(4): Fix string overrun in Kerberos principal construction | Conrad Meyer | 2016-04-20 | 1 | -1/+1
    'buf.value' was previously treated as a nul-terminated string, but was allocated with only strlen() bytes of space, leaving no room for the terminator. Rectify this.

    Reported by: Coverity
    CID: 1007639
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=298336
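The class of bug described in the entry above (a buffer sized with strlen() but used as a nul-terminated string) can be illustrated with a small userspace sketch. It is not the kgssapi code: make_principal() and the "service@host" layout are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "service@host" in freshly allocated memory (hypothetical helper).
 * The allocation is strlen() of the contents plus one byte for the
 * terminating nul; omitting that extra byte is the overrun pattern.
 */
char *
make_principal(const char *service, const char *host)
{
	size_t len = strlen(service) + 1 + strlen(host);	/* '@' separator */
	char *buf = malloc(len + 1);				/* + nul terminator */

	if (buf == NULL)
		return (NULL);
	snprintf(buf, len + 1, "%s@%s", service, host);
	return (buf);
}
```

The caller owns the returned string and releases it with free().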
* RPC: for pointers replace 0 with NULL. | Pedro F. Giffuni | 2016-04-14 | 1 | -1/+1
    These are mostly cosmetic, no functional change.

    Found with devel/coccinelle.

    Notes: svn path=/head/; revision=297975
* Cleanup unnecessary semicolons from the kernel. | Pedro F. Giffuni | 2016-04-10 | 1 | -1/+1
    Found with devel/coccinelle.

    Notes: svn path=/head/; revision=297793
* Remove some NULL checks for M_WAITOK allocations. | Edward Tomasz Napierala | 2016-03-29 | 1 | -8/+0
    MFC after: 1 month
    Sponsored by: The FreeBSD Foundation
    Notes: svn path=/head/; revision=297391
* Fix incorrect (fortunately bigger) malloc size. | Alexander Motin | 2016-03-19 | 1 | -1/+1
    Submitted by: pfg
    MFC after: 1 week
    Notes: svn path=/head/; revision=297051
* These files were getting sys/malloc.h and vm/uma.h with header pollution | Gleb Smirnoff | 2016-02-01 | 1 | -0/+1
    via sys/mbuf.h

    Notes: svn path=/head/; revision=295126
* Improve locking of sg_threadcount. | Alexander Motin | 2015-11-19 | 1 | -1/+3
    MFC after: 1 week
    Notes: svn path=/head/; revision=291061
* Increase group limit for kerberized NFSv4 | Josh Paetzel | 2015-09-26 | 1 | -5/+2
    PR: 202659
    Submitted by: matthew.l.dailey@dartmouth.edu
    Reviewed by: rmacklem dfr
    MFC after: 1 week
    Sponsored by: iXsystems
    Notes: svn path=/head/; revision=288272
* Set curvnet context inside the RPC code in more places. | Xin LI | 2015-08-18 | 2 | -0/+10
    Reviewed by: melifaro
    MFC after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D3398
    Notes: svn path=/head/; revision=286894
* Remove useless acquire semantic from the atomic_add operation before | Konstantin Belousov | 2015-07-28 | 1 | -1/+1
    sosend(). The only release on the xp_snt_cnt is done after sosend(), with an intent to synchronize with load_acq in svc_vc_ack().

    Reviewed by: alc
    Tested by: pho
    Sponsored by: The FreeBSD Foundation
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=285933
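The release/acquire pairing mentioned in the entry above can be sketched with C11 atomics. This is an assumption-laden illustration rather than the kernel code: struct sketch_xprt, its two counters and the three functions are hypothetical, and the kernel uses its own atomic(9) primitives, not <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical transport state; the field names are stand-ins. */
struct sketch_xprt {
	_Atomic uint32_t snd_cnt;	/* bytes about to be handed to the socket */
	_Atomic uint32_t snt_cnt;	/* bytes for which the send has returned */
};

void
sketch_init(struct sketch_xprt *xp)
{
	atomic_init(&xp->snd_cnt, 0);
	atomic_init(&xp->snt_cnt, 0);
}

void
sketch_reply(struct sketch_xprt *xp, uint32_t len)
{
	/* Before the send: nothing is published yet, so relaxed ordering is enough. */
	atomic_fetch_add_explicit(&xp->snd_cnt, len, memory_order_relaxed);

	/* ... the send itself would happen here ... */

	/* After the send: release, so the acquire load in the reader observes it. */
	atomic_fetch_add_explicit(&xp->snt_cnt, len, memory_order_release);
}

/* Acknowledged position: bytes sent minus bytes still sitting in the buffer. */
uint32_t
sketch_ack(struct sketch_xprt *xp, uint32_t unsent)
{
	uint32_t sent;

	sent = atomic_load_explicit(&xp->snt_cnt, memory_order_acquire);
	return (sent - unsent);
}
```

Only the operation that publishes state to another thread needs release semantics; an acquire on the add that precedes the send pairs with nothing, which is the reasoning the commit message gives for dropping it.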
* Remove hard limits on number of accepting NFS connections. | Alexander Motin | 2015-04-07 | 2 | -3/+3
    The limit of 5 connections, set long ago, creates problems for the SPEC benchmark. Make NFS follow the system-wide maximum.

    MFC after: 1 week
    Notes: svn path=/head/; revision=281199
* Fix overflow bugs in and remove obsolete limit from kernel RPC | Garrett Wollman | 2015-04-01 | 2 | -26/+26
    implementation.

    The kernel RPC code, which is responsible for the low-level scheduling of incoming NFS requests, contains a throttling mechanism that prevents too much kernel memory from being tied up by NFS requests that are being serviced. When the throttle is engaged, the RPC layer stops servicing incoming NFS sockets, resulting ultimately in backpressure on the clients (if they're using TCP). However, this is a very heavy-handed mechanism as it prevents all clients from making any requests, regardless of how heavy or light they are. (Thus, when engaged, the throttle often prevents clients from even mounting the filesystem.) The throttle mechanism applies specifically to requests that have been received by the RPC layer (from a TCP or UDP socket) and are queued waiting to be serviced by one of the nfsd threads; it does not limit the amount of backlog in the socket buffers.

    The original implementation limited the total bytes of queued requests to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB. The former limit seems reasonable, since requests queued in the socket buffers and replies being constructed to the requests in progress will all require some amount of network memory, but the 45 MiB limit is plainly ridiculous for modern memory sizes: when running 256 service threads on a busy server, 45 MiB would result in just a single maximum-sized NFS3PROC_WRITE queued per thread before throttling.

    Removing this limit exposed integer-overflow bugs in the original computation, and related bugs in the routines that actually account for the amount of traffic enqueued for service threads. The old implementation also attempted to reduce accounting overhead by batching updates until each queue is fully drained, but this is prone to livelock, resulting in repeated accumulate-throttle-drain cycles on a busy server. Various data types are changed to long or unsigned long; explicit 64-bit types are not used due to the unavailability of 64-bit atomics on many 32-bit platforms, but those platforms also cannot support nmbclusters large enough to cause overflow.

    This code (in a 10.1 kernel) is presently running on production NFS servers at CSAIL.

    Summary of this revision:
    * Removes 45 MiB limit on requests queued for nfsd service threads
    * Fixes integer-overflow and signedness bugs
    * Avoids unnecessary throttling by not deferring accounting for completed requests

    Differential Revision: https://reviews.freebsd.org/D2165
    Reviewed by: rmacklem, mav
    MFC after: 30 days
    Relnotes: yes
    Sponsored by: MIT Computer Science & Artificial Intelligence Laboratory
    Notes: svn path=/head/; revision=280930
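The limit computation described in the entry above, and why a plain int is too narrow for it, can be sketched as follows. This is a userspace approximation under stated assumptions: SKETCH_MCLBYTES is an assumed 2048-byte cluster size, the function names are hypothetical, and dividing before multiplying is a choice made for this sketch rather than something the commit message specifies.

```c
#include <limits.h>

#define SKETCH_MCLBYTES 2048UL				/* assumed cluster size */
#define SKETCH_OLD_CAP  (45UL * 1024UL * 1024UL)	/* the removed 45 MiB cap */

/*
 * A quarter of the cluster memory, computed in unsigned long. Dividing
 * first keeps the intermediate value small.
 */
unsigned long
sketch_reqspace_limit(unsigned long nmbclusters)
{
	return ((nmbclusters / 4) * SKETCH_MCLBYTES);
}

/* The older rule: the same quarter, but never more than 45 MiB. */
unsigned long
sketch_old_reqspace_limit(unsigned long nmbclusters)
{
	unsigned long quarter = sketch_reqspace_limit(nmbclusters);

	return (quarter < SKETCH_OLD_CAP ? quarter : SKETCH_OLD_CAP);
}

/* Nonzero if nmbclusters * SKETCH_MCLBYTES would not fit in an int. */
int
sketch_overflows_int(unsigned long nmbclusters)
{
	return (nmbclusters > (unsigned long)INT_MAX / SKETCH_MCLBYTES);
}
```

With a 32-bit int, the product stops fitting once nmbclusters exceeds about one million, which matches the commit's remark that only configurations with very large nmbclusters are affected.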
* rpc: Uninitialized pointer read | Pedro F. Giffuni | 2015-02-02 | 1 | -1/+1
    Initialize *xprt to avoid exposing a random value in cleanup_svc_vc_create.

    This is the kernel counterpart of r278041.

    CID: 1007340
    Notes: svn path=/head/; revision=278100
* Add facility to stop all userspace processes. The supposed use of the | Konstantin Belousov | 2014-12-13 | 1 | -1/+2
    feature is to quiesce the system before suspend.

    Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) the caller to operate on another process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.

    Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added.

    In collaboration with: pho
    Discussed with: avg
    Sponsored by: The FreeBSD Foundation
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=275745
* Current reaction of the nfsd worker threads to any signal is exit. | Konstantin Belousov | 2014-12-08 | 1 | -4/+16
    This is not correct at least for the stop requests. Check for stop conditions and suspend threads if requested.

    Reported and tested by: pho
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Notes: svn path=/head/; revision=275618
* In preparation of merging projects/sendfile, transform bare access to | Gleb Smirnoff | 2014-11-12 | 2 | -3/+3
    sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused()

    Right now they are equal, but once the notion of "not ready socket buffer data" is checked in, they are going to be different.

    Sponsored by: Netflix
    Sponsored by: Nginx, Inc.
    Notes: svn path=/head/; revision=274421
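The accessor pattern from the entry above can be sketched in a few lines. The struct and the sketch_ prefixed names are hypothetical stand-ins, not the definitions from the kernel headers; the sketch only shows why callers are moved to accessors before the two notions diverge.

```c
/* Hypothetical, reduced socket buffer: one byte counter only. */
struct sketch_sockbuf {
	unsigned int sb_cc;	/* bytes currently held in the buffer */
};

/* Bytes a reader may consume right now. */
static inline unsigned int
sketch_sbavail(const struct sketch_sockbuf *sb)
{
	return (sb->sb_cc);
}

/* Bytes occupying buffer space, whether or not they are ready. */
static inline unsigned int
sketch_sbused(const struct sketch_sockbuf *sb)
{
	return (sb->sb_cc);
}

/* A consumer written against the accessors rather than the bare field. */
int
sketch_has_request(const struct sketch_sockbuf *sb, unsigned int need)
{
	return (sketch_sbavail(sb) >= need);
}
```

Once "not ready" data exists, only the bodies of the two accessors need to change; consumers such as sketch_has_request() stay as they are.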
* Merge the NFSv4.1 server code in projects/nfsv4.1-server over | Rick Macklem | 2014-07-01 | 4 | -1/+632
    into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server.

    MFC after: 1 month
    Notes: svn path=/head/; revision=268115
* Fix race in r267221. | Alexander Motin | 2014-06-09 | 1 | -2/+4
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=267278
* Split RPC pool threads into number of smaller semi-isolated groups. | Alexander Motin | 2014-06-08 | 3 | -182/+250
    The old design with a unified thread pool was good from the point of thread utilization. But the single pool-wide mutex became a huge congestion point for systems with many CPUs. To reduce the congestion create several thread groups within a pool (one group for every 6 CPUs and 12 threads), each group with its own mutex. Each connection during its registration is assigned to one of the groups in round-robin fashion. File affinity code may still move requests between the groups, but otherwise groups are self-contained.

    MFC after: 2 weeks
    Sponsored by: iXsystems, Inc.
    Notes: svn path=/head/; revision=267228
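The grouping rule and round-robin assignment from the entry above can be sketched as follows. The reading of "one group for every 6 CPUs and 12 threads" as the smaller of the two quotients, the SKETCH_MAXGROUPS bound, and all names are assumptions of this sketch; per-group locking is left out entirely.

```c
#define SKETCH_MAXGROUPS 16	/* hypothetical upper bound on groups */

/* One group per 6 CPUs and per 12 threads, whichever yields fewer groups. */
int
sketch_group_count(int ncpus, int maxthreads)
{
	int by_cpu = ncpus / 6;
	int by_threads = maxthreads / 12;
	int groups = by_cpu < by_threads ? by_cpu : by_threads;

	if (groups < 1)
		groups = 1;
	if (groups > SKETCH_MAXGROUPS)
		groups = SKETCH_MAXGROUPS;
	return (groups);
}

struct sketch_pool {
	int ngroups;	/* number of thread groups in the pool */
	int next;	/* group that receives the next new connection */
};

/* Assign a newly registered connection to a group in round-robin order. */
int
sketch_assign_group(struct sketch_pool *pool)
{
	int g = pool->next;

	pool->next = (pool->next + 1) % pool->ngroups;
	return (g);
}
```

In a real pool each group would carry its own mutex, so that threads in different groups do not contend on one pool-wide lock.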
* Remove st_idle variable, duplicating st_xprt. | Alexander Motin | 2014-06-08 | 2 | -6/+1
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=267223
* Introduce new per-thread lock to protect the list of requests. | Alexander Motin | 2014-06-08 | 2 | -78/+54
    This slightly simplifies the svc_run_internal() code: if we have processed all the requests in a queue, then we know that a new one will not appear.

    MFC after: 2 weeks
    Notes: svn path=/head/; revision=267221
* Properly free resources in case of error. | Christian Brueffer | 2014-05-02 | 1 | -7/+5
    CID: 1007032
    Found with: Coverity Prevent(tm)
    MFC after: 2 weeks
    Notes: svn path=/head/; revision=265240
* Fix lock acquisition in case no request space available, missed in r260097. | Alexander Motin | 2014-02-04 | 1 | -1/+1
    MFC after: 3 days
    Notes: svn path=/head/; revision=261449
* Don't expose svc_loss_reg / _unreg to userland as they're kernel-only | Peter Wemm | 2014-01-08 | 1 | -0/+2
    additions from r260229 and the SVCPOOL type doesn't exist in userland.

    Notes: svn path=/head/; revision=260459
* Fix NULL dereference panic on UDP requests introduced in r260229. | Alexander Motin | 2014-01-06 | 1 | -1/+1
    Notes: svn path=/head/; revision=260367
* Replace locks added in r260229 to protect sequence counters with atomics. | Alexander Motin | 2014-01-04 | 2 | -15/+9
    The new algorithm does not create additional lock congestion, while some races it includes should not be a problem. Those races may keep requests in the DRC cache for some more time by returning an ACK position smaller than the actual one, but it still should be able to drop them when the proper ACK is finally read.

    Races of the original algorithm based on TCP seq number were worse because they happened when the reply sequence number was recorded. After that even correctly read ACKs could not clean the DRC sometimes.

    Notes: svn path=/head/; revision=260258
* Rework NFS Duplicate Request Cache cleanup logic. | Alexander Motin | 2014-01-03 | 4 | -20/+148
    - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping through all the cache, and as a result allows to do it every time.
    - Introduce additional callbacks to notify the application layer about socket disconnection. Without this, the last few requests processed just before socket disconnection never processed their ACKs and were stuck in the cache for many hours.
    - Implement transport-specific method for tracking reply acknowledgements. The new implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in the cache. This could be done more efficiently at the sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside from NFS.
    - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only a single hash slot at a time.

    Together this limits NFS DRC growth only to situations of real connectivity problems. If the network is working well, and so all replies are acknowledged, the cache remains almost empty even after hours of heavy load. Without this change on the same test the cache was growing to many thousand requests even with a perfectly working local network.

    As another result this reduces CPU time spent on the DRC handling during the SPEC NFS benchmark from about 10% to 0.5%.

    Sponsored by: iXsystems, Inc.
    Notes: svn path=/head/; revision=260229
* Move most of NFS file handle affinity code out of the heavily congested | Alexander Motin | 2013-12-30 | 2 | -53/+54
    global RPC thread pool lock and protect it with own set of locks.

    On synthetic benchmarks this improves peak NFS request rate by 40%.

    Notes: svn path=/head/; revision=260097
* Introduce xprt_inactive_self() -- variant for use when sure that port | Alexander Motin | 2013-12-29 | 4 | -11/+25
    is assigned to thread. For example, within receive handlers. In that case the function reduces to a single assignment and can avoid locking.

    Notes: svn path=/head/; revision=260036