path: root/sys/netinet/tcp_reass.c
Commit message | Author | Age | Files | Lines
* - Remove net.inet.tcp.reass.overflows sysctl. It counts exactly the same
    events that tcpstat's tcps_rcvmemdrop counter counts.
  - Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its description
    in netstat(1) output.
  Author: Gleb Smirnoff; Age: 2014-05-06; Files: 1; Lines: -12/+1
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=265408
* tcp_log_addrs() uses the th pointer, which points into the mbuf, so we
  cannot free the mbuf before calling tcp_log_addrs().
  Author: Gleb Smirnoff; Age: 2014-05-05; Files: 1; Lines: -1/+1
  Sponsored by: Nginx, Inc.
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=265393
* FreeBSD-SA-14:08.tcp was a lesson on not doing acrobatics with mixing
  on-stack memory and UMA memory in one linked list. Thus, rewrite the TCP
  reassembly code in terms of memory usage. The algorithm remains unchanged.
  Author: Gleb Smirnoff; Age: 2014-05-04; Files: 1; Lines: -152/+72

  We actually do not need extra memory to build a reassembly queue. Arriving
  mbufs are always packet header mbufs, so we have the length of the data as
  pkthdr.len and we have m_nextpkt for linkage. We need only one pointer to
  point at the tcphdr; use PH_loc for that.

  In tcpcb the t_segq field becomes an mbuf pointer. The t_segqlen field now
  counts not packets, but bytes in the queue. This gives us more precision
  when comparing to socket buffer limits.

  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=265338
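The queue layout described in this commit can be pictured with a small user-space sketch. It is not the kernel code: `struct pkt`, `struct conn`, and `reass_insert()` are hypothetical stand-ins for the mbuf packet header, the two tcpcb fields, and the insertion step, and overlap trimming is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet-header mbuf; not the kernel's struct mbuf. */
struct pkt {
	struct pkt *nextpkt;	/* role of m_nextpkt: next packet in the queue */
	uint32_t    seq;	/* role of the tcphdr pointer kept in PH_loc */
	int         len;	/* role of pkthdr.len: payload bytes */
};

/* Hypothetical stand-in for the two tcpcb fields named in the commit. */
struct conn {
	struct pkt *segq;	/* head of the reassembly queue */
	int         segqlen;	/* bytes (not packets) held in the queue */
};

/* Sequence-space comparison that tolerates 32-bit wraparound. */
static int
seq_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/*
 * Link a packet into the queue in sequence order and account for its bytes.
 * No separate queue entry is allocated: the packet itself carries the link.
 */
static void
reass_insert(struct conn *c, struct pkt *p)
{
	struct pkt **pp = &c->segq;

	assert(p->len >= 0);
	while (*pp != NULL && seq_lt((*pp)->seq, p->seq))
		pp = &(*pp)->nextpkt;
	p->nextpkt = *pp;
	*pp = p;
	c->segqlen += p->len;
}
```

Because the link and the length live in the packet itself, queueing a segment needs no extra allocation that could fail or leak, which is the property the commit message is after.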
* Fix TCP reassembly vulnerability.
  Author: Xin LI; Age: 2014-04-30; Files: 1; Lines: -3/+4
  Patch done by: glebius
  Security: FreeBSD-SA-14:08.tcp
  Security: CVE-2014-3000
  Notes: svn path=/head/; revision=265121
* Utilize SYSCTL_UMA_CUR() to export usage of syncache and tcp reassembly
  zones.
  Author: Gleb Smirnoff; Age: 2014-02-07; Files: 1; Lines: -15/+3
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=261594
* r48589 promised to remove the implicit inclusion of if_var.h soon. Prepare
  for this event by adding if_var.h to files that do need it. Also, include
  all headers that are currently pulled in only through implicit pollution
  via if_var.h.
  Author: Gleb Smirnoff; Age: 2013-10-26; Files: 1; Lines: -0/+1
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=257176
* uma_zone_set_max() directly returns the rounded effective zone limit. Use
  the return value directly instead of doing a second uma_zone_set_max()
  step.
  Author: Andre Oppermann; Age: 2013-02-01; Files: 1; Lines: -4/+4
  MFC after: 1 week
  Notes: svn path=/head/; revision=246208
* Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied,
  and arg2 doesn't pass the size of arg1.
  Author: Gleb Smirnoff; Age: 2012-12-25; Files: 1; Lines: -1/+1
  Notes: svn path=/head/; revision=244680
* Simplify implementation of net.inet.tcp.reass.maxsegments and
  net.inet.tcp.reass.cursegments.
  Author: Andre Oppermann; Age: 2012-10-28; Files: 1; Lines: -17/+11
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=242253
* Plug a TCP reassembly UMA zone leak introduced in r226113 by only using
  the backup stack queue entry when the zone is exhausted; otherwise we leak
  a zone allocation each time we plug a hole in the reassembly queue.
  Author: Lawrence Stewart; Age: 2011-11-27; Files: 1; Lines: -17/+22
  Reported by: many on freebsd-stable@ (thread: "TCP Reassembly Issues")
  Tested by: many on freebsd-stable@ (thread: "TCP Reassembly Issues")
  Reviewed by: bz (very brief sanity check)
  MFC after: 3 days
  Notes: svn path=/head/; revision=228016
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
  Author: Ed Schouten; Age: 2011-11-07; Files: 1; Lines: -1/+1

  The SYSCTL_NODE macro defines a list that stores all child-elements of
  that node. If there's no SYSCTL_DECL macro anywhere else, there's no
  reason why it shouldn't be static.
  Notes: svn path=/head/; revision=227309
* Prevent TCP sessions from stalling indefinitely in reassembly when
  reaching the zone limit of reassembly queue entries.
  Author: Andre Oppermann; Age: 2011-10-07; Files: 1; Lines: -2/+28

  When the zone limit was reached, not even the missing segment that would
  complete the sequence space could be processed, preventing the TCP session
  from ever making any further progress.

  Solve this deadlock by using a temporary on-stack queue entry for the
  missing segment, followed by an immediate dequeue again by delivering the
  contiguous sequence space to the socket.

  Add logging under net.inet.tcp.log_debug for reassembly queue issues.

  Reviewed by: lsteward (previous version)
  Tested by: Steven Hartland <killing-at-multiplay.co.uk>
  MFC after: 3 days
  Notes: svn path=/head/; revision=226113
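The fallback described here, together with the later r228016 refinement that the stack entry is used only when the zone is exhausted, can be sketched in user space as follows. Everything in it is an assumption made for illustration: `struct zone` is a bare counter standing in for a limited UMA zone, and `reass_admit()` and the `verdict` values are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue entry and bounded allocator (stand-in for a UMA zone). */
struct qent { uint32_t seq; int len; };
struct zone { int used; int limit; };

enum verdict { SEG_DROPPED, SEG_QUEUED, SEG_DELIVERED_VIA_STACK };

/* Returns 1 and charges the zone if an entry is available, 0 if exhausted. */
static int
zone_alloc(struct zone *z)
{
	if (z->used >= z->limit)
		return (0);
	z->used++;
	return (1);
}

/*
 * Decide what happens to an arriving segment.  The on-stack entry is used
 * only when the zone is exhausted AND the segment is the missing one
 * (seq == rcv_nxt); it is consumed immediately, so nothing is leaked.
 */
static enum verdict
reass_admit(struct zone *z, uint32_t rcv_nxt, uint32_t seq, int len,
    int *delivered)
{
	struct qent tmp;	/* temporary entry, never linked past this call */

	if (zone_alloc(z))
		return (SEG_QUEUED);	/* normal path: zone entry is used */
	if (seq != rcv_nxt)
		return (SEG_DROPPED);	/* out of order and no entries left */
	tmp.seq = seq;
	tmp.len = len;
	assert(tmp.len >= 0);
	*delivered += tmp.len;		/* immediate dequeue to the socket */
	return (SEG_DELIVERED_VIA_STACK);
}
```

The point of the design is that the one segment able to drain the queue is never refused for lack of a queue entry, so a full zone can no longer stall the connection.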
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to
  rely on the format string. For SYSCTL_PROC instances where I noticed a
  discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.
  Author: Matthew D Fleming; Age: 2011-01-18; Files: 1; Lines: -3/+6
  Notes: svn path=/head/; revision=217554
* Trim extra spaces before tabs.
  Author: John Baldwin; Age: 2011-01-07; Files: 1; Lines: -1/+1
  Notes: svn path=/head/; revision=217126
* After some off-list discussion, revert a number of changes to the
  DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
  people working on the affected files. A better long-term solution is
  still being considered. This reversal may give some modules empty
  set_pcpu or set_vnet sections, but these are harmless.
  Author: Dimitry Andric; Age: 2010-11-22; Files: 1; Lines: -4/+4

  Changes reverted:
  ------------------------------------------------------------------------
  r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

  Instead of unconditionally emitting .globl's for the __start_set_xxx and
  __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
  sections are actually defined.
  ------------------------------------------------------------------------
  r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

  Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
  the tree.
  ------------------------------------------------------------------------
  r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

  Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.

  Notes: svn path=/head/; revision=215701
* Add new, per connection, statistics for TCP, including:
    Retransmitted Packets
    Zero Window Advertisements
    Out of Order Receives
  These statistics are available via the -T argument to netstat(1).
  Author: George V. Neville-Neil; Age: 2010-11-17; Files: 1; Lines: -0/+1
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=215434
* Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
  the tree.
  Author: Dimitry Andric; Age: 2010-11-14; Files: 1; Lines: -4/+4
  Notes: svn path=/head/; revision=215317
* Retire the system-wide, per-reassembly queue segment limit. The mechanism
  is far too coarse grained to be useful and the default value significantly
  degrades TCP performance on moderate to high bandwidth-delay product paths
  with non-zero loss (e.g. 5+Mbps connections across the public Internet
  often suffer).
  Author: Lawrence Stewart; Age: 2010-10-16; Files: 1; Lines: -11/+15

  Replace the outgoing mechanism with an individual per-queue limit based on
  the number of MSS segments that fit into the socket's receive buffer. This
  should strike a good balance between performance and the potential for
  resource exhaustion when FreeBSD is acting as a TCP receiver. With socket
  buffer autotuning (which is enabled by default), the reassembly queue
  tracks the socket buffer and benefits too.

  As the XXX comment suggests, my testing uncovered some unexpected
  behaviour which requires further investigation. By using
  so->so_rcv.sb_hiwat instead of sbspace(&so->so_rcv), we allow more
  segments to be held across both the socket receive buffer and reassembly
  queue than we probably should. The tradeoff is better performance in at
  least one common scenario, versus a devious sender's ability to consume
  more resources on a FreeBSD receiver.

  Sponsored by: FreeBSD Foundation
  Reviewed by: andre, gnn, rpaulo
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=213913
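As a rough illustration of the per-queue limit, the sketch below computes how many MSS-sized segments fit into a receive buffer. The function names are hypothetical, and the `+ 1` that leaves room for one extra segment is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>

/*
 * Hypothetical helper: per-queue segment limit derived from the receive
 * buffer high-water mark (sb_hiwat, bytes) and the segment size (maxseg,
 * bytes).  The "+ 1" is an assumption of this sketch.
 */
static int
reass_queue_limit(unsigned int sb_hiwat, unsigned int maxseg)
{
	assert(maxseg > 0);
	return ((int)(sb_hiwat / maxseg) + 1);
}

/* Nonzero when a queue already holding qlen segments may not grow further. */
static int
reass_queue_full(int qlen, unsigned int sb_hiwat, unsigned int maxseg)
{
	return (qlen >= reass_queue_limit(sb_hiwat, maxseg));
}
```

With a 65536-byte buffer and a 1460-byte MSS this gives a limit of 45 segments. Because the limit follows the buffer size, a connection whose receive buffer has been autotuned upward is also allowed a longer reassembly queue.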
* - Switch the "net.inet.tcp.reass.cursegments" and
    "net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA
    zone stats. The value returned by the cursegments sysctl is approximate
    owing to the way in which uma_zone_get_cur is implemented.
  - Discontinue use of V_tcp_reass_qsize as a global reassembly segment
    count variable in the reassembly implementation. The variable was used
    without proper synchronisation and was duplicating accounting done by
    UMA already. The lack of synchronisation was particularly problematic
    on SMP systems terminating many TCP sessions, resulting in poor TCP
    performance for connections with non-zero packet loss.
  Author: Lawrence Stewart; Age: 2010-10-16; Files: 1; Lines: -13/+23
  Sponsored by: FreeBSD Foundation
  Reviewed by: andre, gnn, rpaulo (as part of a larger patch)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=213912
* Internalise reassembly queue related functionality and variables which
  should not be used outside of the reassembly queue implementation.
  Author: Lawrence Stewart; Age: 2010-09-25; Files: 1; Lines: -3/+25

  Provide a new function to flush all segments from a reassembly queue and
  call it from the appropriate places instead of manipulating the queue
  directly.

  Sponsored by: FreeBSD Foundation
  Reviewed by: andre, gnn, rpaulo
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=213158
* MFP4: @176978-176982, 176984, 176990-176994, 177441
  Author: Bjoern A. Zeeb; Age: 2010-04-29; Files: 1; Lines: -14/+7

  "Whitespace" churn after the VIMAGE/VNET whirls.

  Remove the need for some "init" functions within the network stack, like
  pim6_init(), icmp_init() or significantly shorten others like ip6_init()
  and nd6_init(), using static initialization again where possible and
  formerly missed.

  Move (most) variables back to the place they used to be before the
  container structs and VIMAGE_GLOBALS (before r185088) and try to reduce
  the diff to stable/7 and earlier as much as possible, to help out-of-tree
  consumers to update from 6.x or 7.x to 8 or 9.

  This also removes some header file pollution for putatively static global
  variables.

  Revert VIMAGE specific changes in ipfilter::ip_auth.c that are no longer
  needed.

  Reviewed by: jhb
  Discussed with: rwatson
  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  MFC after: 6 days
  Notes: svn path=/head/; revision=207369
* Destroy TCP UMA zones (empty or not) upon network stack teardown to not
  leak them, otherwise making UMA/vmstat unhappy with every stopped vnet.
  We will still leak pages (especially for zones marked NOFREE).
  Author: Bjoern A. Zeeb; Age: 2010-03-07; Files: 1; Lines: -0/+9

  Reshuffle cleanup order in tcp_destroy() to get rid of what we can easily
  free first.

  Sponsored by: ISPsystem
  Reviewed by: rwatson
  MFC after: 5 days
  Notes: svn path=/head/; revision=204838
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h;
  we now use jails (rather than vimages) as the abstraction for
  virtualization management, and what remained was specific to virtual
  network stacks. Minor cleanups are done in the process, and comments
  updated to reflect these changes.
  Author: Robert Watson; Age: 2009-08-01; Files: 1; Lines: -1/+1
  Reviewed by: bz
  Approved by: re (vimage blanket)
  Notes: svn path=/head/; revision=196019
* Remove unused VNET_SET() and related macros; only VNET_GET() is ever
  actually used. Rename VNET_GET() to VNET() to shorten variable references.
  Author: Robert Watson; Age: 2009-07-16; Files: 1; Lines: -3/+3
  Discussed with: bz, julian
  Reviewed by: bz
  Approved by: re (kensmith, kib)
  Notes: svn path=/head/; revision=195727
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
  (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
  network stack memory allocator. Modify vnet to use the allocator instead
  of monolithic global container structures (vinet, ...). This change
  solves many binary compatibility problems associated with VIMAGE, and
  restores ELF symbols for virtualized global variables.
  Author: Robert Watson; Age: 2009-07-14; Files: 1; Lines: -21/+17

  Each virtualized global variable exists as a "reference copy", and also
  once per virtual network stack. Virtualized global variables are tagged
  at compile-time, placing them in a special linker set, which is loaded
  into a contiguous region of kernel memory. Virtualized global variables
  in the base kernel are linked as normal, but those in modules are copied
  and relocated to a reserved portion of the kernel's vnet region with the
  help of the kernel linker.

  Virtualized global variables exist in per-vnet memory set up when the
  network stack instance is created, and are initialized statically from
  the reference copy. Run-time access occurs via an accessor macro, which
  converts from the current vnet and requested symbol to a per-vnet
  address. When "options VIMAGE" is not compiled into the kernel, normal
  global ELF symbols will be used instead and indirection is avoided.

  This change restores static initialization for network stack global
  variables, restores support for non-global symbols and types, eliminates
  the need for many subsystem constructors, eliminates large per-subsystem
  structures that caused many binary compatibility issues both for
  monitoring applications (netstat) and kernel modules, removes the
  per-function INIT_VNET_*() macros throughout the stack, eliminates the
  need for vnet_symmap ksym(2) munging, and eliminates duplicate
  definitions of virtualized globals under VIMAGE_GLOBALS.

  Bump __FreeBSD_version and update UPDATING.

  Portions submitted by: bz
  Reviewed by: bz, zec
  Discussed with: gnn, jamie, jeff, jhb, julian, sam
  Suggested by: peter
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=195699
* Remove comment about moving tcp_reass() to its own file named
  tcp_reass.c; that happened a while ago.
  Author: Robert Watson; Age: 2009-05-25; Files: 1; Lines: -2/+1
  MFC after: 3 days
  Notes: svn path=/head/; revision=192761
* Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
  TCPSTAT_INC(), rather than directly manipulating the fields across the
  kernel. This will make it easier to change the implementation of these
  statistics, such as using per-CPU versions of the data structures.
  Author: Robert Watson; Age: 2009-04-11; Files: 1; Lines: -6/+6
  MFC after: 3 days
  Notes: svn path=/head/; revision=190948
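The idea of routing every counter update through two macros can be sketched as below. This is a user-space approximation: `struct tcpstat_sketch` is a cut-down, hypothetical stand-in for struct tcpstat, the macro bodies are one plausible definition rather than the kernel's, and `count_out_of_order()` is an invented caller.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down, hypothetical stand-in for struct tcpstat. */
struct tcpstat_sketch {
	uint64_t tcps_rcvoopack;	/* out-of-order packets received */
	uint64_t tcps_rcvoobyte;	/* out-of-order bytes received */
};

static struct tcpstat_sketch tcpstat_sketch;

/*
 * All updates go through these two macros, so the storage behind them
 * (a single global here) could later be swapped for something else
 * without touching any caller.
 */
#define	TCPSTAT_ADD(name, val)	(tcpstat_sketch.name += (val))
#define	TCPSTAT_INC(name)	TCPSTAT_ADD(name, 1)

/* Invented caller: account for one out-of-order segment of 'bytes' bytes. */
static void
count_out_of_order(int bytes)
{
	assert(bytes >= 0);
	TCPSTAT_INC(tcps_rcvoopack);
	TCPSTAT_ADD(tcps_rcvoobyte, bytes);
}
```

Callers name only the field, never the structure that holds it, which is what makes a later move to per-CPU storage a change to the macro definitions alone.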
* First pass at separating per-vnet initializer functions from existing
  functions for initializing global state.
  Author: Marko Zec; Age: 2009-04-06; Files: 1; Lines: -7/+9

  At this stage, the new per-vnet initializer functions are directly called
  from the existing global initialization code, which should in most cases
  result in the compiler inlining those new functions, hence yielding a
  near-zero functional change.

  Modify the existing initializer functions which are invoked via protosw,
  like ip_init() et al., to allow them to be invoked multiple times, i.e.
  once per vnet. Global state, if any, is initialized only if such
  functions are called within the context of vnet0, which will be
  determined via the IS_DEFAULT_VNET(curvnet) check (currently always
  true).

  While here, V_irtualize a few remaining global UMA zones used by
  net/netinet/netipsec networking code. While it is not yet clear to me or
  anybody else whether this is the right thing to do, at this stage this
  makes the code more readable, and makes it easier to track uncollected
  UMA-zone-backed objects on vnet removal. In the long run, it's quite
  possible that some form of shared use of UMA zone pools among multiple
  vnets should be considered.

  Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw,
  vnet_inet and vnet_net.

  Approved by: julian (mentor)
  Notes: svn path=/head/; revision=190787
* Rather than using hidden includes (with circular dependencies), directly
  include only the header files needed. This reduces the unneeded spamming
  of various headers into lots of files.
  Author: Bjoern A. Zeeb; Age: 2008-12-02; Files: 1; Lines: -0/+1

  For now, this leaves us with very few modules including vnet.h and thus
  needing to depend on opt_route.h.

  Reviewed by: brooks, gnn, des, zec, imp
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=185571
* Change the initialization methodology for global variables scheduled for
  virtualization.
  Author: Marko Zec; Age: 2008-11-19; Files: 1; Lines: -4/+12

  Instead of initializing the affected global variables at instantiation,
  assign initial values to them in initializer functions. As a rule,
  initialization at instantiation for such variables should never be
  introduced again from now on. Furthermore, enclose all instantiations of
  such global variables in #ifdef VIMAGE_GLOBALS blocks.

  Essentially, this change should have zero functional impact. In the next
  phase of merging network stack virtualization infrastructure from the
  p4/vimage branch, the new initialization methodology will allow us to
  switch between using global variables and their counterparts residing in
  virtualization containers with minimum code churn, and in the long run
  allow us to initialize multiple instances of such container structures.

  Discussed at: devsummit Strassburg
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  Notes: svn path=/head/; revision=185088
* Step 1.5 of importing the network stack virtualization infrastructure
  from the vimage project, as per plan established at devsummit 08/08:
  http://wiki.freebsd.org/Image/Notes200808DevSummit
  Author: Marko Zec; Age: 2008-10-02; Files: 1; Lines: -8/+11

  Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
  macros, and CURVNET_SET() context setting macros, all currently resolving
  to NOPs.

  Prepare for virtualization of selected SYSCTL objects by introducing a
  family of SYSCTL_V_*() macros, currently resolving to their global
  counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

  Move selected #defines from sys/sys/vimage.h to newly introduced header
  files specific to virtualized subsystems (sys/net/vnet.h,
  sys/netinet/vinet.h etc.).

  All the changes are verified to have zero functional impact at this point
  in time by doing MD5 comparison between pre- and post-change object
  files(*).

  (*) netipsec/keysock.c did not validate depending on compile time
  options.

  Implemented by: julian, bz, brooks, zec
  Reviewed by: julian, bz, brooks, kris, rwatson, ...
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  Notes: svn path=/head/; revision=183550
* Commit step 1 of the vimage project, (network stack) virtualization work
  done by Marko Zec (zec@).
  Author: Bjoern A. Zeeb; Age: 2008-08-17; Files: 1; Lines: -18/+19

  This is the first in a series of commits over the course of the next few
  weeks.

  Mark all uses of global variables to be virtualized with a V_ prefix. Use
  macros to map them back to their global names for now, so this is a NOP
  change only.

  We hope to have caught at least 85-90% of what is needed so we do not
  invalidate a lot of outstanding patches again.

  Obtained from: //depot/projects/vimage-commit2/...
  Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ...
               (various people I forgot, different versions)
               md5 (with a bit of help)
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  X-MFC after: never
  V_Commit_Message_Reviewed_By: more people than the patch
  Notes: svn path=/head/; revision=181803
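The V_ prefix convention can be illustrated with a very small sketch. The variable name `tcp_reass_qsize` appears elsewhere in this log; the macro line and the `reass_note_segment()` helper are written here only as an illustration of "map the V_ name back to the global", not copied from the tree.

```c
#include <assert.h>

/* The plain global, as it existed before virtualization. */
static int tcp_reass_qsize;

/*
 * Step 1 of the conversion: code refers to V_tcp_reass_qsize, and the
 * macro maps that name straight back to the global, so behaviour does not
 * change.  A later step can redirect the macro to per-vnet storage.
 */
#define	V_tcp_reass_qsize	tcp_reass_qsize

/* Hypothetical caller that touches the variable only through its V_ name. */
static int
reass_note_segment(void)
{
	assert(V_tcp_reass_qsize >= 0);
	return (++V_tcp_reass_qsize);
}
```

Since the macro expands to the original identifier, the compiled object code is the same as before, which is why the commit can describe itself as a NOP change.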
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
  explicitly select write locking for all use of the inpcb mutex. Update
  some pcbinfo lock assertions to assert locked rather than write-locked,
  although in practice almost all uses of the pcbinfo rwlock remain
  exclusive, and all instances of inpcb lock acquisition are exclusive.
  Author: Robert Watson; Age: 2008-04-17; Files: 1; Lines: -1/+1

  This change should introduce (ideally) little functional change. However,
  it lays the groundwork for significantly increased parallelism in the
  TCP/IP code.

  MFC after: 3 months
  Tested by: kris (superset of committed patch)
  Notes: svn path=/head/; revision=178285
* Add FBSDID to all files in netinet so that people can more easily include
  file version information in bug reports.
  Author: Mike Silbersack; Age: 2007-10-07; Files: 1; Lines: -1/+3
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=172467
* Complete the (mechanical) move of the TCP reassembly and timewait
  functions from their original place to their own files.
  Author: Andre Oppermann; Age: 2007-05-13; Files: 1; Lines: -31/+2

  TCP Reassembly from tcp_input.c -> tcp_reass.c
  TCP Timewait from tcp_subr.c -> tcp_timewait.c
  Notes: svn path=/head/; revision=169541
* Drop everything that doesn't belong into this new file. It's neither
  functional nor connected to the build yet.
  Author: Andre Oppermann; Age: 2007-05-11; Files: 1; Lines: -2980/+0
  Notes: svn path=/head/; revision=169481
* Move universally to ANSI C function declarations, with relatively
  consistent style(9)-ish layout.
  Author: Robert Watson; Age: 2007-05-10; Files: 1; Lines: -1/+2
  Notes: svn path=/head/; revision=169454
* o Fix style(9) bugs introduced in the last commit.
  Author: Maxim Konovalov; Age: 2007-05-09; Files: 1; Lines: -3/+3
  Pointed out by: bde
  Notes: svn path=/head/; revision=169417
* o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build.
  Author: Maxim Konovalov; Age: 2007-05-09; Files: 1; Lines: -0/+2
  PR: kern/112517
  Submitted by: vd
  Notes: svn path=/head/; revision=169405
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of
  a dedicated sack_enable int for this bool. Change all users accordingly.
  Author: Andre Oppermann; Age: 2007-05-06; Files: 1; Lines: -22/+22
  Notes: svn path=/head/; revision=169317
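Replacing a dedicated int with a bit in an existing flags word looks roughly like the sketch below. `struct tcpcb_sketch` and the two helper functions are hypothetical, and the numeric value given to TF_SACK_PERMIT is illustrative only.

```c
#include <assert.h>

/* Illustrative bit value; treat the number as an assumption of this sketch. */
#define	TF_SACK_PERMIT	0x000200

/* Hypothetical cut-down control block: one flags word, no sack_enable int. */
struct tcpcb_sketch {
	unsigned int t_flags;
};

/* Record that SACK may be used on this connection. */
static void
sack_permit(struct tcpcb_sketch *tp)
{
	tp->t_flags |= TF_SACK_PERMIT;
}

/* What a former "if (tp->sack_enable)" test becomes. */
static int
sack_permitted(const struct tcpcb_sketch *tp)
{
	return ((tp->t_flags & TF_SACK_PERMIT) != 0);
}
```

The boolean now shares storage with the connection's other flags, so the control block loses a field and setting the bit leaves unrelated flags untouched.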
* o Remove redundant tcp reassembly check in header prediction code
  o Rearrange code to make intent in TCPS_SYN_SENT case more clear
  o Assorted style cleanup
  o Comment clarification for tcp_dropwithreset()
  Author: Andre Oppermann; Age: 2007-05-06; Files: 1; Lines: -19/+9
  Notes: svn path=/head/; revision=169316
* Reorder the TCP header prediction test to check for the most volatile
  values first to spend less time on a fallback to normal processing.
  Author: Andre Oppermann; Age: 2007-05-06; Files: 1; Lines: -4/+6
  Notes: svn path=/head/; revision=169315
* Remove the defunct remains of the TCPS_TIME_WAIT cases from
  tcp_do_segment and change it to a void function.
  Author: Andre Oppermann; Age: 2007-05-06; Files: 1; Lines: -65/+17

  We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late
  segments arriving for such a connection are handled directly in the TW
  code.
  Notes: svn path=/head/; revision=169314
* Tweak comment at end of tcp_input() when calling into tcp_do_segment():
  the pcbinfo lock will be released as well, not just the pcb lock.
  Author: Robert Watson; Age: 2007-05-04; Files: 1; Lines: -3/+3
  Notes: svn path=/head/; revision=169268
* o Fix INP lock leak in the minttl case
  o Remove indirection in the decision of unlocking inp
  o Further annotation of locking in tcp_input()
  Author: Andre Oppermann; Age: 2007-04-23; Files: 1; Lines: -5/+6
  Notes: svn path=/head/; revision=168986
* o Remove unnecessary TOF_SIGLEN flag from struct tcpopt
  o Correctly set to->to_signature in tcp_dooptions()
  o Update comments
  Author: Andre Oppermann; Age: 2007-04-20; Files: 1; Lines: -1/+2
  Notes: svn path=/head/; revision=168906
* Add more KASSERT's.
  Author: Andre Oppermann; Age: 2007-04-20; Files: 1; Lines: -0/+4
  Notes: svn path=/head/; revision=168905
* Remove bogus check for accept queue length and associated failure
  handling from the incoming SYN handling section of tcp_input().
  Author: Andre Oppermann; Age: 2007-04-20; Files: 1; Lines: -16/+10

  Enforcement of the accept queue limits is done by sonewconn() after the
  3WHS is completed. It is not necessary to have an earlier check before a
  connection request enters the SYN cache awaiting the full handshake. It
  rather limits the effectiveness of the syncache by preventing legit and
  illegit connections from entering it and having them shaken out before we
  hit the real limit which may have vanished by then.

  Change return value of syncache_add() to void. No status communication is
  required.
  Notes: svn path=/head/; revision=168903
* Simplify syncache_expand() and clarify its semantics. Zero is returned
  when the ACK is invalid and doesn't belong to any registered connection,
  either in syncache or through SYN cookies. True but a NULL struct socket
  is returned when the 3WHS completed but the socket could not be created
  due to insufficient resources or limits reached.
  Author: Andre Oppermann; Age: 2007-04-20; Files: 1; Lines: -8/+8

  For both cases an RST is sent back in tcp_input().

  A logic error leading to a panic is fixed where syncache_expand() would
  free the mbuf on socket allocation failure but tcp_input() later supplies
  it to tcp_dropwithreset() to issue a RST to the peer.

  Reported by: kris (the panic)
  Notes: svn path=/head/; revision=168902
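The return convention spelled out in this message can be modelled with a toy function. Nothing below is the kernel's syncache code: `expand_sketch()`, `needs_rst()`, and `struct sock_sketch` are hypothetical, and the two int parameters simply stand in for "the ACK matched a pending connection" and "a socket could be created".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for a created socket. */
struct sock_sketch { int unused; };
static struct sock_sketch the_sock;

/*
 * Toy model of the convention:
 *   returns 0                 -> ACK invalid, no matching connection
 *   returns 1, *sop == NULL   -> handshake done, socket creation failed
 *   returns 1, *sop != NULL   -> handshake done, socket created
 */
static int
expand_sketch(int ack_valid, int resources_ok, struct sock_sketch **sop)
{
	*sop = NULL;
	if (!ack_valid)
		return (0);
	if (resources_ok)
		*sop = &the_sock;
	return (1);
}

/* Caller side: both failure cases lead to a reset being sent to the peer. */
static int
needs_rst(int ack_valid, int resources_ok)
{
	struct sock_sketch *so;
	int rv;

	rv = expand_sketch(ack_valid, resources_ok, &so);
	return (rv == 0 || so == NULL);
}
```

Keeping the return value and the socket pointer separate lets the caller tell an invalid ACK apart from resource exhaustion even though both end in an RST.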
* Remove unused variable tcbinfo_mtx.
  Author: Robert Watson; Age: 2007-04-15; Files: 1; Lines: -1/+0
  Notes: svn path=/head/; revision=168769