path: root/sys/netinet
Commit message | Author | Age | Files | Lines
...
* Remove unused code. This is triggered by the bug report of Sylvestre Ledru | Michael Tuexen | 2014-05-06 | 1 | -8/+0
  which deals with useless code in the userland stack:
  https://bugzilla.mozilla.org/show_bug.cgi?id=1003929
  MFC after: 3 days
  Notes: svn path=/head/; revision=265455
* - Remove net.inet.tcp.reass.overflows sysctl. It counts exactly | Gleb Smirnoff | 2014-05-06 | 2 | -13/+2
    the same events that tcpstat's tcps_rcvmemdrop counter counts.
  - Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
    description in netstat(1) output.
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=265408
* The tcp_log_addrs() uses the th pointer, which points into the mbuf, thus we | Gleb Smirnoff | 2014-05-05 | 1 | -1/+1
  cannot free the mbuf before tcp_log_addrs().
  Sponsored by: Nginx, Inc.
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=265393
* The FreeBSD-SA-14:08.tcp was a lesson on not doing acrobatics with | Gleb Smirnoff | 2014-05-04 | 5 | -173/+76
  mixing on-stack memory and UMA memory in one linked list.

  Thus, rewrite TCP reassembly code in terms of memory usage. The
  algorithm remains unchanged.

  We actually do not need extra memory to build a reassembly queue.
  Arriving mbufs are always packet header mbufs. So we get the length
  of data as pkthdr.len. We get m_nextpkt for linkage. And we need
  only one pointer to point at the tcphdr; use PH_loc for that.

  In tcpcb the t_segq field becomes an mbuf pointer. The t_segqlen
  field now counts not packets, but bytes in the queue. This gives
  us more precision when comparing to socket buffer limits.

  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=265338
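The memory layout described in r265338 can be illustrated with a small userland sketch. It is not the kernel code: `struct pkt_sketch`, `struct conn_sketch`, `seq_lt()` and `reass_enqueue_sketch()` are hypothetical names, the sequence number is stored directly instead of being read through a tcphdr pointer kept in PH_loc, and trimming of overlapping segments is left out. It only shows that a queue can be built from per-packet link pointers with no extra allocation, and that the queue length is accounted in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet header mbuf. */
struct pkt_sketch {
	struct pkt_sketch *nextpkt;	/* plays the role of m_nextpkt */
	uint32_t seq;			/* TCP sequence number of the segment */
	int len;			/* plays the role of pkthdr.len */
};

/* Hypothetical stand-in for the two tcpcb fields named in the commit. */
struct conn_sketch {
	struct pkt_sketch *segq;	/* head of the reassembly queue */
	int segqlen;			/* bytes (not packets) in the queue */
};

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
static int
seq_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/*
 * Link pkt into the queue in sequence order. Nothing is allocated:
 * the packet carries its own link pointer and its own length.
 * Returns 0 on success, -1 if the byte limit would be exceeded.
 */
int
reass_enqueue_sketch(struct conn_sketch *c, struct pkt_sketch *pkt, int limit)
{
	struct pkt_sketch **pp;

	assert(c != NULL && pkt != NULL && pkt->len >= 0);
	if (c->segqlen + pkt->len > limit)
		return (-1);
	/* Find the first queued segment that does not precede pkt. */
	for (pp = &c->segq; *pp != NULL && seq_lt((*pp)->seq, pkt->seq);
	    pp = &(*pp)->nextpkt)
		;
	pkt->nextpkt = *pp;
	*pp = pkt;
	c->segqlen += pkt->len;
	return (0);
}
```

Because the limit is checked in bytes, a caller could pass a value derived from the socket buffer size as `limit`, which is the kind of comparison the commit message refers to.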
* Fix panic on IPv4 address removal introduced in r265279. | Alexander V. Chernikov | 2014-05-03 | 1 | -0/+1
  Reported by: Trond Endrestøl
  MFC with: r265279
  Notes: svn path=/head/; revision=265288
* Pass radix head ptr along with rte to rtexpunge(). | Alexander V. Chernikov | 2014-05-03 | 1 | -4/+4
  Rename rtexpunge to rt_expunge().
  Notes: svn path=/head/; revision=265279
* Fix TCP reassembly vulnerability. | Xin LI | 2014-04-30 | 1 | -3/+4
  Patch done by: glebius
  Security: FreeBSD-SA-14:08.tcp
  Security: CVE-2014-3000
  Notes: svn path=/head/; revision=265121
* Fix a panic when removing an IP address from an interface, if the same address | Alan Somers | 2014-04-29 | 1 | -3/+7
  exists on another interface. The panic was introduced by change 264887,
  which changed the fibnum parameter in the call to rtalloc1_fib() in
  ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The
  solution is to use the interface fib in that call. For the majority of
  users, that will be equivalent to the legacy behavior.
  PR: kern/189089
  Reported by: neel
  Reviewed by: neel
  MFC after: 3 weeks
  X-MFC with: 264887
  Sponsored by: Spectra Logic
  Notes: svn path=/head/; revision=265092
* Fix subnet and default routes on different FIBs on the same subnet. | Alan Somers | 2014-04-24 | 4 | -12/+26
  These two bugs are closely related. The root cause is that
  ifa_ifwithnet does not consider FIBs when searching for an interface
  address.

  sys/net/if_var.h
  sys/net/if.c
    Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those
    functions will only return an address whose interface fib equals
    the argument.

  sys/net/route.c
    Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
    arguments.

  sys/netinet/in.c
    Update in_addprefix to consider the interface fib when adding
    prefixes. This will prevent it from not adding a subnet route when
    one already exists on a different fib.

  sys/net/rtsock.c
  sys/netinet/in_pcb.c
  sys/netinet/ip_output.c
  sys/netinet/ip_options.c
  sys/netinet6/nd6.c
    Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and
    ifa_ifwithnet. In some cases there wasn't a clear specific fib
    number to use. In others, I was unable to test those functions so
    I chose RT_DEFAULT_FIB to minimize divergence from current
    behavior. I will fix some of the latter changes along with PR
    kern/187553.

  tests/sys/netinet/fibs_test.sh
  tests/sys/netinet/udp_dontroute.c
  tests/sys/netinet/Makefile
    Revert r263738. The udp_dontroute test was right all along.
    However, bugs kern/187550 and kern/187553 cancelled each other out
    when it came to this test. Because of kern/187553, ifa_ifwithnet
    searched the default fib instead of the requested one, but because
    of kern/187550, there was an applicable subnet route on the default
    fib. The new test added in r263738 doesn't work right, however. I
    can verify with dtrace that ifa_ifwithnet returned the wrong
    address before I applied this commit, but route(8) miraculously
    found the correct interface to use anyway. I don't know how.

    Clear expected failure messages for kern/187550 and kern/187552.

  PR: kern/187550
  PR: kern/187552
  Reviewed by: melifaro
  MFC after: 3 weeks
  Sponsored by: Spectra Logic
  Notes: svn path=/head/; revision=264905
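The rule stated in r264905, that the lookup only returns an address whose interface fib equals the argument, can be sketched in userland C as follows. This is an illustration under assumptions, not the kernel's ifa_ifwithnet: `struct ifaddr_sketch` and `ifwithnet_sketch()` are hypothetical, addresses are IPv4 in host byte order, and "most specific" is approximated by comparing contiguous netmasks numerically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified interface address record. */
struct ifaddr_sketch {
	uint32_t addr;	/* IPv4 address, host byte order */
	uint32_t mask;	/* contiguous netmask, host byte order */
	int fib;	/* FIB of the owning interface */
};

/*
 * Return the most specific entry whose network contains dst and whose
 * interface FIB equals fib, or NULL if no entry matches.
 */
const struct ifaddr_sketch *
ifwithnet_sketch(const struct ifaddr_sketch *tab, size_t n, uint32_t dst,
    int fib)
{
	const struct ifaddr_sketch *best = NULL;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].fib != fib)
			continue;	/* the FIB filter */
		if ((dst & tab[i].mask) != (tab[i].addr & tab[i].mask))
			continue;	/* dst is not on this network */
		if (best == NULL || tab[i].mask > best->mask)
			best = &tab[i];	/* longer prefix wins */
	}
	return (best);
}
```

With two interfaces on the same subnet but different FIBs, the same destination resolves to a different entry depending on the `fib` argument, and to no entry at all for a FIB that has no interface on that subnet.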
* Fix host and network routes for new interfaces when net.add_addr_allfibs=0 | Alan Somers | 2014-04-24 | 1 | -1/+1
  sys/net/route.c
    In rtinit1, use the interface fib instead of the process fib. The
    latter wasn't very useful because ifconfig(8) is usually invoked
    with the default process fib. Changing ifconfig(8) to use setfib(2)
    would be redundant, because it already sets the interface fib.

  tests/sys/netinet/fibs_test.sh
    Clear the expected ATF failure

  sys/net/if.c
    Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

  sys/netinet/in.c
  sys/net/if_var.h
    Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
    in_scrubprefix. Pass it the interface fib.

  PR: kern/187549
  Reviewed by: melifaro
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corporation
  Notes: svn path=/head/; revision=264887
* Fix jailed raw sockets not setting the correct source address by | Steven Hartland | 2014-04-24 | 3 | -8/+10
  calling in_pcbladdr instead of prison_get_ip4
  MFC after: 1 month
  Notes: svn path=/head/; revision=264879
* Don't free an mbuf twice. This only happens in very rare error | Michael Tuexen | 2014-04-23 | 1 | -1/+15
  cases where the peer sends illegal sequencing information in DATA
  chunks for an existing association.
  MFC after: 3 days.
  Notes: svn path=/head/; revision=264838
* Add {} braces so that the code conforms to the indentation. | Rick Macklem | 2014-04-21 | 1 | -2/+4
  Fortunately, I don't think doing the assignment of cap->tsomax
  unconditionally causes any problem.
  Reviewed by: glebius
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=264739
* Add consistency checks to ensure that fragments of a user message | Michael Tuexen | 2014-04-20 | 1 | -1/+36
  have the same U-bit.
  MFC after: 3 days
  Notes: svn path=/head/; revision=264704
* Send also a packet containing an ABORT chunk in response to an OOTB packet | Michael Tuexen | 2014-04-20 | 1 | -3/+0
  containing a COOKIE-ECHO chunk.
  MFC after: 3 days
  Notes: svn path=/head/; revision=264701
* Use consistently debug output instead of an unconditional printf. | Michael Tuexen | 2014-04-19 | 1 | -1/+1
  MFC after: 3 days
  Notes: svn path=/head/; revision=264682
* Send the correct error cause, when a DATA chunk with no user data | Michael Tuexen | 2014-04-19 | 4 | -1/+41
  is received. This bug was reported by Irene Ruengeler.
  MFC after: 3 days
  Notes: svn path=/head/; revision=264679
* Some whitespace and style fixes. | John Baldwin | 2014-04-11 | 1 | -26/+22
  Submitted by: bde
  Notes: svn path=/head/; revision=264356
* The tw_pcbrele() function does not need the global timewait lock. | John Baldwin | 2014-04-11 | 1 | -14/+4
  Submitted by: Julien Charbon
  Suggested by: glebius
  Notes: svn path=/head/; revision=264351
* Don't leak the TCP pcbinfo lock if a time wait connection is closed | John Baldwin | 2014-04-11 | 1 | -1/+3
  in between grabbing a reference on the connection structure and
  obtaining the pcbinfo lock.
  Reviewed by: Julien Charbon
  Notes: svn path=/head/; revision=264342
* Currently, the TCP slow timer can starve TCP input processing while it | John Baldwin | 2014-04-10 | 4 | -29/+127
  walks the list of connections in TIME_WAIT closing expired connections
  due to contention on the global TCP pcbinfo lock.

  To remediate, introduce a new global lock to protect the list of
  connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when
  closing an expired connection. This limits the window of time when
  TCP input processing is stopped to the amount of time needed to close
  a single connection.

  Submitted by: Julien Charbon <jcharbon@verisign.com>
  Reviewed by: rwatson, rrs, adrian
  MFC after: 2 months
  Notes: svn path=/head/; revision=264321
* Remove a bogus re-assignment. | Kevin Lo | 2014-04-08 | 1 | -1/+0
  Notes: svn path=/head/; revision=264248
* Minor style cleanups. | Kevin Lo | 2014-04-07 | 2 | -15/+15
  Notes: svn path=/head/; revision=264213
* Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. | Kevin Lo | 2014-04-07 | 7 | -59/+295
  Tested with vlc and a test suite [1].

  [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

  Reviewed by: jhb, glebius, adrian
  Notes: svn path=/head/; revision=264212
* Improve readability of comments for DELAY_ACK() macro. | Hiren Panchasara | 2014-04-03 | 1 | -7/+8
  Notes: svn path=/head/; revision=264063
* Increment the SSN only after processing the last fragment of an | Michael Tuexen | 2014-04-01 | 1 | -1/+2
  ordered user message.
  MFC after: 3 days
  Notes: svn path=/head/; revision=264017
* Don't copy the MF flag from original IP header to ICMP error message. | Andrey V. Elsukov | 2014-03-31 | 1 | -0/+1
  PR: 188092
  MFC after: 1 week
  Sponsored by: Yandex LLC
  Notes: svn path=/head/; revision=263966
* Handle an edge case of address management similar to TCP. | Michael Tuexen | 2014-03-29 | 1 | -1/+8
  This needs to be reconsidered when the address handling
  will be reimplemented.
  The patch is from rrs@.
  MFC after: 3 days
  Notes: svn path=/head/; revision=263922
* Use SCTP_OVER_UDP_TUNNELING_PORT more consistently. | Michael Tuexen | 2014-03-29 | 2 | -10/+4
  MFC after: 3 days
  Notes: svn path=/head/; revision=263921
* Correct ARP update handling when the routes for network interfaces are | Alan Somers | 2014-03-26 | 2 | -4/+5
  restricted to a single FIB in a multifib system.

  Restricting an interface's routes to the FIB to which it is assigned
  (by setting net.add_addr_allfibs=0) causes ARP updates to fail with
  "arpresolve: can't allocate llinfo for x.x.x.x". This is due to the
  ARP update code hard coding its lookup for existing routing entries
  to FIB 0.

  sys/netinet/in.c:
    When dealing with RTM_ADD (add route) requests for an interface,
    use the interface's assigned FIB instead of the default (FIB 0).

  sys/netinet/if_ether.c:
    In arpresolve(), enhance error message generated when an
    lla_lookup() fails so that the interface causing the error is
    visible in logs.

  tests/sys/netinet/fibs_test.sh
    Clear ATF expected error.

  PR: kern/167947
  Submitted by: Nikolay Denev <ndenev@gmail.com> (previous version)
  Reviewed by: melifaro
  MFC after: 3 weeks
  Sponsored by: Spectra Logic Corporation
  Notes: svn path=/head/; revision=263779
* Correct the comments as support for RFC 1644 has been removed for a long time. | Hiren Panchasara | 2014-03-25 | 1 | -3/+1
  Notes: svn path=/head/; revision=263748
* * Provide information in error causes in ASCII instead of | Michael Tuexen | 2014-03-16 | 12 | -665/+275
    proprietary binary format.
  * Add support for a diagnostic information error cause.
    The code is sysctlable and the default is 0, which
    means it is not sent.
  This is joint work with rrs@.
  MFC after: 1 week
  Notes: svn path=/head/; revision=263237
* Several years after initial development, merge prototype support for | Robert Watson | 2014-03-15 | 6 | -9/+794
  linking NIC Receive Side Scaling (RSS) to the network stack's
  connection-group implementation. This prototype (and derived patches)
  are in use at Juniper and several other FreeBSD-using companies, so
  despite some reservations about its maturity, merge the patch to the
  base tree so that it can be iteratively refined in collaboration rather
  than maintained as a set of gradually diverging patch sets.

  (1) Merge a software implementation of the Toeplitz hash specified in
      RSS implemented by David Malone. This is used to allow suitable
      pcbgroup placement of connections before the first packet is
      received from the NIC. Software hashing is generally avoided,
      however, due to high cost of the hash on general-purpose CPUs.

  (2) In in_rss.c, maintain authoritative versions of RSS state intended
      to be pushed to each NIC, including keying material, hash
      algorithm/configuration, and buckets. Provide software-facing
      interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
      the RSS standardised Toeplitz and a 'naive' variation with a hash
      efficient in software but with poor distribution properties.
      Implement rss_m2cpuid() to be used by netisr and other load
      balancing code to look up the CPU on which an mbuf should be
      processed.

  (3) In the Ethernet link layer, allow netisr distribution using RSS as
      a source of policy as an alternative to source ordering; continue
      to default to direct dispatch (i.e., don't try and requeue packets
      for processing on the 'right' CPU if they arrive in a directly
      dispatchable context).

  (4) Allow RSS to control tuning of connection groups in order to align
      groups with RSS buckets. If a packet arrives on a protocol using
      connection groups, and contains a suitable hardware-generated
      hash, use that hash value to select the connection group for pcb
      lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz
      hash is available, we fall back on regular PCB lookup risking
      contention rather than pay the cost of Toeplitz in software --
      this is a less scalable but, at my last measurement, faster
      approach. As core counts go up, we may want to revise this
      strategy despite CPU overhead.

  Where device drivers suitably configure NICs, and connection groups /
  RSS are enabled, this should avoid both lock and line contention during
  connection lookup for TCP. This commit does not modify any device
  drivers to tune device RSS configuration to the global RSS
  configuration; patches are in circulation to do this for at least
  Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device
  drivers is not particularly robust, nor aware of more advanced features
  such as runtime reconfiguration/rebalancing. This will hopefully prove
  a useful starting point for refinement.

  No MFC is scheduled as we will first want to nail down a more mature
  and maintainable KPI/KBI for device drivers.

  Sponsored by: Juniper Networks (original work)
  Sponsored by: EMC/Isilon (patch update and merge)
  Notes: svn path=/head/; revision=263198
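For readers unfamiliar with the Toeplitz hash mentioned in item (1) of r263198, the following is a minimal bit-serial sketch in userland C. It is not the code merged by this commit: the function name `toeplitz_hash_sketch()` is hypothetical, and the loop favours clarity over speed. For every set bit of the input, taken most significant bit first, the 32-bit window of the key starting at that bit position is XORed into the result. The per-bit inner loop also hints at why the commit message calls software hashing costly on general-purpose CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-serial Toeplitz hash of datalen bytes, most significant bit first.
 * The key must be at least 4 bytes long; key bits beyond keylen are
 * treated as zero.
 */
uint32_t
toeplitz_hash_sketch(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window = 0;
	size_t i;
	int b;

	assert(key != NULL && keylen >= 4);

	/* Load the first 32 key bits into the sliding window. */
	for (i = 0; i < 4; i++)
		window = (window << 8) | key[i];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window by one bit and pull in the next key bit. */
			window <<= 1;
			if (i + 4 < keylen && (key[i + 4] & (1u << b)))
				window |= 1;
		}
	}
	return (hash);
}
```

A caller would typically concatenate the tuple fields (for example source address, destination address, source port, destination port, all in network byte order) into one buffer and hash that buffer with the key shared with the NIC; the exact field order is defined by the RSS specification, not by this sketch.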
* Remove AppleTalk support. | Gleb Smirnoff | 2014-03-14 | 1 | -13/+0
  AppleTalk was a network transport protocol for Apple Macintosh devices
  in the 80s and then the 90s. Starting with Mac OS X in 2000, AppleTalk
  was a legacy protocol and the primary networking protocol is TCP/IP.
  The last Mac OS X release to support AppleTalk happened in 2009. The
  same year routing equipment vendors (namely Cisco) ended their support.

  Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
  Notes: svn path=/head/; revision=263152
* Remove IPX support. | Gleb Smirnoff | 2014-03-14 | 1 | -1/+0
  IPX was a network transport protocol in Novell's NetWare network
  operating system from the late 80s and then the 90s. NetWare itself
  switched to TCP/IP as default transport in 1998. Later, in this
  century, the Novell Open Enterprise Server became the successor of
  Novell NetWare. The last release that claimed to still support IPX was
  OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued
  support for IPX in 2011.

  Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
  Notes: svn path=/head/; revision=263140
* Put the offset of the CRC32C in csum_data instead of 0. | Michael Tuexen | 2014-03-12 | 1 | -4/+4
  The virtio driver needs the offset to be stored in csum_data,
  like in the case for UDP and TCP.
  The virtio problem was reported by Niu Zhixiong <kaiaixi@gmail.com>,
  who helped in debugging and testing the patch.
  MFC after: 3 days
  Notes: svn path=/head/; revision=263096
* SCTP uses CRC32C and not Adler anymore. While there change the reference | Michael Tuexen | 2014-03-12 | 1 | -2/+2
  to RFC 4960. This does not change any code, just comments.
  MFC after: 3 days
  Notes: svn path=/head/; revision=263094
* Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, | Gleb Smirnoff | 2014-03-12 | 2 | -9/+3
  the protocol specific mbuf flags are shared between them.
  - Move all M_FOO definitions into a single place: netinet/in6.h, to
    avoid future clashes.
  - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted
    in a failure of operation of IPSEC and packet filters.

  Thanks to Nicolas and Georgios for all the hard work on bisecting,
  testing and finally finding the root of the problem.

  PR: kern/186755
  PR: kern/185876
  In collaboration with: Georgios Amanakis <gamanakis gmail.com>
  In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com>
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=263091
* - Remove rt_metrics_lite and simply put its members into rtentry. | Gleb Smirnoff | 2014-03-05 | 8 | -33/+26
  - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
    removes another cache trashing ++ from packet forwarding path.
  - Create zinit/fini methods for the rtentry UMA zone, and initialize
    the mutex and counter in them.
  - Fix reporting of rmx_pksent to routing socket.
  - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

  The change is mostly targeted for stable/10 merge. For head,
  rt_pksent is expected to just disappear.

  Discussed with: melifaro
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=262763
* Remove ifa_ref()/ifa_free(), which are atomic(9), from ip_output(). | Gleb Smirnoff | 2014-03-04 | 1 | -9/+1
  The ifaddr is already referenced by the rtentry, and we are holding
  reference on the rtentry throughout the function execution.
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=262747
* Remove more constants related to static sysctl nodes. The MAXID constants | John Baldwin | 2014-02-25 | 6 | -20/+6
  were primarily used to size the sysctl name list macros that were
  removed in r254295. A few other constants either did not have an
  associated sysctl node, or the associated node used OID_AUTO instead.
  PR: ports/184525 (exp-run)
  Notes: svn path=/head/; revision=262489
* Improve logging of send errors, reporting error code and interface. | Gleb Smirnoff | 2014-02-22 | 1 | -38/+33
  Reduce code duplication between INET and INET6.
  Tested by: Lytochkin Boris <lytboris gmail.com>
  Notes: svn path=/head/; revision=262341
* Remove redundant code and fix a style error. | Michael Tuexen | 2014-02-20 | 2 | -6/+2
  MFC after: 3 days
  Notes: svn path=/head/; revision=262252
* o Remove at compile time the HASH_ALL code, that was never | Gleb Smirnoff | 2014-02-17 | 1 | -13/+2
    tested and is unfinished. However, I've tested my version,
    it works okay. As before it is unfinished: timeouts aren't
    driven by TCP session state. To enable the HASH_ALL mode,
    one needs in kernel config:

      options FLOWTABLE_HASH_ALL

  o Reduce the alignment on flentry to 64 bytes. Without
    the FLOWTABLE_HASH_ALL option, twice less memory would
    be consumed by flows.
  o API to ip_output()/ip6_output() got even more thin: 1 liner.
  o Remove unused unions. Simply use fle->f_key[].
  o Merge all IPv4 code into flowtable_lookup_ipv4(), and do the
    same for flowtable_lookup_ipv6(). Stop copying data to on-stack
    sockaddr structures, simply use key[] on stack.
  o Move code from flowtable_lookup_common() that actually
    works on insertion into flowtable_insert().

  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=262027
* Fixup for r261590 (vnet sysctl handlers cleanup). | Mikolaj Golub | 2014-02-09 | 2 | -11/+2
  Reviewed by: glebius
  Notes: svn path=/head/; revision=261650
* o Revamp API between flowtable and netinet, netinet6. | Gleb Smirnoff | 2014-02-07 | 2 | -30/+2
    - ip_output() and ip6_output() simply call flowtable_lookup(),
      passing mbuf and address family. That's the only code under
      #ifdef FLOWTABLE in the protocols code now.
  o Revamp statistics gathering and export.
    - Remove hand made pcpu stats, and utilize counter(9).
    - Snapshot of statistics is available via 'netstat -rs'.
    - All sysctls are moved into net.flowtable namespace, since
      spreading them over net.inet isn't correct.
  o Properly separate at compile time INET and INET6 parts.
  o General cleanup.
    - Remove chain of multiple flowtables. We simply have one for
      IPv4 and one for IPv6.
    - Flowtables are allocated in flowtable.c, symbols are static.
    - With proper argument to SYSINIT() we no longer need flowtable_ready.
    - Hash salt doesn't need to be per-VNET.
    - Removed rudimentary debugging, which is quite useless in the
      dtrace era.

  The runtime behavior of flowtable shouldn't be changed by this commit.

  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=261601
* Utilize SYSCTL_UMA_CUR() to export usage of syncache and | Gleb Smirnoff | 2014-02-07 | 2 | -28/+5
  tcp reassembly zones.
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=261594
* Catch up on r261590. | Gleb Smirnoff | 2014-02-07 | 1 | -4/+0
  Notes: svn path=/head/; revision=261592
* Adjust r239672 from rrs and r258821 from eadler. | Peter Wemm | 2014-01-28 | 1 | -32/+13
  By definition, the very first FIN is not a duplicate. Process it
  normally and don't feed it to congestion control as though it were
  a dupe. Don't prevent CC from seeing later dupe acks while in a half
  close state.
  Notes: svn path=/head/; revision=261244
* Decrease lock contention within the TCP accept case by removing | George V. Neville-Neil | 2014-01-28 | 2 | -10/+3
  the INP_INFO lock from tcp_usr_accept. As the PR/patch states
  this was following the advice already in the code.
  See the PR below for a full discussion of this change and its
  measured effects.
  PR: 183659
  Submitted by: Julien Charbon
  Reviewed by: jhb
  Notes: svn path=/head/; revision=261242