path: root/sys/net
Commit message | Author | Age | Files | Lines
* vnet: add CURVNET_ASSERT_SET for !VIMAGE | Mateusz Guzik | 2022-02-19 | 1 | -0/+1
    Reported by: ler
    Sponsored by: Rubicon Communications, LLC ("Netgate")
* vnet: add CURVNET_ASSERT_SET | Mateusz Guzik | 2022-02-19 | 1 | -0/+4
    Reviewed by: kp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D34312
* if_epair: Use ANSI C definition | Li-Wen Hsu | 2022-02-15 | 1 | -2/+2
    This fixes -Werror=strict-prototypes from gcc9
    Sponsored by: The FreeBSD Foundation
* if_epair: implement fanout | Kristof Provost | 2022-02-15 | 1 | -123/+186
    Allow multiple cores to be used to process if_epair traffic. We do this (if RSS is enabled) based on the RSS hash of the incoming packet. This allows us to distribute the load over multiple cores, rather than sending everything to the same one.
    We also switch from swi_sched() to taskqueues, which also contributes to better throughput.
    Benchmark results, with net.isr.maxthreads=-1:
    Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)
        Before            627 Kpps
        After (no RSS)    1.198 Mpps
        After (RSS)       3.148 Mpps
    Setup B: (cc0 - bridge0 - epair0a) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)
        Before            7.705 Kpps
        After (no RSS)    1.017 Mpps
        After (RSS)       2.083 Mpps
    MFC after: 3 weeks
    Sponsored by: Orange Business Services
    Differential Revision: https://reviews.freebsd.org/D33731
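The hash-based fanout described in the if_epair commit can be sketched in a few lines. This is only an illustration under stated assumptions: pick_queue() is a hypothetical helper, and the real driver uses the kernel's RSS and taskqueue interfaces, which are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of if_epair): map a packet's flow hash
 * to one of nqueues worker queues. Packets of the same flow carry the
 * same hash, so they always land on the same queue and stay in order.
 */
static unsigned int
pick_queue(uint32_t flow_hash, unsigned int nqueues)
{
	assert(nqueues > 0);
	return (flow_hash % nqueues);
}
```

With a single queue every packet maps to queue 0, which corresponds to the "everything on one core" behaviour the commit moves away from.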
* vlan: allow net.link.vlan.mtag_pcp to be set per vnet | Kristof Provost | 2022-02-14 | 2 | -6/+8
    The primary reason for this change is to facilitate testing.
    MFC after: 1 week
* if_vxlan(4): Allow netmap_generic to intercept RX packets. | Aleksandr Fedorov | 2022-02-06 | 1 | -2/+7
    Netmap (generic) intercepts the if_input method to handle RX packets. Call ifp->if_input() instead of netisr_dispatch(). Add stricter check for incoming packet length.
    This change is very useful with bhyve + vale + if_vxlan.
    Reviewed by: vmaffione (mentor), kib, np, donner
    Approved by: vmaffione (mentor), kib, np, donner
    MFC after: 2 weeks
    Sponsored by: vstack.com
    Differential Revision: https://reviews.freebsd.org/D30638
* pflog: align header to 4 bytes, not 8 | Kristof Provost | 2022-02-01 | 1 | -2/+3
    6d4baa0d01 incorrectly rounded the length of the pflog header up to 8 bytes, rather than 4.
    PR: 261566
    Reported by: Guy Harris <gharris@sonic.net>
    MFC after: 1 week
    Sponsored by: Rubicon Communications, LLC ("Netgate")
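The difference between rounding a header length up to 4 bytes and up to 8 bytes can be shown with a small helper. This is a sketch only: round_up() is a hypothetical function, and the lengths used with it are illustrative values, not the actual size of the pflog header.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round len up to the next multiple of align.
 * align must be a non-zero power of two.
 */
static size_t
round_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((len + align - 1) & ~(align - 1));
}
```

For an illustrative length of 65, rounding to 4 gives 68 while rounding to 8 gives 72, so the two alignments produce different padding whenever the 4-byte result is not itself a multiple of 8.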
* pf: Initialize pf_kpool mutexes earlier | Mark Johnston | 2022-01-31 | 1 | -2/+3
    There are some error paths in ioctl handlers that will call pf_krule_free() before the rule's rpool.mtx field is initialized, causing a panic with INVARIANTS enabled.
    Fix the problem by introducing pf_krule_alloc() and initializing the mutex there. This does mean that the rule->krule and pool->kpool conversion functions need to stop zeroing the input structure, but I don't see a nicer way to handle this except perhaps by guarding the mtx_destroy() with a mtx_initialized() check.
    Constify some related functions while here and add a regression test based on a syzkaller reproducer.
    Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
    Reviewed by: kp
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D34115
* ifnet: garbage collect unused function ifaddr_byindex(). | Gleb Smirnoff | 2022-01-28 | 2 | -22/+1
    Last use was removed in 5adea417d49.
* rtsock: always set m_pkthdr.rcvif when queueing on netisr | Gleb Smirnoff | 2022-01-27 | 1 | -2/+0
    netisr uses global workstreams, and after dequeueing an mbuf it uses rcvif to get the VNET of the mbuf. Of course, this is not needed when the kernel is compiled without VIMAGE. It turned out that the routing socket does not set rcvif if compiled without VIMAGE. Make this assignment independent of the VIMAGE option.
    Fixes: 6871de9363e5
* netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs | Gleb Smirnoff | 2022-01-27 | 1 | -4/+14
    Reviewed by: kp
    Differential revision: https://reviews.freebsd.org/D33268
* ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif | Gleb Smirnoff | 2022-01-27 | 2 | -14/+44
    Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer.
    Reviewed by: kp
    Differential revision: https://reviews.freebsd.org/D33266
    Fun note: git show e6abef09187a
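The idea of pairing an index with a generation count, so that a stale handle can be detected instead of being turned back into a dangling pointer, can be sketched as follows. All names here (struct iface, struct handle, iface_serialize(), iface_restore() and so on) are hypothetical and chosen for illustration; the kernel KPI added by this commit is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8		/* illustrative table size */

struct iface {
	const char *name;
};

/* One slot of the index table: the pointer plus a generation count. */
struct slot {
	struct iface *ifp;	/* NULL while the slot is free */
	uint16_t gen;		/* bumped every time the slot is released */
};

static struct slot table[TABLE_SIZE];

/* Serialized form of a pointer: index plus the generation seen then. */
struct handle {
	uint16_t index;
	uint16_t gen;
};

static void
iface_attach(uint16_t index, struct iface *ifp)
{
	assert(index < TABLE_SIZE);
	table[index].ifp = ifp;
}

static void
iface_detach(uint16_t index)
{
	assert(index < TABLE_SIZE);
	table[index].ifp = NULL;
	table[index].gen++;	/* invalidates every outstanding handle */
}

static struct handle
iface_serialize(uint16_t index)
{
	assert(index < TABLE_SIZE);
	struct handle h = { index, table[index].gen };
	return (h);
}

/* Returns NULL if the slot was released after the handle was taken. */
static struct iface *
iface_restore(struct handle h)
{
	if (h.index >= TABLE_SIZE || table[h.index].gen != h.gen)
		return (NULL);
	return (table[h.index].ifp);
}
```

A handle taken before a detach no longer restores to anything, even if the same index has since been reused by a different interface.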
* ifnet: make if_index global | Gleb Smirnoff | 2022-01-27 | 1 | -130/+86
    Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively.
    API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed.
    Reviewed by: kp
    Differential revision: https://reviews.freebsd.org/D33672
* Add definitions for TLS receive tags using the existing send tag infrastructure. | Hans Petter Selasky | 2022-01-26 | 1 | -1/+25
    Although send tags are strictly used for transmit, the name might be changed in the future to be more generic.
    The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the network driver itself when creating the TLS RX decryption offload filter.
    TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryption's starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback.
    Differential Revision: https://reviews.freebsd.org/D32356
    Discussed with: jhb@
    MFC after: 1 week
    Sponsored by: NVIDIA Networking
* if_clone: correctly destroy a clone from a different vnet | Gleb Smirnoff | 2022-01-25 | 1 | -28/+2
    Try to live with the cruel fact of reality: if_vmove doesn't move an interface from the previous vnet's cloning infrastructure to the new one. Let's admit this as a design feature and make it work better.
    * Delete two blocks of code that would fall back to vnet0 if a cloner isn't found. They didn't do any good, and the whole idea of treating vnet0 as a special one is wrong.
    * When deleting a cloned interface, look up its cloner using its home vnet.
    With this change a simple sequence works correctly:
        ifconfig foo0 create
        jail -c name=jj persist vnet vnet.interface=foo0
        jexec jj ifconfig foo0 destroy
    Differential revision: https://reviews.freebsd.org/D33942
* if_vmove: improve restoration in cloner's ifgroup membership | Gleb Smirnoff | 2022-01-25 | 3 | -47/+37
    * Do a single call into if_clone.c instead of two. The cloner can't disappear since the interface sits on its list.
    * Make restoration smarter - check that cloner with same name exists in the new vnet.
    Differential revision: https://reviews.freebsd.org/D33941
* iflib: Allow drivers to determine which queue to TX on | Eric Joyner | 2022-01-25 | 2 | -5/+19
    Adds a new function pointer to struct if_txrx in order to allow drivers to set their own function that will determine which queue a packet should be sent on.
    Since this includes a kernel ABI change, bump the __FreeBSD_version as well.
    (The motivation behind this is to allow the driver to examine the UP in the VLAN tag and determine which queue to TX on based on that, in support of HW TX traffic shaping.)
    Signed-off-by: Eric Joyner <erj@FreeBSD.org>
    Reviewed by: kbowling@, stallamr@netapp.com
    Tested by: jeffrey.e.pieper@intel.com
    Sponsored by: Intel Corporation
    Differential Revision: https://reviews.freebsd.org/D31485
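An optional driver-supplied queue selector of the kind this iflib commit describes might look like the sketch below. The structure and function names (struct pkt, struct txrx_ops, txq_select, choose_txq()) are hypothetical stand-ins, not the members actually added to struct if_txrx, and the hash-based fallback is an assumption. The one standard detail is that the 802.1Q priority (UP/PCP) occupies the top three bits of the 16-bit tag control field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet metadata. */
struct pkt {
	uint32_t flow_hash;	/* flow hash computed elsewhere */
	uint16_t vlan_tci;	/* 802.1Q tag control information */
};

typedef unsigned int (*txq_select_fn)(const struct pkt *, unsigned int);

/* Hypothetical ops table with an optional selector callback. */
struct txrx_ops {
	txq_select_fn txq_select;	/* NULL means "use the default" */
};

/* Example driver callback: choose the queue from the VLAN priority. */
static unsigned int
select_by_pcp(const struct pkt *p, unsigned int nqueues)
{
	unsigned int pcp = (p->vlan_tci >> 13) & 0x7;

	return (pcp % nqueues);
}

/* Framework side: prefer the driver's callback, else hash the flow. */
static unsigned int
choose_txq(const struct txrx_ops *ops, const struct pkt *p,
    unsigned int nqueues)
{
	assert(nqueues > 0);
	if (ops->txq_select != NULL)
		return (ops->txq_select(p, nqueues));
	return (p->flow_hash % nqueues);
}
```

Keeping the callback optional means drivers that do not set it keep their existing queue selection unchanged.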
* netmap: fix LOR in iflib_netmap_register | Vincenzo Maffione | 2022-01-14 | 1 | -0/+3
    In iflib_device_register(), the CTX_LOCK is acquired first and then IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg() we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK is acquired by iflib_netmap_register().
    Fix this LOR issue by wrapping the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive.
    MFC after: 1 month
* pf: protect the rpool from races | Kristof Provost | 2022-01-14 | 1 | -0/+1
    The roundrobin pool stores its state in the rule, which could potentially lead to invalid addresses being returned.
    For example, thread A just executed PF_AINC(&rpool->counter) and immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter) (i.e. after the pf_match_addr() check of rpool->counter).
    Lock the rpool with its own mutex to prevent these races. The performance impact of this is expected to be low, as each rule has its own lock, and the lock is also only relevant when state is being created (so only for the initial packets of a connection, not for all traffic).
    See also: https://redmine.pfsense.org/issues/12660
    Reviewed by: glebius
    MFC after: 3 weeks
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33874
* Revert "iflib: Relax timer period from 0.5 to 0.5-0.75s." | Alexander Motin | 2022-01-10 | 1 | -27/+29
    I've noticed a relation between iflib_timer() and ixl_admin_timer(). Both are scheduled at the same 2Hz rate, but the second reschedules the first each time, so if the first gets any slower, it won't be executed at all. Revert this until deeper investigation.
    This reverts commit 90bc1cf65778aafb1f226c8fe08218cfed5e40b2.
* iflib: Relax timer period from 0.5 to 0.5-0.75s. | Alexander Motin | 2022-01-10 | 1 | -29/+27
    While there switch it from hardclock ticks to milliseconds.
    MFC after: 2 weeks
* Fix ifa refcount leak in ifa_ifwithnet() | Ryan Stone | 2022-01-06 | 1 | -3/+9
    In 4f6c66cc9c75c8, ifa_ifwithnet() was changed to no longer ifa_ref() the returned ifaddr, and instead the caller was required to stay in the net_epoch for as long as they wanted the ifaddr to remain valid. However, this missed the case where an AF_LINK lookup would call ifaddr_byindex(), which still does ifa_ref() the ifaddr. This would cause a refcount leak.
    Fix this by inlining the relevant parts of ifaddr_byindex() here, with the ifa_ref() call removed. This also avoids an unnecessary entry and exit from the net_epoch for this case.
    I've audited all in-tree consumers of ifa_ifwithnet() that could possibly perform an AF_LINK lookup and confirmed that none of them will expect the ifaddr to have a reference that they need to release.
    MFC after: 2 months
    Sponsored by: Dell Inc
    Differential Revision: https://reviews.freebsd.org/D28705
    Reviewed by: melifaro
* Fix kernel build without INET and INET6 | Ed Maste | 2022-01-05 | 2 | -0/+4
    Reviewed by: brooks, melifaro
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D33718
* domains: make domain_init() initialize only global state | Gleb Smirnoff | 2022-01-03 | 1 | -1/+1
    Now that each module handles its global and VNET initialization itself, there is no VNET related stuff left to do in domain_init().
    Differential revision: https://reviews.freebsd.org/D33541
* protocols: init with standard SYSINIT(9) or VNET_SYSINIT | Gleb Smirnoff | 2022-01-03 | 3 | -4/+3
    The historical BSD network stack loop that rolls over domains and over protocols has no advantages over the more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers.
    Getting rid of pr_init achieves several things:
    o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in.
    o Isolate initializers statically to the module they init.
    o Make the code easier to understand and maintain.
    Reviewed by: melifaro
    Differential revision: https://reviews.freebsd.org/D33537
* Fix kernel build without INET6 | Ed Maste | 2021-12-30 | 1 | -0/+4
    Reported by: Gary Jennejohn
    Fixes: ff3a85d32411 ("[lltable] Add per-family lltable ...")
    Sponsored by: The FreeBSD Foundation
* Make CPU_SET macros compliant with other implementations | Stefan Eßer | 2021-12-30 | 1 | -1/+1
    The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment.
    Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts.
    The difference between the FreeBSD versions of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-address scheme (2 source operands and a separately passed destination).
    The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments.
    This patch set makes it possible to unconditionally provide the functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
    One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges.
    Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no longer require that option.
    The FreeBSD version has been bumped to 1400046 to reflect this incompatible change.
    Reviewed by: kib
    MFC after: 2 weeks
    Relnotes: yes
    Differential Revision: https://reviews.freebsd.org/D33451
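The 2-address versus 3-address distinction can be illustrated with a toy bitset. The type toyset_t and the macros TOY_AND2 and TOY_AND3 are hypothetical and exist only for this sketch; they deliberately avoid the real CPU_AND macro, whose argument count depends on the system the code is built on.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a CPU set: a single 64-bit mask. */
typedef struct {
	uint64_t bits;
} toyset_t;

/* 2-address form: the first operand is both a source and the result. */
#define TOY_AND2(d, s)		((d)->bits &= (s)->bits)

/* 3-address form: two sources and a separately passed destination. */
#define TOY_AND3(d, a, b)	((d)->bits = (a)->bits & (b)->bits)
```

Preserving both inputs with the 2-address form needs a temporary copy (`tmp = a; TOY_AND2(&tmp, &b);`), whereas `TOY_AND3(&out, &a, &b)` does it directly, and `TOY_AND3(&a, &a, &b)` still reproduces the 2-address behaviour.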
* routing: Add unified level-based logging support for the routing subsystem. | Alexander V. Chernikov | 2021-12-29 | 6 | -51/+292
    Summary:
    MFC after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D33664
* nhops: split nh_family into nh_upper_family and nh_neigh_family. | Alexander V. Chernikov | 2021-12-29 | 3 | -13/+55
    With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need to distinguish the "upper" (i.e. traffic) family from the "neighbor" (i.e. LLE/gateway address) family. Store them explicitly in the private part of the nexthop data.
    While here, store nhop fibnum in the nhop_priv datastructure to make it self-contained.
    MFC after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D33663
* [lltable] Add per-family lltable getters. | Alexander V. Chernikov | 2021-12-29 | 2 | -8/+22
    Introduce a new function, lltable_get(), to retrieve lltable pointer for the specified interface and family. Use it to avoid all-iftable list traversal when adding or deleting ARP/ND records.
    Differential Revision: https://reviews.freebsd.org/D33660
    MFC after: 2 weeks
* net: iflib: sync isc_capenable to if_capenable | Vincenzo Maffione | 2021-12-28 | 1 | -0/+1
    On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled. When this happens, apply the same change to isc_capenable, which is the iflib private copy of if_capenable (for a subset of the IFCAP_* bits). In this way the iflib drivers can check the bits using isc_capenable rather than if_capenable. This is convenient because the latter access requires an additional indirection through the ifp, and it is also less likely to be in cache.
    PR: 260068
    Reviewed by: kbowling, gallatin
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D33156
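One way to mirror toggled capability bits into a private copy is to XOR the changed bits into it. The sketch below shows the bit arithmetic only: sync_private_caps() and the mask values are hypothetical, and the actual one-line iflib change may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the capability word before and after an
 * update, flip the same bits in a private copy, restricted to the
 * subset of bits that the private copy mirrors.
 */
static uint32_t
sync_private_caps(uint32_t priv, uint32_t old_caps, uint32_t new_caps,
    uint32_t mirrored)
{
	uint32_t toggled = (old_caps ^ new_caps) & mirrored;

	return (priv ^ toggled);
}
```

Bits outside the mirrored subset are left untouched in the private copy even when they change in the public word.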
* pf: make if_pfsync.h self-contained | Kristof Provost | 2021-12-17 | 1 | -0/+6
    Reviewed by: imp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33504
* pf: make if_pflog.h self-contained | Kristof Provost | 2021-12-17 | 1 | -0/+3
    Reviewed by: imp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33503
* net: make if_bridgevar.h self-contained | Kristof Provost | 2021-12-17 | 1 | -0/+4
    Reviewed by: imp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33502
* net: make ethernet.h self-contained | Kristof Provost | 2021-12-17 | 1 | -0/+2
    Reviewed by: imp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33501
* pf: make pfvar.h self-contained | Kristof Provost | 2021-12-17 | 1 | -0/+1
    Ensure that the pfvar.h header can be included without including any other headers.
    Reviewed by: imp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33499
* if_stf: make if_stf.h self-contained | Kristof Provost | 2021-12-17 | 1 | -0/+2
    Ensure that the if_stf.h header can be included without including any other headers.
    Reviewed by: imp
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33498
* Create wrapper for Giant taken for newbus | Warner Losh | 2021-12-10 | 1 | -6/+6
    Create a wrapper for newbus to take giant and for busses to take it too. bus_topo_lock() should be called before interacting with newbus routines and unlocked with bus_topo_unlock(). If you need the topology lock for some reason, bus_topo_mtx() will provide that.
    Sponsored by: Netflix
    Reviewed by: mav
    Differential Revision: https://reviews.freebsd.org/D31831
* net/if.c: plug set-but-not-used vars | Mateusz Guzik | 2021-12-09 | 1 | -3/+4
    Sponsored by: Rubicon Communications, LLC ("Netgate")
* ifnet: make V_if_index static to if.c | Gleb Smirnoff | 2021-12-06 | 3 | -16/+13
    This requires moving net.link.generic sysctl declaration from if_mib.c to if.c. Ideally if_mib.c needs just to be merged to if.c, but they have different license texts.
    Differential revision: https://reviews.freebsd.org/D33263
* ifnet_byindex() actually requires network epoch | Gleb Smirnoff | 2021-12-06 | 2 | -15/+11
    Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment.
    Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c.
    Reviewed by: melifaro
    Differential revision: https://reviews.freebsd.org/D33263
* ifnet: merge ifindex_alloc(), ifnet_setbyindex(), if_grow() and call magic | Gleb Smirnoff | 2021-12-06 | 1 | -74/+21
    Now it is possible to just merge all this complexity into single linear function. Note that IFNET_WLOCK() is a sleepable lock, so we can M_WAITOK and epoch_wait_preempt().
    Reviewed by: melifaro, bz, kp
    Differential revision: https://reviews.freebsd.org/D33262
* ifnet: initial if_grow() shall always succeed | Gleb Smirnoff | 2021-12-06 | 1 | -6/+2
    So let's just call malloc() directly. This also avoids hidden doubling of default V_if_indexlim.
    Reviewed by: melifaro, bz, kp
    Differential revision: https://reviews.freebsd.org/D33261
* ifnet: use ck_pr(3) store & load setting ifnet pointer in ifindex | Gleb Smirnoff | 2021-12-06 | 1 | -3/+3
    The lockless access to the array is protected by the network epoch.
    Reviewed by: bz, kp
    Differential revision: https://reviews.freebsd.org/D33260
* ifnet: allocate index at the end of if_alloc_domain() | Gleb Smirnoff | 2021-12-06 | 1 | -23/+15
    Now that if_alloc_domain() never fails and actually doesn't expose ifnet to outside we can eliminate IFNET_HOLD and two step index allocation.
    Reviewed by: kp
    Differential revision: https://reviews.freebsd.org/D33259
* nhop: hash ifnet pointer instead of if_index | Gleb Smirnoff | 2021-12-04 | 1 | -18/+12
    Yet another problem created by the VIMAGE/if_vmove/epair design that relocates an ifnet between vnets and changes its if_index. Since if_index changes, the nhop hash value also changes, unlink_nhop() isn't able to find the entry in the hash, and it leaks the nhop. Since the nhop references the ifnet, the latter is also leaked. As a result, running network tests leaks memory on every single test that creates a vnet jail.
    While here, rewrite whole hash_priv() to use static initializer, per Alexander's suggestion.
    Reviewed by: melifaro
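Why hashing a mutable field breaks lookups while hashing the object's address does not can be shown with a toy example. Both hash functions below, and struct iface itself, are hypothetical illustrations rather than the kernel's hash_priv(); the multiplier is an arbitrary odd constant used for mixing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy interface object whose index may change during its lifetime. */
struct iface {
	unsigned int index;
};

/* Keyed on the index: the hash changes whenever the index changes. */
static uint32_t
hash_by_index(const struct iface *ifp)
{
	return ((uint32_t)ifp->index * 2654435761u);
}

/* Keyed on the address: stable for as long as the object exists. */
static uint32_t
hash_by_ptr(const struct iface *ifp)
{
	uintptr_t p = (uintptr_t)ifp;

	return ((uint32_t)(p >> 4) * 2654435761u);
}
```

An entry inserted under the index-based hash can no longer be found for removal once the index changes, which is the kind of leak the commit message describes; the pointer-based hash is unaffected.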
* if_pflog: fix packet length | Kristof Provost | 2021-12-04 | 1 | -2/+6
    There were two issues with the new pflog packet length. The first is that the length is expected to be a multiple of sizeof(long), but we'd assumed it had to be a multiple of sizeof(uint32_t).
    The second is that there's some broken software out there (such as Wireshark) that makes incorrect assumptions about the amount of padding. That is, Wireshark assumes there's always three bytes of padding, rather than however much is needed to get to a multiple of sizeof(long).
    Fix this by adding extra padding, and a fake field to maintain Wireshark's assumption.
    Reported by: Ozkan KIRIK <ozkan.kirik@gmail.com>
    Tested by: Ozkan KIRIK <ozkan.kirik@gmail.com>
    MFC after: 1 week
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D33236
* Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" | Cy Schubert | 2021-12-02 | 2 | -6/+2
    This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
    A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
* wpa: Import wpa_supplicant/hostapd commit 14ab4a816 | Cy Schubert | 2021-12-02 | 2 | -2/+6
    This is the November update to vendor/wpa committed upstream 2021-11-26.
    MFC after: 1 month
* ifnet: enable & fix if_debug build | Gleb Smirnoff | 2021-12-02 | 1 | -1/+2
    Fixes: ce40632a316c5