path: root/sys/net
Commit message | Author | Age | Files | Lines
* ifconfig: Minor documentation fixJose Luis Duran2021-05-031-2/+2
| | | | | | | | | | | | | Fix what appears to have been a small copy/paste typo in ifconfig(8)'s documentation (man page and header file). Not that it matters anymore. Reference: Table I-2 in IEEE Std 802.1Q-2014. PR: 255557 Submitted by: Jose Luis Duran <jlduran@gmail.com> MFC after: 1 week
* iflib: Take iri_pad into account when processing small framesMarcin Wojtas2021-04-301-1/+3
| | | | | | | | | | | | | | | | Drivers can specify padding of received frames with the iri_pad field. This can be used to enforce IP alignment by hardware. Iflib ignored that padding when processing small frames, which rendered this feature inoperable. I found it while writing a driver for a NIC that can IP-align received packets. Note that this doesn't change behavior of existing drivers as they all set iri_pad to 0. Submitted by: Kornel Duleba <mindal@semihalf.com> Reviewed by: gallatin Obtained from: Semihalf Sponsored by: Alstom Group Differential Revision: https://reviews.freebsd.org/D30009
* [fib algo] Update fib_gen counter under FIB_MOD_LOCK.Alexander V. Chernikov2021-04-281-3/+3
| | | | MFC after: 3 days
* Add rib_walk_from() wrapper for selective rib tree traversal.Alexander V. Chernikov2021-04-282-0/+38
| | | | | | | | | | Provide wrapper for the rnh_walktree_from() rib callback. As currently `struct rib_head` is considered internal to the routing subsystem, this wrapper is necessary to maintain isolation from the external code. Differential Revision: https://reviews.freebsd.org/D29971 MFC after: 1 week
* [fib algo] Delay algo init at fib growth to allow reliable use of the rib KPI.Alexander V. Chernikov2021-04-273-33/+76
| | | | | | | | | | | | | | | | | | | | | Currently, most of the rib(9) KPI does not use rnh pointers, using fibnum and family parameters to determine the rib pointer instead. This works well except for the case when we initialize new rib pointers during fib growth. In that case, there is no mapping between fib/family and the new rib, as an entirely new rib pointer array is populated. Address this by delaying fib algo initialization till after switching to the new pointer array and updating the number of fibs. Set datapath pointer to the dummy function, so the potential callers won't crash the kernel in the brief moment when the rib exists, but no fib algo is attached. This change avoids creating duplicates of existing rib functions with altered signatures. Differential Revision: https://reviews.freebsd.org/D29969 MFC after: 1 week
* [fib algo] always commit static routes synchronously.Alexander V. Chernikov2021-04-271-4/+12
| | | | | | | | | | | | | | | | | | | | | The modular fib lookup framework features logic that allows route update batching for the algorithms that cannot easily apply the routing change without rebuilding. As a result, dataplane lookups may return old data until the sync takes place. With the default sync timeout of 50ms, it is possible that a new binary such as ping(8), executed immediately after route(8), will still use the old fib data. To address some aspects of the problem, the framework executes all rtable changes without RTF_GATEWAY synchronously. To fix the aforementioned problem, this diff extends sync execution to all RTF_STATIC routes (e.g. ones maintained by route(8)). This fixes a bunch of tests in the networking space. Reported by: ci, arichardson MFC after: 2 weeks
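The commit rule described above can be sketched as a small predicate. This is only an illustration: the `SK_` flag constants are local copies assumed to mirror the `RTF_GATEWAY` and `RTF_STATIC` values in `<net/route.h>`, and `sk_needs_sync_commit()` is a hypothetical name, not a function from the fib algo framework.

```c
#include <stdbool.h>

/* Illustrative copies of route flags; the real ones live in <net/route.h>. */
#define SK_RTF_GATEWAY 0x2   /* destination is reached through a gateway */
#define SK_RTF_STATIC  0x800 /* route was added manually, e.g. by route(8) */

/*
 * Hypothetical predicate: return true when a route change should be
 * applied to the lookup algorithm immediately instead of being batched.
 */
static bool
sk_needs_sync_commit(int rt_flags)
{
	/* Routes without a gateway were already committed synchronously. */
	if ((rt_flags & SK_RTF_GATEWAY) == 0)
		return (true);
	/* The change described above adds manually configured routes. */
	if ((rt_flags & SK_RTF_STATIC) != 0)
		return (true);
	/* Everything else may wait for the next batched sync. */
	return (false);
}
```

Under these assumptions a gateway route learned from a routing daemon is still batched, while the same route installed with route(8) is visible to lookups as soon as the command returns.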
* Fix rtsock sockaddr alignment.Alexander V. Chernikov2021-04-271-1/+1
| | | | | | | | b31fbebeb3 introduced alloc_sockaddr_aligned() which, in fact, failed to produce aligned addresses. Reported by: Oskar Holmlund <oskar.holmlund at yahoo.com> MFC after: immediately
* Fix dtrace CTF for the rib_head.Alexander V. Chernikov2021-04-271-3/+1
| | | | | | | | | | | 33cb3cb2e321 introduced a `rib_head` structure field under the FIB_ALGO define. This may be problematic for the CTF, as some of the files including `route_var.h` do not have `fib_algo` defined. Make dtrace happy by making the field unconditional. Suggested by: markj
* pfsync: Expose PFSYNCF_OK flag to userspaceKristof Provost2021-04-261-0/+2
| | | | | | | | | | | Add 'syncok' field to ifconfig's pfsync interface output. This allows userspace to figure out when pfsync has completed the initial bulk import. Reviewed by: donner MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29948
* pf: Allow multiple labels to be set on a ruleKristof Provost2021-04-261-1/+1
| | | | | | | | | | | | Allow up to 5 labels to be set on each rule. This offers more flexibility in using labels. For example, it replaces the custom 'schedule' keyword used by pfSense to terminate states according to a schedule. Reviewed by: glebius MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29936
* iflib: Improve mapping of TX/RX queues to CPUsPatrick Kelsey2021-04-261-161/+293
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iflib now supports mapping each (TX,RX) queue pair to the same CPU (default), to separate CPUs, or to a pair of physical and logical CPUs that share the same L2 cache. The mapping mechanism supports unequal numbers of TX and RX queues, with the excess queues always being mapped to consecutive physical CPUs. When the platform cannot distinguish between physical and logical CPUs, all are treated as physical CPUs. See the comment on get_cpuid_for_queue() for the entire matrix. The following device-specific tunables influence the mapping process: dev.<device>.<unit>.iflib.core_offset (existing) dev.<device>.<unit>.iflib.separate_txrx (existing) dev.<device>.<unit>.iflib.use_logical_cores (new) The following new, read-only sysctls provide visibility of the mapping results: dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu When an iflib driver allocates TX softirqs without providing reference RX IRQs, iflib now binds those TX softirqs to CPUs using the above mapping mechanism (that is, treats them as if they were TX IRQs). Previously, such bindings were left up to the grouptaskqueue code and thus fell outside of the iflib CPU mapping strategy. Reviewed by: kbowling Tested by: olivier, pkelsey MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D24094
* Fix NOINET[6],!VIMAGE builds after FIB_ALGO addition to GENERICAlexander V. Chernikov2021-04-211-6/+10
| | | | | Reported by: jbeich PR: 255390
* Fix NOINET[6] build after enabling FIB_ALGO in GENERIC.Alexander V. Chernikov2021-04-211-0/+4
| | | | | Submitted by: jbeich PR: 255389
* [fib algo] Do not print algo attach/detach message on bootAlexander V. Chernikov2021-04-251-3/+5
| | | | MFC after: 1 day
* Make gcc happy by initializing error in rib_handle_ifaddr_info().Alexander V. Chernikov2021-04-251-1/+1
* Fix build with gccStefan Eßer2021-04-251-1/+1
| | | | Correctly declare function without arguments as f(void) instead of f().
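As a brief illustration of the distinction: before C23, empty parentheses in a declaration leave the parameters unspecified, so the compiler cannot check calls, whereas `(void)` is a real prototype for a function that takes no arguments. The function name below is hypothetical and not taken from the commit.

```c
/*
 * Old style, shown only as a comment:
 *
 *     static int sk_queue_count();
 *
 * This declares a function with *unspecified* parameters, which gcc
 * reports under -Wstrict-prototypes.
 */

/* Prototype form: the function takes no arguments at all. */
static int sk_queue_count(void);

static int
sk_queue_count(void)
{
	return (4);
}
```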
* [rtsock] Enforce netmask/RTF_HOST consistency.Alexander V. Chernikov2021-04-241-0/+2
| | | | | | | | | | | | | | | | | | Traditionally we had 2 sources of information about whether the added/deleted route request targets a network or a host route: netmask (RTA_NETMASK) and the RTF_HOST flag. The former one is tricky: netmask can be empty or can explicitly specify the host netmask. Parsing netmask sockaddr requires per-family parsing and that's what rtsock code traditionally avoided. As a result, consistency was not enforced and it was possible to specify a network with the RTF_HOST flag and vice versa. Continue normalization efforts from D29826 and D29826 and ensure that the RTF_HOST flag always reflects host/network data from the netmask field. Differential Revision: https://reviews.freebsd.org/D29958 MFC after: 2 days
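A minimal IPv4-only sketch of the consistency rule follows. It assumes that a missing netmask and an all-ones netmask both mean "host route"; `SK_RTF_HOST` is a local copy assumed to match `RTF_HOST` in `<net/route.h>`, and `sk_normalize_host_flag()` is a hypothetical helper, not the rtsock code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_RTF_HOST 0x4 /* illustrative copy of RTF_HOST */

/*
 * Hypothetical helper: derive the host/network distinction from the
 * netmask alone and force the flag to agree with it.
 * has_mask is false when the request carried no RTA_NETMASK.
 */
static int
sk_normalize_host_flag(int rt_flags, bool has_mask, uint32_t mask)
{
	/* No mask, or an explicit /32 mask, describes a single host. */
	bool is_host = !has_mask || mask == UINT32_MAX;

	if (is_host)
		return (rt_flags | SK_RTF_HOST);
	return (rt_flags & ~SK_RTF_HOST);
}
```

With this rule a request that sets `RTF_HOST` together with a /24 mask ends up as a network route, and a request without any mask gains the flag even if the caller forgot it.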
* Re-enable network ioctls in capability modeMark Johnston2021-04-233-16/+2
| | | | | | | | | This reverts a portion of 274579831b61 ("capsicum: Limit socket operations in capability mode") as at least rtsol and dhcpcd rely on being able to configure network interfaces while in capability mode. Reported by: bapt, Greg V Sponsored by: The FreeBSD Foundation
* iflib: initialize LRO unconditionallyAndrew Gallatin2021-04-231-13/+9
| | | | | | | | | | | | | | | | | | | | Changes to the LRO code have exposed a bug in iflib where devices which are not capable of doing LRO are still calling tcp_lro_flush_all(), even when they have not initialized the LRO context. This used to be mostly harmless, but the LRO code now sets the VNET based on the ifp in the lro context and will try to access it through a NULL ifp resulting in a panic at boot. To fix this, we unconditionally initialize LRO so that we have a valid LRO context when calling tcp_lro_flush_all(). One alternative is to check the device capabilities before calling tcp_lro_flush_all() or adding a new state flag in the ctx. However, it seems unwise to add an extra, mostly useless test for higher performance devices when we can just initialize LRO for all devices. Reviewed by: erj, hselasky, markj, olivier Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D29928
* Fix rib generation count for fib algo.Alexander V. Chernikov2021-04-203-5/+29
| | | | | | | | | | | | | | | | | | | | | Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries. With certain fib algorithms, it is now possible that the datapath lookup state applies RIB changes with some delay. In that scenario, PCB cache will invalidate on the RIB change, but the new lookup may result in the same nexthop being returned. When fib algo finally gets in sync with the RIB changes, PCB cache will not receive any notification and will end up caching the stale data. To fix this, introduce additional counter, rnh_gen_rib, which is used only when FIB_ALGO is enabled. This counter is incremented by the control plane. Each time when fib algo synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value. Differential Revision: https://reviews.freebsd.org/D29812 Reviewed by: donner MFC after: 2 weeks
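The two-counter scheme can be modelled in a few lines. The field names `rnh_gen` and `rnh_gen_rib` come from the commit message; the struct and the three helper functions are hypothetical and only illustrate who updates which counter and when.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the generation counters in struct rib_head. */
struct sk_rib_head {
	uint32_t rnh_gen;     /* generation observed by PCB caches */
	uint32_t rnh_gen_rib; /* generation of the control-plane RIB */
};

/* Control plane: a route was added, changed or deleted. */
static void
sk_rib_changed(struct sk_rib_head *rh)
{
	rh->rnh_gen_rib++;
}

/* Fib algo: the datapath lookup state now reflects the RIB. */
static void
sk_algo_synced(struct sk_rib_head *rh)
{
	rh->rnh_gen = rh->rnh_gen_rib;
}

/* PCB cache: a cached nexthop stays valid while the generation matches. */
static bool
sk_cache_valid(const struct sk_rib_head *rh, uint32_t cached_gen)
{
	return (rh->rnh_gen == cached_gen);
}
```

In this model a RIB change alone does not invalidate caches. They are invalidated only once the algorithm has caught up, so the next lookup cannot re-cache a nexthop that is about to become stale.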
* Relax rtsock message restrictions.Alexander V. Chernikov2021-04-201-94/+177
| | | | | | | | | | | | | | | | | | | Address multiple issues with strict rtsock message validation. D28668 "normalisation" approach was based on the assumption that we always have at least "standard" sockaddr len. It turned out to be false - certain older applications like quagga or routed abuse sin[6]_len field and set it to the offset to the first fully-zero bit in the mask. It is impossible to normalise such sockaddrs without reallocation. With that in mind, change the approach to use a distinct memory buffer for the altered sockaddrs. This allows supporting the older software while maintaining the guarantee on the "standard" sockaddrs. PR: 255273,255089 Differential Revision: https://reviews.freebsd.org/D29826 MFC after: 3 days
* Improve error reporting in rtsock.cAlexander V. Chernikov2021-04-191-9/+12
| | | | MFC after: 3 days
* pf: Optionally attempt to preserve rule counter values across ruleset updatesKristof Provost2021-04-191-0/+3
| | | | | | | | | | Usually rule counters are reset to zero on every update of the ruleset. With keepcounters set pf will attempt to find matching rules between old and new rulesets and preserve the rule counters. MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29780
* pf: PFRULE_REFS should not be user-visibleKristof Provost2021-04-191-0/+1
| | | | | | | | | | Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a kernel-internal flag and should not be exposed to or read from userspace. MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29778
* bridgestp: validate timer values in config BPDUJonah Caplan2021-04-191-0/+17
| | | | | | | | | | IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers. Incoming BPDU messages should be checked against the permitted ranges. The rest of 17.14 appears to be enforced already. PR: 254924 Reviewed by: kp, donner Differential Revision: https://reviews.freebsd.org/D29782
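A range check of this kind might look like the sketch below. The ranges (Max Age 6 to 40 s, Hello Time 1 to 2 s, Forward Delay 4 to 30 s) and the 1/256 s encoding of BPDU timer fields are stated here as assumptions to be verified against IEEE Std 802.1D-2004; the function names are hypothetical and do not come from bridgestp.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_TICKS_PER_SEC 256u /* assumed BPDU timer unit: 1/256 second */

/* True when a raw timer value lies within [lo_s, hi_s] seconds. */
static bool
sk_timer_in_range(uint16_t raw, unsigned lo_s, unsigned hi_s)
{
	return (raw >= lo_s * SK_TICKS_PER_SEC &&
	    raw <= hi_s * SK_TICKS_PER_SEC);
}

/* Hypothetical validator for the three timers carried in a config BPDU. */
static bool
sk_bpdu_timers_ok(uint16_t max_age, uint16_t hello_time, uint16_t fwd_delay)
{
	return (sk_timer_in_range(max_age, 6, 40) &&
	    sk_timer_in_range(hello_time, 1, 2) &&
	    sk_timer_in_range(fwd_delay, 4, 30));
}
```

A BPDU that fails such a check would presumably be dropped rather than applied; what the driver does on failure is not stated in the commit message.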
* fib algo: do not reallocate datapath index for datapath ptr update.Alexander V. Chernikov2021-04-181-0/+11
| | | | | | | | | | | | | Fib algo uses a per-family array indexed by the fibnum to store lookup function pointers and per-fib data. Each algorithm rebuild currently requires re-allocating this array to support atomic change of two pointers. As in reality most of the changes actually involve changing only data pointer, add a shortcut performing in-flight pointer update. MFC after: 2 weeks
* Fib algo: extend KPI by allowing algo to set datapath pointers.Alexander V. Chernikov2021-04-182-21/+42
| | | | | | | | | | | | | | | | Some algorithms may require updating datapath and control plane algo pointers after the (batched) updates. Export fib_set_datapath_ptr() to allow setting the new datapath function or data pointer from the algo. Add fib_set_algo_ptr() to allow updating algo control plane pointer from the algo. Add fib_epoch_call() epoch(9) wrapper to simplify freeing old datapath state. Reviewed by: zec Differential Revision: https://reviews.freebsd.org/D29799 MFC after: 1 week
* Add batched update support for the fib algo.Alexander V. Chernikov2021-04-142-6/+154
| | | | | | | | | | | | | | | | | | | | | | | | The initial fib algo implementation was built on a very simple set of principles w.r.t. updates: 1) the algorithm is either able to apply the change synchronously (DIR24-8) or requires a full rebuild (bsearch, lradix). 2) the framework falls back to rebuild on every error (memory allocation, nhg limit, other internal algo errors, etc). This change brings a new "intermediate" concept - batched updates. The algorithm can indicate that a particular update has to be handled in batched fashion (FLM_BATCH). The framework will write this update and other updates to a temporary buffer instead of pushing them to the algo callback. Depending on the update rate, the framework will batch 50..1024 ms of updates and submit them to a different algo callback. This functionality is handy for slow-to-rebuild algorithms like DXR. Differential Revision: https://reviews.freebsd.org/D29588 Reviewed by: zec MFC after: 2 weeks
* if_firewire: fixing panic upon packet reception for VNET buildTai-hwa Liang2021-04-131-0/+2
| | | | | | | | netisr_dispatch_src() needs valid VNET pointer or firewire_input() will panic when receiving a packet. Reviewed by: glebius MFC after: 2 weeks
* pf: Implement the NAT source port selection of MAP-E Customer EdgeKurosawa Takahiro2021-04-131-0/+1
| | | | | | | | | | | MAP-E (RFC 7597) requires special care when selecting source ports for NAT on the Customer Edge, because some bits of the port number are used by the Border Relay to distinguish the other side of the IPv4-over-IPv6 tunnel. PR: 254577 Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D29468
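The port layout behind this can be sketched as a membership test. It follows the RFC 7597 port-set scheme, where a 16-bit port is split into `a` offset bits, `k` PSID bits and the remaining low bits. The function name and parameter order are hypothetical, and pf's actual port selection code may be organised quite differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does `port` belong to the port set identified by
 * `psid`, given `a` offset bits and `k` PSID bits (a + k <= 16)?
 */
static bool
sk_port_in_portset(uint16_t port, unsigned a, unsigned k, unsigned psid)
{
	unsigned m, upper, field;

	if (a + k > 16 || (psid >> k) != 0)
		return (false);         /* malformed parameters */

	m = 16 - a - k;                 /* low bits left free for the CE */
	upper = (unsigned)port >> (16 - a); /* the a-bit offset field */
	field = ((unsigned)port >> m) & ((1u << k) - 1);

	/* With a non-zero offset, a zero upper field is excluded (low ports). */
	if (a > 0 && upper == 0)
		return (false);
	return (field == psid);
}
```

With 6 offset bits, an 8-bit PSID and PSID 0x34, the example values from RFC 7597, ports 1232 through 1235 are in the set while 1236 and 208 are not.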
* Fix vlan creation for the older ifconfig(8) binaries.Alexander V. Chernikov2021-04-111-0/+8
| | | | | Reported by: allanjude MFC after: immediately
* Fix direct route installation with net/bird.Alexander V. Chernikov2021-04-101-5/+6
| | | | | | | | Slightly relax the gateway validation rules imposed by 2fe5a79425c7, by requiring only the first 8 bytes (everything before sdl_data) to be present in the AF_LINK gateway. Reported by: olivier
* Appease -Wsign-compare in radix.cAlexander V. Chernikov2021-04-101-1/+1
| | | | | | Differential Revision: https://reviews.freebsd.org/D29661 Submitted by: zec MFC after: 2 weeks
* Allow to specify debugnet fib in sysctl/tunable.Alexander V. Chernikov2021-04-101-1/+5
| | | | | | Differential Revision: https://reviews.freebsd.org/D29593 Reviewed by: donner MFC after: 2 weeks
* pf: Implement nvlist variant of DIOCGETRULEKristof Provost2021-04-101-0/+4
| | | | | | MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29559
* pf: Introduce nvlist variant of DIOCADDRULEKristof Provost2021-04-101-0/+1
| | | | | | | | | | This will make future extensions of the API much easier. The intent is to remove support for DIOCADDRULE in FreeBSD 14. Reviewed by: markj (previous version), glebius (previous version) MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29557
* Implement better rebuild-delay fib algo policy.Alexander V. Chernikov2021-04-091-61/+228
| | | | | | | | | | | | | | | | | | | | | The intent is to better handle time intervals with a large number of RIB updates (e.g. BGP peer going up or down), while still keeping a low sync delay for the remaining scenarios. The implementation is the following: updates are bucketed into buckets of size 50ms. If the number of updates within the current bucket exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket interval), the update is delayed for another 50ms. This can be repeated until the maximum update delay (1 sec) is reached. All 3 variables are runtime tunables: * net.route.algo.fib_max_sync_delay_ms: 1000 * net.route.algo.bucket_change_threshold_rate: 500 * net.route.algo.bucket_time_ms: 50 Differential Revision: https://reviews.freebsd.org/D29588 MFC after: 2 weeks
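The policy can be modelled with a short function that takes per-bucket update counts and returns the resulting sync delay. All names here are hypothetical, and the per-bucket threshold is an assumption of the sketch: it is computed as rate times bucket length, which gives 25 updates for the default values, and the figure the kernel actually uses is not confirmed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three tunables listed above. */
struct sk_sync_policy {
	uint32_t max_delay_ms;   /* fib_max_sync_delay_ms */
	uint32_t rate_per_sec;   /* bucket_change_threshold_rate */
	uint32_t bucket_ms;      /* bucket_time_ms */
};

/*
 * Return the delay before the algorithm is synced, given the number of
 * RIB updates seen in each consecutive bucket after the first change.
 */
static uint32_t
sk_sync_delay_ms(const struct sk_sync_policy *p,
    const uint32_t *bucket_updates, size_t nbuckets)
{
	/* Assumed threshold: updates tolerated within a single bucket. */
	uint32_t threshold = p->rate_per_sec * p->bucket_ms / 1000;
	uint32_t delay = p->bucket_ms; /* sync is due at the first bucket end */

	for (size_t i = 0; i < nbuckets; i++) {
		/* A quiet bucket, or the cap, ends the postponement. */
		if (bucket_updates[i] <= threshold || delay >= p->max_delay_ms)
			break;
		delay += p->bucket_ms;
		if (delay > p->max_delay_ms)
			delay = p->max_delay_ms;
	}
	return (delay);
}
```

A single isolated route change is synced after one bucket (50 ms with the defaults), while a sustained burst keeps pushing the sync back one bucket at a time until the 1 second cap is reached.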
* Enforce check for using the return result for ifa?_try_ref().Alexander V. Chernikov2021-04-051-2/+2
| | | | | | Suggested by: hps MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29504
* pf: Remove unused variable rt_listid from struct pf_kruleKristof Provost2021-04-081-1/+0
| | | | | | | Reviewed by: donner MFC after: 4 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29639
* capsicum: Limit socket operations in capability modeMark Johnston2021-04-073-2/+17
| | | | | | | | | | | | | | | | | | Capsicum did not prevent certain privileged networking operations, specifically creation of raw sockets and network configuration ioctls. However, these facilities can be used to circumvent some of the restrictions that capability mode is supposed to enforce. Add capability mode checks to disallow network configuration ioctls and creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET internet sockets. Reviewed by: oshogbo Discussed with: emaste Reported by: manu Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29423
* iflib: add support for netmap offsetsVincenzo Maffione2021-04-051-4/+8
| | | | | | Follow-up change to a6d768d845c173823785c71bb18b40074e7a8998. This change adds iflib support for netmap offsets, enabling applications to use offsets on any driver backed by iflib.
* netmap: restore commit a56e6334d1b7ed6e6faaa8b4612d948005ba74f5Vincenzo Maffione2021-04-021-1/+2
| | | | | The fix in a56e6334d1b7ed6e6faaa8b4612d948005ba74f5 was accidentally reverted by commit 45c67e8f6b56b9744f01142747fadf291fe3fad2.
* netmap: several typo fixesVincenzo Maffione2021-04-022-10/+10
| | | | No functional changes intended.
* vxlan: correct interface MTU when using hw offloadsKonstantin Belousov2021-03-311-2/+15
| | | | | | | | | | | | Otherwise it breaks when offloads such as checksum or TSO are used, because the second (encapsulated) ip_output() processing passes fragments of the encapsulated packet down to the hardware interface. Diagnosed by: hselasky Reviewed by: np Sponsored by: Nvidia Networking / Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29501
* mbuf: add a way to mark flowid as calculated from the internal headersKonstantin Belousov2021-03-312-2/+4
| | | | | | | | | | | | | | | | | In some settings the offload might calculate the hash from the decapsulated packet. Reserve a bit in the packet header rsstype to indicate that. Add m_adj_decap() that acts similarly to m_adj(), but also either clears the flowid if it is not marked as inner, or transfers it to the decapsulated header, clearing the inner indicator. It depends on the internals of m_adj() that reuses the argument packet header for the result. Use m_adj_decap() for decapsulating vxlan(4) and gif(4) input packets. Reviewed by: ae, hselasky, np Sponsored by: Nvidia Networking / Mellanox Technologies MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28773
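The flowid handling on decapsulation can be illustrated with a toy packet header. Everything below is hypothetical: the struct, the `SK_` constants and their values, and the function name are stand-ins, and the real m_adj_decap() also trims the outer headers from the mbuf chain, which this sketch omits.

```c
#include <stdint.h>

#define SK_HASHTYPE_NONE  0x00u /* no valid flowid */
#define SK_HASHTYPE_INNER 0x80u /* assumed bit: hash covers inner headers */

/* Toy stand-in for the flow fields of an mbuf packet header. */
struct sk_pkthdr {
	uint32_t flowid;
	uint8_t  rsstype;
};

/* Adjust the flow information after the outer headers are stripped. */
static void
sk_decap_flowid(struct sk_pkthdr *h)
{
	if ((h->rsstype & SK_HASHTYPE_INNER) != 0) {
		/* Hash already describes the inner packet: keep it. */
		h->rsstype = (uint8_t)(h->rsstype & ~SK_HASHTYPE_INNER);
	} else {
		/* Hash described the outer packet: it no longer applies. */
		h->flowid = 0;
		h->rsstype = SK_HASHTYPE_NONE;
	}
}
```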
* Fix typo in the 9fa8d1582b44b4850d40699c9adb104732328b7d.Alexander V. Chernikov2021-03-291-1/+1
| | | | Reported by: cy
* Put bandaid for nhgrp_dump_sysctl() malloc KASSERT().Alexander V. Chernikov2021-03-291-1/+3
| | | | | | | Recent rtsock changes widened epoch and covered nhgrp_dump_sysctl(), resulting in `netstat -4On` triggering a KASSERT. MFC after: 1 day
* Rename variables inside nexthop group consider_resize() code.Alexander V. Chernikov2021-03-291-20/+20
| | | | | | No functional changes. MFC after: 3 days
* Fix nexthop group index array scaling.Alexander V. Chernikov2021-03-291-2/+2
| | | | | | | | | The current code has the limit of 127 nexthop groups due to the wrongly-checked bitmask_copy() return value. PR: 254303 Reported by: Aleks <a.ivanov at veesp.com> MFC after: 1 day
* netmap: monitor: add a flag to distinguish packet directionVincenzo Maffione2021-03-291-0/+5
| | | | | | | | The netmap monitor intercepts any TX/RX packets on the monitored port. However, before this change there was no way to tell whether an intercepted packet was being transmitted or received on the monitored port. A TXMON flag in the netmap slot has been added for this purpose.