src - FreeBSD source tree

	Commit message	Author	Age	Files	Lines
*	vlan: fix setting flags on a QinQ interface	Kristof Provost	2023-05-12	1	-2/+2
	Setting vlan flags needlessly takes the exclusive VLAN_XLOCK().
	If we have stacked vlan devices (i.e. QinQ) and we set vlan flags (e.g. IFF_PROMISC) we call rtnl_handle_ifevent() to send a notification about the interface. This ends up calling SIOCGIFMEDIA, which requires the VLAN_SLOCK(). Trying to take that one with the VLAN_XLOCK() held deadlocks us.
	There's no need for the exclusive lock though, as we're only accessing parent/trunk information, not modifying it, so a shared lock is sufficient.
	While here also add a test case for this issue.
	Backtrace:
	shared lock of (sx) vlan_sx @ /usr/src/sys/net/if_vlan.c:2192
	while exclusively locked from /usr/src/sys/net/if_vlan.c:2307
	panic: excl->share
	cpuid = 29
	time = 1683873033
	KDB: stack backtrace:
	db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d4ad4b0
	vpanic() at vpanic+0x152/frame 0xfffffe015d4ad500
	panic() at panic+0x43/frame 0xfffffe015d4ad560
	witness_checkorder() at witness_checkorder+0xcb5/frame 0xfffffe015d4ad720
	_sx_slock_int() at _sx_slock_int+0x67/frame 0xfffffe015d4ad760
	vlan_ioctl() at vlan_ioctl+0xf8/frame 0xfffffe015d4ad7c0
	dump_iface() at dump_iface+0x12f/frame 0xfffffe015d4ad840
	rtnl_handle_ifevent() at rtnl_handle_ifevent+0xab/frame 0xfffffe015d4ad8c0
	if_setflag() at if_setflag+0xf6/frame 0xfffffe015d4ad930
	ifpromisc() at ifpromisc+0x2a/frame 0xfffffe015d4ad960
	vlan_setflags() at vlan_setflags+0x60/frame 0xfffffe015d4ad990
	vlan_ioctl() at vlan_ioctl+0x216/frame 0xfffffe015d4ad9f0
	if_setflag() at if_setflag+0xe4/frame 0xfffffe015d4ada60
	ifpromisc() at ifpromisc+0x2a/frame 0xfffffe015d4ada90
	bridge_ioctl_add() at bridge_ioctl_add+0x499/frame 0xfffffe015d4adb10
	bridge_ioctl() at bridge_ioctl+0x328/frame 0xfffffe015d4adbc0
	ifioctl() at ifioctl+0x972/frame 0xfffffe015d4adcc0
	kern_ioctl() at kern_ioctl+0x1fe/frame 0xfffffe015d4add30
	sys_ioctl() at sys_ioctl+0x154/frame 0xfffffe015d4ade00
	amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe015d4adf30
	fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d4adf30
	--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x22b0f0ef8d8a, rsp = 0x22b0ec63f2c8, rbp = 0x22b0ec63f380 ---
	KDB: enter: panic
	[ thread pid 5715 tid 101132 ]
	Sponsored by: Rubicon Communications, LLC ("Netgate")
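	The nesting visible in the backtrace (vlan_ioctl() re-entered through the notification path) can be illustrated with a small userland toy. This is only a sketch of the locking pattern, not the kernel's sx(9) implementation: every toy_* name is hypothetical, the model is single-threaded, and it assumes that a nested shared acquisition is acceptable while a shared acquisition under an exclusive hold is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-thread view of an sx-style lock. All names are hypothetical. */
enum toy_sx_state { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

struct toy_sx {
	enum toy_sx_state state;
	int shared_depth;	/* nesting depth of shared holds */
};

/* Returns false in the case where the panic above reports "excl->share". */
static bool
toy_slock(struct toy_sx *sx)
{
	if (sx->state == TOY_EXCLUSIVE)
		return (false);
	sx->state = TOY_SHARED;
	sx->shared_depth++;
	return (true);
}

static bool
toy_xlock(struct toy_sx *sx)
{
	if (sx->state != TOY_UNLOCKED)
		return (false);
	sx->state = TOY_EXCLUSIVE;
	return (true);
}

static void
toy_unlock(struct toy_sx *sx)
{
	/* Drop one level of a nested shared hold, if any remain. */
	if (sx->state == TOY_SHARED && --sx->shared_depth > 0)
		return;
	sx->state = TOY_UNLOCKED;
	sx->shared_depth = 0;
}

/*
 * Models the "set flags" path: take the lock (exclusive before the fix,
 * shared after it), then let the notification path take the shared lock
 * again for its media query. Returns whether the nested acquisition worked.
 */
static bool
toy_setflags(struct toy_sx *sx, bool use_exclusive)
{
	bool ok;

	ok = use_exclusive ? toy_xlock(sx) : toy_slock(sx);
	assert(ok);
	ok = toy_slock(sx);	/* nested read, as in the backtrace */
	if (ok)
		toy_unlock(sx);
	toy_unlock(sx);
	return (ok);
}
```

	With use_exclusive set, the nested shared acquisition is refused, which is the situation the panic describes; with the shared outer lock it goes through.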
*	netlink: add netlink interfaces to if_clone	Alexander V. Chernikov	2023-04-25	1	-3/+177
	This change adds netlink create/modify/dump interfaces to `if_clone.c`. The previous attempt, which stored the logic inside `netlink/route/iface_drivers.c`, did not quite work: for example, dumping interface-specific state (like vlan id or vlan parent) required some peeking into the private interfaces.
	The new interfaces are added in a compatible way - callers don't have to do anything unless they are extended with Netlink.
	Reviewed by: kp
	Differential Revision: https://reviews.freebsd.org/D39032
	MFC after: 1 month
*	net: unify mtu update code	Alexander V. Chernikov	2023-03-06	1	-14/+2
	Subscribers: imp, ae, glebius
	Differential Revision: https://reviews.freebsd.org/D38893
*	IfAPI: Add if_llsoftc member accessors for TOEDEV	Justin Hibbits	2023-01-31	1	-1/+1
	Summary: Keep the TOEDEV() macro for backwards compatibility, and add a SETTOEDEV() macro to complement it with the new accessors.
	Sponsored by: Juniper Networks, Inc.
	Reviewed by: glebius
	Differential Revision: https://reviews.freebsd.org/D38199
*	ifnet/API: Move struct ifnet definition to a <net/if_private.h>	Justin Hibbits	2023-01-24	1	-0/+1
	Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively.
	Reviewed by: glebius
	Sponsored by: Juniper Networks, Inc.
	Differential Revision: https://reviews.freebsd.org/D38046
*	if_clone: migrate some consumers to the new KPI.	Alexander V. Chernikov	2022-09-22	1	-13/+19
	Convert most of the cloner consumers that require custom params to the new if_clone KPI.
	Reviewed by: kp
	Differential Revision: https://reviews.freebsd.org/D36636
	MFC after: 2 weeks
*	if_vlan: avoid hash table thrashing when adding and removing entries	Kristof Provost	2022-07-22	1	-1/+1
	vlan_remhash() uses an incorrect value for b.
	When using the default value for VLAN_DEF_HWIDTH (4), the VLAN hash-list table expands from 16 chains to 32 chains as the 129th entry is added. trunk->hwidth becomes 5. Say a few more entries are added and there are now 135 entries. trunk->hwidth will still be 5. If an entry is removed, vlan_remhash() will calculate a value of 32 for b. refcnt will be decremented to 134. The if comparison at line 473 will return true and vlan_growhash() will be called. The VLAN hash-list table will be compressed from 32 chains wide to 16 chains wide. hwidth will become 4. This is an error, and it can be seen when a new VLAN is added. The table will again be expanded. If an entry is then removed, again the table is contracted.
	If the number of VLANs stays in the range of 128-512, each time an insert follows a remove, the table will expand. Each time a remove follows an insert, the table will be contracted.
	The fix is simple. Line 473 should test that the number of entries has decreased such that the table should be contracted using what would be the new value of hwidth. Line 467 should be:
	b = 1 << (trunk->hwidth - 1);
	PR: 265382
	Reviewed by: kp
	MFC after: 2 weeks
	Sponsored by: NetApp, Inc.
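	The thrashing described above can be reproduced with a small userland simulation. This is a sketch, not the code from if_vlan.c: the toy_* names are hypothetical, and the grow/shrink thresholds (half of the chain count squared) are an assumption chosen so that a 16-chain table grows at roughly 128 entries, as in the description. The only difference between the two modes is how b is derived on removal.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DEF_HWIDTH	4	/* default width quoted in the message */

struct toy_trunk {
	int hwidth;	/* log2 of the number of hash chains */
	int refcnt;	/* number of entries in the table */
	int resizes;	/* how many times the table was rebuilt */
};

/* Insert one entry; grow first if the table is considered too full. */
static void
toy_inshash(struct toy_trunk *t)
{
	int b = 1 << t->hwidth;

	if (t->refcnt > (b * b) / 2) {	/* assumed grow threshold */
		t->hwidth++;
		t->resizes++;
	}
	t->refcnt++;
}

/*
 * Remove one entry; shrink if the table is considered too empty.
 * fixed == false derives b from the current width (the reported bug),
 * fixed == true derives it from the width the table would shrink to.
 */
static void
toy_remhash(struct toy_trunk *t, bool fixed)
{
	int b = 1 << (fixed ? t->hwidth - 1 : t->hwidth);

	assert(t->refcnt > 0);
	t->refcnt--;
	if (t->hwidth > TOY_DEF_HWIDTH && t->refcnt < (b * b) / 2) {
		t->hwidth--;
		t->resizes++;
	}
}

/* Fill the table, then alternate remove/insert; return the resize count. */
static int
toy_churn(bool fixed, int entries, int cycles)
{
	struct toy_trunk t = { TOY_DEF_HWIDTH, 0, 0 };
	int i;

	for (i = 0; i < entries; i++)
		toy_inshash(&t);
	for (i = 0; i < cycles; i++) {
		toy_remhash(&t, fixed);
		toy_inshash(&t);
	}
	return (t.resizes);
}
```

	In this model, toy_churn(false, 135, 10) rebuilds the table on every remove and every insert, while toy_churn(true, 135, 10) only performs the single initial expansion.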
*	if_vlan: allow vlan and vlanproto to be changed	Kristof Provost	2022-07-21	1	-2/+18
	It's currently not possible to change the vlan ID or vlan protocol (i.e. 802.1q vs. 802.1ad) without de-configuring the interface (i.e. ifconfig vlanX -vlandev).
	Add a specific flow for this, allowing both the protocol and id (but not parent interface) to be changed without going through the '-vlandev' step.
	Reviewed by: glebius
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D35846
*	vlan(4): Add support for allocating TLS receive tags.	Hans Petter Selasky	2022-06-07	1	-15/+32
	The TLS receive tags are allocated directly from the receiving interface, because mbufs flow in the opposite direction, so route change checks, which only work for outgoing traffic, are not useful.
	Differential revision: https://reviews.freebsd.org/D32356
	Sponsored by: NVIDIA Networking
*	[vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event	Andrey V. Elsukov	2022-05-20	1	-0/+35
	Use it to update if_baudrate for vlan interfaces created on the LACP lagg.
	Differential revision: https://reviews.freebsd.org/D33405
*	vlan: ifa is only used under #ifdef INET.	John Baldwin	2022-04-13	1	-0/+4
*	vlan: allow net.link.vlan.mtag_pcp to be set per vnet	Kristof Provost	2022-02-14	1	-2/+3
	The primary reason for this change is to facilitate testing.
	MFC after: 1 week
*	Add a switch structure for send tags.	John Baldwin	2021-09-14	1	-7/+65
	Move the type and function pointers for operations on existing send tags (modify, query, next, free) out of 'struct ifnet' and into a new 'struct if_snd_tag_sw'. A pointer to this structure is added to the generic part of send tags and is initialized by m_snd_tag_init() (which now accepts a switch structure as a new argument in place of the type).
	Previously, device driver ifnet methods switched on the type to call type-specific functions. Now, those type-specific functions are saved in the switch structure and invoked directly. In addition, this more gracefully permits multiple implementations of the same tag within a driver. In particular, NIC TLS for future Chelsio adapters will use a different implementation than the existing NIC TLS support for T6 adapters.
	Reviewed by: gallatin, hselasky, kib (older version)
	Sponsored by: Chelsio Communications
	Differential Revision: https://reviews.freebsd.org/D31572
*	if_vlan: add the ALTQ support to if_vlan.	Luiz Otavio O Souza	2021-08-25	1	-0/+47
	Inspired by the iflib implementation, allow ALTQ to be used with if_vlan interfaces.
	Reviewed by: donner
	Obtained from: pfsense
	MFC after: 1 week
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D31647
*	vlan: deduplicate bpf_setpcp() and pf_ieee8021q_setpcp()	Kristof Provost	2021-07-26	1	-1/+1
	These two functions were identical, so move them into the common vlan_set_pcp() function, exposed in the if_vlan_var.h header.
	Reviewed by: donner
	MFC after: 1 week
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D31275
*	Restore the vnet before returning an error.	George V. Neville-Neil	2021-06-21	1	-1/+4
	Obtained from: Kanndula, Dheeraj <Dheeraj.Kandula@netapp.com>
	MFC after: 2 weeks
	Differential Revision: https://reviews.freebsd.org/D30741
*	Fix vlan creation for the older ifconfig(8) binaries.	Alexander V. Chernikov	2021-04-11	1	-0/+8
	Reported by: allanjude
	MFC after: immediately
*	Fix subinterface vlan creation.	Alexander V. Chernikov	2021-01-29	1	-24/+51
	D26436 introduced support for stacked vlans that changed the way vlans are configured. In particular, this change broke setups that have same-number vlans as subinterfaces.
	Vlan support was initially created assuming "vlanX" semantics. In this paradigm, automatic number assignment supported by cloning (ifconfig vlan create) was a natural fit. When "ifaceX.Y" support was added, allowing the same vlan number on multiple devices, the cloning code became more complex, as there is no unified "vlan" namespace anymore. Such interfaces got the first spare index from the "vlan" cloner. This, in turn, led to the following problem:
	ifconfig ix0.333 create -> index 1
	ifconfig ix0.444 create -> index 2
	ifconfig vlan2 create -> allocation failure
	This change fixes such allocations by using cloning indexes only for "vlanX" interfaces.
	Reviewed by: hselasky
	MFC after: 3 days
	Differential Revision: https://reviews.freebsd.org/D27505
*	Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.	Gleb Smirnoff	2021-01-29	1	-2/+2
	Originally IFCAP_NOMAP meant that the mbuf has an external storage pointer that points to an unmapped address. Then, this was extended to an array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and the set of features is subject to change. The new name should be generic enough to avoid further renaming.
*	This pulls over all the changes that are in the netflix	Randall Stewart	2021-01-28	1	-0/+30
	tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances.
	Sponsored by: Netflix Inc.
	Differential Revision: https://reviews.freebsd.org/D28357
*	Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc().	John Baldwin	2020-10-29	1	-2/+2
	This gives a more uniform API for send tag life cycle management.
	Reviewed by: gallatin, hselasky
	Sponsored by: Netflix
	Differential Revision: https://reviews.freebsd.org/D27000
	Notes: svn path=/head/; revision=367151
*	Support hardware rate limiting (pacing) with TLS offload.	John Baldwin	2020-10-29	1	-4/+4
	- Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type).
	- When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag.
	- When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable.
	- When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone.
	- Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock.
	- Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46].
	Reviewed by: gallatin, hselasky
	Relnotes: yes
	Sponsored by: Netflix
	Differential Revision: https://reviews.freebsd.org/D26691
	Notes: svn path=/head/; revision=367123
*	Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q).	Alexander V. Chernikov	2020-10-21	1	-47/+48
	802.1ad interfaces are created with ifconfig using the "vlanproto" parameter. E.g., the following creates an 802.1Q VLAN (id #42) over an 802.1ad S-VLAN (id #5) over a physical Ethernet interface (em0).
	ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up
	ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24
	VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly supported. VLAN_HWTAGGING is only partially supported, as there is currently no IFCAP_VLAN_* denoting the possibility to set the VLAN EtherType to anything else than 0x8100 (802.1ad uses 0x88A8).
	Submitted by: Olivier Piras
	Sponsored by: RG Nets
	Differential Revision: https://reviews.freebsd.org/D26436
	Notes: svn path=/head/; revision=366917
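	For orientation, the tag stack that such a stacked configuration puts on the wire can be sketched in userland C. This is not code from if_vlan.c; the function names are hypothetical, and the example only relies on the two EtherType values mentioned above and the standard 16-bit TCI layout (3-bit PCP, 1-bit DEI, 12-bit VLAN id).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q	0x8100	/* inner (customer) tag EtherType */
#define TPID_8021AD	0x88A8	/* outer (service) tag EtherType */

/* Write one 4-byte VLAN tag (TPID + TCI) in network byte order. */
static size_t
put_vlan_tag(uint8_t *p, uint16_t tpid, uint8_t pcp, uint16_t vid)
{
	uint16_t tci;

	assert(vid <= 0x0fff);	/* the VLAN id is a 12-bit field */
	tci = (uint16_t)(((pcp & 0x7) << 13) | (vid & 0x0fff));
	p[0] = (uint8_t)(tpid >> 8);
	p[1] = (uint8_t)(tpid & 0xff);
	p[2] = (uint8_t)(tci >> 8);
	p[3] = (uint8_t)(tci & 0xff);
	return (4);
}

/*
 * Build the bytes that follow the two MAC addresses in a Q-in-Q frame:
 * outer S-tag, inner C-tag, then the payload EtherType. The caller must
 * provide at least 10 bytes.
 */
static size_t
put_qinq_tags(uint8_t *p, uint16_t svid, uint16_t cvid, uint16_t ethertype)
{
	size_t off = 0;

	off += put_vlan_tag(p + off, TPID_8021AD, 0, svid);
	off += put_vlan_tag(p + off, TPID_8021Q, 0, cvid);
	p[off++] = (uint8_t)(ethertype >> 8);
	p[off++] = (uint8_t)(ethertype & 0xff);
	return (off);
}
```

	Calling put_qinq_tags(buf, 5, 42, 0x0800) corresponds to the vlan5/vlan42 example above carrying an IPv4 payload.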
*	Store the send tag type in the common send tag header.	John Baldwin	2020-10-06	1	-1/+1
	Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag).
	In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query.
	Reviewed by: gallatin, hselasky, np, kib
	Sponsored by: Netflix
	Differential Revision: https://reviews.freebsd.org/D26689
	Notes: svn path=/head/; revision=366491
*	net: clean up empty lines in .c and .h files	Mateusz Guzik	2020-09-01	1	-4/+3
	Notes: svn path=/head/; revision=365071
*	vlan: Fix panic when vnet jail with a vlan interface is destroyed	Kristof Provost	2020-01-31	1	-1/+1
	During vnet cleanup vnet_if_uninit() checks that no more interfaces remain in the vnet. Any interface borrowed from another vnet is returned by vnet_if_return(). Other interfaces (i.e. cloned interfaces) should have been destroyed by their cloner at this point.
	The if_vlan VNET_SYSUNINIT had priority SI_ORDER_FIRST, which means it had the same priority as vnet_if_uninit(). In other words: it was possible for it to be called after vnet_if_uninit(), which would lead to assertion failures.
	Set the priority to SI_ORDER_ANY, like other cloners, to ensure that vlan interfaces are destroyed before we enter vnet_if_uninit().
	The sys/net/if_vlan test provoked this.
	Notes: svn path=/head/; revision=357356
*	Plug parent iface refcount leak on <ifname>.X vlan creation.	Alexander V. Chernikov	2020-01-29	1	-1/+5
	PR: kern/242270
	Submitted by: Andrew Boyer <aboyer at pensando.io>
	MFC after: 2 weeks
	Notes: svn path=/head/; revision=357263
*	Update route MTUs for bridge, lagg and vlan interfaces.	Alexander Motin	2020-01-22	1	-1/+23
	Those interfaces may implicitly change their MTU when a parent interface is added, in addition to the normal SIOCSIFMTU ioctl path, where the route MTUs are updated normally.
	MFC after: 2 weeks
	Sponsored by: iXsystems, Inc.
	Notes: svn path=/head/; revision=356993
*	Introduce NET_EPOCH_CALL() macro and use it everywhere where we free	Gleb Smirnoff	2020-01-15	1	-2/+2
	data based on the network epoch. The macro reverses the argument order of epoch_call(9) - first function, then its argument. NFC
	Notes: svn path=/head/; revision=356755
*	Enqueue lladdr_task to update link level address of vlan, when its parent	Andrey V. Elsukov	2019-11-07	1	-2/+10
	interface has changed.
	During vlan reconfiguration without destroying the interface, it is possible that the parent interface will be changed. This usually means that the link layer address of the vlan will be different. Therefore we need to update all permanent llentries associated with the vlan's addresses: NDP for IPv6 addresses, and ARP for IPv4 addresses. This is done by executing lladdr_task. To avoid extra work, check that the L2 address is actually different before executing it.
	No objection from: #network
	Obtained from: Yandex LLC
	MFC after: 1 week
	Sponsored by: Yandex LLC
	Differential Revision: https://reviews.freebsd.org/D22243
	Notes: svn path=/head/; revision=354443
*	Revert two parts of r353292 that enter epoch when processing vlan capabilities.	Gleb Smirnoff	2019-10-17	1	-3/+9
	It could be that entering epoch isn't necessary here, but better take a conservative approach.
	Submitted by: kp
	Notes: svn path=/head/; revision=353695
*	vlan_config() isn't always called in epoch context.	Gleb Smirnoff	2019-10-13	1	-5/+9
	Reported by: kp
	Notes: svn path=/head/; revision=353467
*	Widen NET_EPOCH coverage.	Gleb Smirnoff	2019-10-07	1	-51/+22
	When epoch(9) was introduced to the network stack, it was basically dropped in place of the existing locking, which was mutexes and rwlocks. For the sake of performance, mutex covered areas were as small as possible, and so became the epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point in minimising epoch covered areas for the sake of performance. Meanwhile, entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win.
	Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier.
	On the output path we already enter epoch quite early - in ip_output() and in ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, and other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch.
	The tricky part is the configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed.
	This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources.
	Reviewed by: gallatin, hselasky, cy, adrian, kristof
	Differential Revision: https://reviews.freebsd.org/D19111
	Notes: svn path=/head/; revision=353292
*	style(9): remove extraneous empty lines	Gleb Smirnoff	2019-09-25	1	-1/+0
	Notes: svn path=/head/; revision=352725
*	Wrap a vlan's parent's if_output in a separate function.	Matt Joras	2019-08-30	1	-1/+29
	When a vlan interface is created, its if_output is set directly to the parent interface's if_output. This is fine in the normal case but has an unfortunate consequence if you end up with a certain combination of vlan and lagg interfaces.
	Consider you have a lagg interface with a single laggport member. When an interface is added to a lagg its if_output is set to lagg_port_output, which blackholes traffic from the normal networking stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now create a vlan with the laggport member (not the lagg interface) as its parent, its if_output is set to lagg_port_output as well. While this is confusing conceptually and likely represents a misconfigured system, it is not itself a problem. The problem arises when you then remove the lagg interface. Doing this resets the if_output of the laggport member back to its original state, but the vlan's if_output is left pointing to lagg_port_output. This gives rise to the possibility that the system will panic when e.g. bpf is used to send any frames on the vlan interface.
	Fix this by creating a new function, vlan_output, which simply wraps the parent's current if_output. That way when the parent's if_output is restored there is no stale usage of lagg_port_output.
	Reviewed by: rstone
	Differential Revision: D21209
	Notes: svn path=/head/; revision=351629
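	The difference between copying the parent's output pointer and wrapping it can be shown with a userland toy. This is a sketch only: struct toy_ifnet and every toy_* function are hypothetical stand-ins, the output routines merely return a code identifying themselves, and no mbufs or real ifnet fields are involved.

```c
#include <assert.h>
#include <stddef.h>

enum toy_path { TOY_ETHER = 1, TOY_LAGGPORT = 2 };

struct toy_ifnet;
typedef int (*toy_output_t)(struct toy_ifnet *);

struct toy_ifnet {
	toy_output_t	 if_output;
	struct toy_ifnet *parent;	/* NULL for a physical interface */
};

/* Stand-ins for the normal and the lagg-port output routines. */
static int
toy_ether_output(struct toy_ifnet *ifp)
{
	(void)ifp;
	return (TOY_ETHER);
}

static int
toy_laggport_output(struct toy_ifnet *ifp)
{
	(void)ifp;
	return (TOY_LAGGPORT);
}

/* Wrapper: look up the parent's output routine at call time. */
static int
toy_vlan_output(struct toy_ifnet *ifp)
{
	assert(ifp->parent != NULL);
	return (ifp->parent->if_output(ifp));
}

/* Old behaviour: snapshot whatever the parent points at right now. */
static void
toy_vlan_attach_copy(struct toy_ifnet *vlan, struct toy_ifnet *parent)
{
	vlan->parent = parent;
	vlan->if_output = parent->if_output;
}

/* New behaviour: always go through the wrapper. */
static void
toy_vlan_attach_wrap(struct toy_ifnet *vlan, struct toy_ifnet *parent)
{
	vlan->parent = parent;
	vlan->if_output = toy_vlan_output;
}
```

	If the parent's if_output is switched to the lagg-port routine before the vlan attaches and restored afterwards, the copied pointer keeps calling the stale routine while the wrapper follows the parent.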
*	Add kernel-side support for in-kernel TLS.	John Baldwin	2019-08-27	1	-5/+20
	KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotiation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys.
	Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type.
	At present, rekeying is not supported though the in-kernel framework should support rekeying.
	KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session.
	KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS.
	Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile(2), sendfile_iodone() calls ktls_enqueue() instead of pru_ready(), leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue().
	A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.)
	KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators.
	Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs.
	ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation.
	If the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.)
	ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover or lacp with flowid enabled.
	Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes.
	In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS.
	Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
	KTLS is enabled via the KERN_TLS kernel option.
	This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload.
	Reviewed by: gallatin, hselasky, rrs
	Obtained from: Netflix
	Sponsored by: Netflix, Chelsio Communications
	Differential Revision: https://reviews.freebsd.org/D21277
	Notes: svn path=/head/; revision=351522
*	Support IFCAP_NOMAP in vlan(4).	John Baldwin	2019-06-29	1	-0/+10
	Enable IFCAP_NOMAP for a vlan interface if it is supported by the underlying trunk device.
	Reviewed by: gallatin, hselasky, rrs
	Sponsored by: Netflix
	Differential Revision: https://reviews.freebsd.org/D20616
	Notes: svn path=/head/; revision=349532
*	Restructure mbuf send tags to provide stronger guarantees.	John Baldwin	2019-05-24	1	-8/+101
	- Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on) in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output, sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragmented and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN.
	- To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp, meaning that code allocating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped.
	This also closes use-after-free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag.
	- cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If none of the in-flight requests has the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, the tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag.
	- mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead.
	- Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
	Reviewed by: gallatin, hselasky, rgrimes, ae
	Sponsored by: Netflix
	Differential Revision: https://reviews.freebsd.org/D20117
	Notes: svn path=/head/; revision=348254
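The send tag life cycle described in this commit message can be illustrated with a small userland sketch. It is not the kernel code: the `sketch_*` names and structures are hypothetical stand-ins, a plain `int` replaces the kernel's atomic reference count, and the `freed` field merely records the point where `if_snd_tag_free` would be invoked.

```c
#include <assert.h>

/* Hypothetical userland stand-ins; the kernel structures differ. */
struct sketch_ifnet {
	int refs;			/* models the ifnet reference count */
};

struct sketch_snd_tag {
	struct sketch_ifnet *ifp;
	int refcount;			/* the kernel uses an atomic refcount */
	int freed;			/* set where the driver free hook would run */
};

/* Models m_snd_tag_init(): take the ifp reference, start with one ref. */
static void
sketch_tag_init(struct sketch_snd_tag *t, struct sketch_ifnet *ifp)
{
	ifp->refs++;
	t->ifp = ifp;
	t->refcount = 1;
	t->freed = 0;
}

/* Models m_snd_tag_ref(): e.g. an mbuf starts carrying the tag. */
static void
sketch_tag_ref(struct sketch_snd_tag *t)
{
	t->refcount++;
}

/* Models m_snd_tag_rele(): the last release frees the tag, then the ifp ref. */
static void
sketch_tag_rele(struct sketch_snd_tag *t)
{
	if (--t->refcount == 0) {
		t->freed = 1;		/* stands in for if_snd_tag_free */
		t->ifp->refs--;		/* ifp reference dropped afterwards */
	}
}

/* Socket closes while one packet is still pending in a tx ring. */
static int
sketch_demo(void)
{
	struct sketch_ifnet ifp = { 0 };
	struct sketch_snd_tag tag;

	sketch_tag_init(&tag, &ifp);	/* reference held by the inp */
	sketch_tag_ref(&tag);		/* reference held by a queued mbuf */
	sketch_tag_rele(&tag);		/* inp detached: tag must survive */
	assert(tag.freed == 0);
	sketch_tag_rele(&tag);		/* mbuf freed: last reference gone */
	return (tag.freed);
}
```

In this model the tag outlives the socket for as long as a queued packet still references it, which is the use-after-free window the reference count is meant to close.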
*	This commit adds the missing release mechanism for the	Randall Stewart	2019-02-13	1	-0/+8
	ratelimiting code. The two modules (lagg and vlan) did have allocation routines, and even though they are indirect (and vector down to the underlying interfaces) they both need to have a free routine (that also vectors down to the actual interface).
	Sponsored by: Netflix Inc.
	Differential Revision: https://reviews.freebsd.org/D19032
	Notes: svn path=/head/; revision=344099
*	Bring the comment up to date.	Gleb Smirnoff	2019-01-10	1	-1/+1
	Notes: svn path=/head/; revision=342906
*	Stop setting if_linkmib in vlan(4) ifnets.	Mark Johnston	2019-01-09	1	-21/+8
	There are several reasons:
	- The structure being exported via IFDATA_LINKSPECIFIC doesn't appear to be a standard MIB.
	- The structure being exported is private to the kernel and always has been.
	- No other drivers in common use set the if_linkmib field.
	- Because IFDATA_LINKSPECIFIC can be used to overwrite the linkmib structure, a privileged user could use it to corrupt internal vlan(4) state. [1]
	PR: 219472
	Reported by: CTurt <ecturt@gmail.com> [1]
	Reviewed by: kp (previous version)
	MFC after: 1 week
	Sponsored by: The FreeBSD Foundation
	Differential Revision: https://reviews.freebsd.org/D18779
	Notes: svn path=/head/; revision=342887
*	Mechanical cleanup of epoch(9) usage in network stack.	Gleb Smirnoff	2019-01-09	1	-38/+41
	- Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. they will produce buggy code if the same macro is used in nested scopes. Explicitly declare epoch_tracker always.
	- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and, more importantly, they no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has the same problems, but is not touched by this commit.
	This is a non-functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively.
	Discussed with: jtl, gallatin
	Notes: svn path=/head/; revision=342872
*	Unbreak kernel build with VLAN_ARRAY defined.	Oleg Bulyzhin	2018-11-21	1	-3/+3
	MFC after: 1 week
	Notes: svn path=/head/; revision=340724
*	vlan: Fix panic with lagg and vlan	Kristof Provost	2018-10-21	1	-0/+5
	vlan_lladdr_fn() is called from taskqueue, which means there's no vnet context set. We can end up trying to send ARP messages (through the iflladdr_event event), which requires a vnet context.
	PR: 227654
	MFC after: 3 days
	Notes: svn path=/head/; revision=339547
*	Fix deadlock when destroying VLANs.	Hans Petter Selasky	2018-10-15	1	-4/+10
	Synchronizing the epoch before freeing the multicast addresses while holding the VLAN_XLOCK() might lead to a deadlock. Use deferred freeing of the VLAN multicast addresses to resolve the deadlock.
	Backtrace:
	Thread1:
	epoch_block_handler_preempt()
	ck_epoch_synchronize_wait()
	epoch_wait_preempt()
	vlan_setmulti()
	vlan_ioctl()
	in6m_release_task()
	gtaskqueue_run_locked()
	gtaskqueue_thread_loop()
	fork_exit()
	fork_trampoline()
	Thread2:
	sleepq_switch()
	sleepq_wait()
	_sx_xlock_hard()
	_sx_xlock()
	in6_leavegroup()
	in6_purgeaddr()
	if_purgeaddrs()
	if_detach_internal()
	if_detach()
	vlan_clone_destroy()
	if_clone_destroyif()
	if_clone_destroy()
	ifioctl()
	kern_ioctl()
	sys_ioctl()
	amd64_syscall()
	fast_syscall_common()
	syscall()
	Differential revision: https://reviews.freebsd.org/D17496
	Reviewed by: slavash, mmacy
	Approved by: re (kib)
	Sponsored by: Mellanox Technologies
	Notes: svn path=/head/; revision=339358
*	fix vlan locking to permit sx acquisition in ioctl calls	Matt Macy	2018-09-21	1	-143/+76
	- update vlan(9) to handle changes earlier this year in multicast locking
	Tested by: np@, darkfiberu at gmail.com
	PR: 230510
	Reviewed by: mjoras@, shurd@, sbruno@
	Approved by: re (gjb@)
	Sponsored by: Limelight Networks
	Differential Revision: https://reviews.freebsd.org/D16808
	Notes: svn path=/head/; revision=338850
*	if_vlan(4): A VLAN always has a PCP and its ifnet's if_pcp should be set	Navdeep Parhar	2018-08-17	1	-0/+2
	to the PCP value in use instead of IFNET_PCP_NONE.
	MFC after: 1 week
	Sponsored by: Chelsio Communications
	Notes: svn path=/head/; revision=337943
*	Add the ability to look up the 3b PCP of a VLAN interface. Use it in	Navdeep Parhar	2018-08-16	1	-0/+13
	toe_l2_resolve to fill up the complete vtag and not just the vid.
	Reviewed by: kib@
	MFC after: 1 week
	Sponsored by: Chelsio Communications
	Differential Revision: https://reviews.freebsd.org/D16752
	Notes: svn path=/head/; revision=337932
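As a rough illustration of what "the complete vtag and not just the vid" means, the sketch below packs a 12-bit VLAN ID and a 3-bit PCP into a 16-bit tag following the IEEE 802.1Q tag control layout (PCP in bits 15-13, DEI in bit 12, VID in bits 11-0). The `sketch_*` helpers are hypothetical and are not the functions added by this commit; the DEI bit is simply left at zero.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VID_MASK		0x0fffu	/* low 12 bits: VLAN ID */
#define SKETCH_PCP_SHIFT	13	/* top 3 bits: priority code point */

/* Hypothetical helper: combine VID and PCP into one tag, DEI left clear. */
static uint16_t
sketch_make_vtag(uint16_t vid, uint8_t pcp)
{
	return ((uint16_t)(((pcp & 0x7u) << SKETCH_PCP_SHIFT) |
	    (vid & SKETCH_VID_MASK)));
}

/* Extract the 12-bit VLAN ID from a tag. */
static uint16_t
sketch_vtag_vid(uint16_t vtag)
{
	return (vtag & SKETCH_VID_MASK);
}

/* Extract the 3-bit PCP from a tag. */
static uint8_t
sketch_vtag_pcp(uint16_t vtag)
{
	return ((uint8_t)(vtag >> SKETCH_PCP_SHIFT));
}

/* Round-trip check: both fields survive packing and unpacking. */
static void
sketch_vtag_selfcheck(void)
{
	uint16_t vtag = sketch_make_vtag(100, 5);

	assert(sketch_vtag_vid(vtag) == 100);
	assert(sketch_vtag_pcp(vtag) == 5);
}
```

A tag built from the VID alone would carry a PCP of zero, which is why a consumer that wants the priority preserved needs the PCP looked up as well.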
*	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET	Andrew Turner	2018-07-24	1	-1/+1
	variables.
	Reviewed by: bz
	Sponsored by: DARPA, AFRL
	Differential Revision: https://reviews.freebsd.org/D16147
	Notes: svn path=/head/; revision=336676
*	ifnet: Replace if_addr_lock rwlock with epoch + mutex	Matt Macy	2018-05-18	1	-1/+1
	Run on LLNW canaries and tested by pho@
	gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1, before the patch:
	InMpps  OMpps  InGbs  OGbs      err  TCP Est   %CPU  syscalls      csw   irq  GBfree
	  4.98   0.00   4.42  0.00  4235592       33  83.80   4720653  2149771  1235  247.32
	  4.73   0.00   4.20  0.00  4025260       33  82.99   4724900  2139833  1204  247.32
	  4.72   0.00   4.20  0.00  4035252       33  82.14   4719162  2132023  1264  247.32
	  4.71   0.00   4.21  0.00  4073206       33  83.68   4744973  2123317  1347  247.32
	  4.72   0.00   4.21  0.00  4061118       33  80.82   4713615  2188091  1490  247.32
	  4.72   0.00   4.21  0.00  4051675       33  85.29   4727399  2109011  1205  247.32
	  4.73   0.00   4.21  0.00  4039056       33  84.65   4724735  2102603  1053  247.32
	After the patch:
	InMpps  OMpps  InGbs  OGbs      err  TCP Est   %CPU  syscalls      csw   irq  GBfree
	  5.43   0.00   4.20  0.00  3313143       33  84.96   5434214  1900162  2656  245.51
	  5.43   0.00   4.20  0.00  3308527       33  85.24   5439695  1809382  2521  245.51
	  5.42   0.00   4.19  0.00  3316778       33  87.54   5416028  1805835  2256  245.51
	  5.42   0.00   4.19  0.00  3317673       33  90.44   5426044  1763056  2332  245.51
	  5.42   0.00   4.19  0.00  3314839       33  88.11   5435732  1792218  2499  245.52
	  5.44   0.00   4.19  0.00  3293228       33  91.84   5426301  1668597  2121  245.52
	Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch.
	Reviewed by: gallatin
	Sponsored by: Limelight Networks
	Differential Revision: https://reviews.freebsd.org/D15366
	Notes: svn path=/head/; revision=333813