path: root/sys/netinet/raw_ip.c
Commit message / Author / Age / Files / Lines
* sys: further adoption of SPDX licensing ID tags.
  Author: Pedro F. Giffuni; Date: 2017-11-20; Files: 1; Lines: -0/+2
  Mainly focus on files that use the BSD 3-Clause license.
  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well-known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over the FreeBSD tree was useful as a starting point.
  Notes: svn path=/head/; revision=326023
* Reduce in_pcbinfo_init() by two params. No users supply any flags to this function (they used to say UMA_ZONE_NOFREE), so the flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied a zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with the UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by supplying the inpcb_fini() function as the fini method for all inpcb zones.
  Author: Gleb Smirnoff; Date: 2017-05-15; Files: 1; Lines: -1/+1
  Notes: svn path=/head/; revision=318321
* Hide struct inpcb, struct tcpcb from the userland.
  Author: Gleb Smirnoff; Date: 2017-03-21; Files: 1; Lines: -6/+1
  This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas; on the other hand, we still eventually modify them, and tools like netstat(1) never work on the next version of FreeBSD. We maintain a ton of spares in them, and we already have some ifdef hell at the end of tcpcb.
  Details:
  - Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
  - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space.
  - Make kernel and userland utilities compilable after these changes.
  - Bump __FreeBSD_version.
  Reviewed by: rrs, gnn
  Differential Revision: D10018
  Notes: svn path=/head/; revision=315662
* Renumber copyright clause 4
  Author: Warner Losh; Date: 2017-02-28; Files: 1; Lines: -1/+1
  Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point.
  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96
  Notes: svn path=/head/; revision=314436
* Merge projects/ipsec into head/.
  Author: Andrey V. Elsukov; Date: 2017-02-06; Files: 1; Lines: -6/+5
  Small summary
  -------------
  o Almost all IPsec related code was moved into sys/netipsec.
  o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of the ipsec.ko and tcpmd5.ko kernel modules.
  o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA.
  o New network pseudo interface if_ipsec(4) added. For now it is built as part of the ipsec.ko module (or with an IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs.
  o The network stack now invokes IPsec functions using special methods. Only the one header file <netipsec/ipsec_support.h> should be included to declare everything needed to work with IPsec.
  o All IPsec protocol handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods.
  o TCP_SIGNATURE support was reworked to be closer to the RFC.
  o PF_KEY SADB was reworked:
    - now all security associations are stored in a single SPI namespace, and all SAs MUST have a unique SPI.
    - several hash tables added to speed up lookups in SADB.
    - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups at the same time.
    - many PF_KEY message handlers were reworked to reflect changes in SADB.
    - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by an IKE daemon to change SA addresses.
  o ipsecrequest and secpolicy structures were changed substantially to avoid locking protection for ipsecrequest. Now we support only a limited number (4) of bundled SAs, but they are supported for both INET and INET6.
  o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid an SP lookup for each packet.
  o For inbound security policies, added a mode in which the kernel checks the full history of applied IPsec transforms.
  o Reference counting rules for security policies and security associations were changed. The proper SA locking was added into the xform code.
  o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting.
  Reviewed by: gnn, wblock
  Obtained from: Yandex LLC
  Relnotes: yes
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D9352
  Notes: svn path=/head/; revision=313330
* Ensure that the buffer length and the length provided in the IPv4 header match when using a raw socket to send IPv4 packets and providing the header. If they don't match, let send return -1 and set errno to EINVAL. Before this patch it was only enforced that the length in the header is not larger than the buffer length.
  Author: Michael Tuexen; Date: 2017-01-13; Files: 1; Lines: -1/+1
  PR: 212283
  Reviewed by: ae, gnn
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D9161
  Notes: svn path=/head/; revision=312063
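The length check described in r312063 can be illustrated in userland. This is only a sketch, not the kernel code: `check_hdrincl_len` is a hypothetical helper, and it assumes the caller-supplied header carries its total-length field in network byte order at byte offset 2.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a FreeBSD function): return 0 when the
 * total-length field of the supplied IPv4 header equals the buffer
 * length, EINVAL otherwise. The field is read as a big-endian
 * 16-bit value at offset 2 of the header.
 */
static int
check_hdrincl_len(const uint8_t *buf, size_t buflen)
{
    size_t ip_len;

    if (buf == NULL || buflen < 20)    /* shorter than a minimal IPv4 header */
        return (EINVAL);
    ip_len = ((size_t)buf[2] << 8) | buf[3];
    return (ip_len == buflen ? 0 : EINVAL);
}
```

A header that claims 20 bytes passes only when exactly 20 bytes are handed over; the older "not larger than the buffer" rule would also have accepted a longer buffer.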
* Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.
  Author: Kevin Lo; Date: 2016-09-15; Files: 1; Lines: -2/+2
  Reviewed by: gnn
  Differential Revision: https://reviews.freebsd.org/D7878
  Notes: svn path=/head/; revision=305824
* The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs, however, do exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4), as it uses the same mechanism ip(4) and ip6(4) use internally.
  Author: Bjoern A. Zeeb; Date: 2016-06-01; Files: 1; Lines: -2/+3
  Slightly reshuffle the SI_SUB_* fields in kernel.h and add new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come.
  Reviewed by: gnn, tuexen (sctp), jhb (kernel.h)
  Obtained from: projects/vnet
  MFC after: 2 weeks
  X-MFC: do not remove pr_destroy
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D6652
  Notes: svn path=/head/; revision=301114
* Send an ICMP packet indicating destination unreachable/protocol unreachable if the packet is handled neither in the kernel nor in userspace.
  Author: Michael Tuexen; Date: 2016-05-25; Files: 1; Lines: -1/+4
  MFC after: 1 week
  Notes: svn path=/head/; revision=300687
* Count packets as not being delivered only if they are neither processed by a kernel handler nor by a raw socket.
  Author: Michael Tuexen; Date: 2016-05-25; Files: 1; Lines: -2/+6
  MFC after: 1 week
  Notes: svn path=/head/; revision=300679
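The rule shared by r300679 and r300687 can be written as a small decision function. This is a sketch under stated assumptions: the enum and the function name are hypothetical, and the real code works on mbufs and protocol switch entries rather than booleans.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for an inbound packet on the raw IP path. */
enum rip_outcome {
    RIP_DELIVERED,            /* someone consumed the packet */
    RIP_COUNT_AND_SEND_ICMP   /* count as undelivered, answer with ICMP
                                 destination/protocol unreachable */
};

/* A packet is undelivered only if neither consumer took it. */
static enum rip_outcome
classify_packet(bool kernel_handler_took_it, bool raw_socket_took_it)
{
    if (kernel_handler_took_it || raw_socket_took_it)
        return (RIP_DELIVERED);
    return (RIP_COUNT_AND_SEND_ICMP);
}
```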
* netinet: for pointers replace 0 with NULL.
  Author: Pedro F. Giffuni; Date: 2016-04-15; Files: 1; Lines: -1/+1
  These are mostly cosmetic, no functional change.
  Found with devel/coccinelle.
  Reviewed by: ae, tuexen
  Notes: svn path=/head/; revision=298066
* Mfp: r296345
  Author: Bjoern A. Zeeb; Date: 2016-04-09; Files: 1; Lines: -2/+1
  No need to keep type stability on raw sockets zone. We've also been running with a KASSERT since r222488 to make sure the ipi_count is 0 on destroy.
  PR: 164763
  Reviewed by: gnn
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D5735
  Notes: svn path=/head/; revision=297735
* Remove sys/eventhandler.h from net/route.h
  Author: Alexander V. Chernikov; Date: 2016-01-09; Files: 1; Lines: -0/+1
  Reviewed by: ae
  Notes: svn path=/head/; revision=293470
* Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
  Author: Andrey V. Elsukov; Date: 2015-07-29; Files: 1; Lines: -6/+8
  Both are used to protect access to IP address lists, and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here.
  Reviewed by: gnn (previous version)
  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D3149
  Notes: svn path=/head/; revision=286001
* (Author: Gleb Smirnoff; Date: 2015-04-01; Files: 1; Lines: -1/+5)
  o Use the new function ip_fillid() in all places throughout the kernel where we want to create a new IP datagram.
  o Add support for RFC6864, which allows setting the IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by the net.inet.ip.rfc6864 sysctl knob, which is enabled by default.
  o In case we generate an IP ID, use counter(9) to improve performance.
  o Gather all code related to IP ID into ip_id.c.
  Differential Revision: https://reviews.freebsd.org/D2177
  Reviewed by: adrian, cy, rpaulo
  Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Relnotes: yes
  Notes: svn path=/head/; revision=280971
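The RFC 6864 idea in r280971 can be sketched as follows. Everything here is illustrative: `fill_id_sketch`, the `SK_` constants, and the plain static counter are assumptions standing in for ip_fillid(), the kernel's flag definitions, and counter(9). An "atomic" datagram is taken to mean DF set, MF clear, and fragment offset zero, as RFC 6864 defines it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag/offset bits of the IPv4 ip_off field, in host byte order. */
#define SK_IP_DF       0x4000   /* don't fragment */
#define SK_IP_MF       0x2000   /* more fragments */
#define SK_IP_OFFMASK  0x1fff   /* fragment offset */

/* Stand-in for the kernel's ID generator; not counter(9). */
static uint16_t sk_next_id;

/*
 * Choose an IP ID: atomic datagrams may carry any value (0 here)
 * when the RFC 6864 behaviour is enabled; all others get a fresh ID.
 */
static uint16_t
fill_id_sketch(uint16_t ip_off_host, bool rfc6864_enabled)
{
    bool atomic = (ip_off_host & SK_IP_DF) != 0 &&
        (ip_off_host & (SK_IP_MF | SK_IP_OFFMASK)) == 0;

    if (rfc6864_enabled && atomic)
        return (0);
    return (++sk_next_id);
}
```

Skipping ID generation for atomic datagrams is where the performance gain mentioned in the commit comes from: such packets are never reassembled, so their ID is never consulted.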
* Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
  Author: Gleb Smirnoff; Date: 2014-11-07; Files: 1; Lines: -1/+1
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=274225
* Make SOCK_RAW sockets truly raw, not modifying received and sent packets at all. Swapping byte order on SOCK_RAW was actually a bug, an artifact from the BSD network stack, which used to convert a packet to native byte order once it was received by the kernel. Other operating systems didn't follow this, and later other BSD descendants fixed this, leaving us alone with the bug. Now it is clear that we should fix the bug.
  Author: Gleb Smirnoff; Date: 2014-09-01; Files: 1; Lines: -14/+2
  In collaboration with: Olivier Cochard-Labbé <olivier cochard.me>
  See also: https://wiki.freebsd.org/SOCK_RAW
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=270929
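For an application, "truly raw" means every multi-byte header field is in network byte order in both directions, so the program converts explicitly. A minimal sketch with two hypothetical helpers (`put_be16`, `get_be16`) built on the standard htons()/ntohs():

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order 16-bit value into a packet in network byte order. */
static void
put_be16(uint8_t *p, uint16_t host_value)
{
    uint16_t net = htons(host_value);

    memcpy(p, &net, sizeof(net));   /* memcpy avoids unaligned access */
}

/* Load a 16-bit field from a packet, converting to host byte order. */
static uint16_t
get_be16(const uint8_t *p)
{
    uint16_t net;

    memcpy(&net, p, sizeof(net));
    return (ntohs(net));
}
```

With these, the IPv4 total length (offset 2) and flags/fragment offset (offset 6) are written with `put_be16` before sending and read with `get_be16` after receiving, on any host endianness.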
* Change pr_output's prototype to avoid the need for explicit casts.
  Author: Kevin Lo; Date: 2014-08-15; Files: 1; Lines: -1/+8
  This is a follow up to r269699.
  Phabric: D564
  Reviewed by: jhb
  Notes: svn path=/head/; revision=270008
* Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6.
  Author: Kevin Lo; Date: 2014-08-08; Files: 1; Lines: -4/+7
  Phabric: D476
  Reviewed by: jhb
  Notes: svn path=/head/; revision=269699
* Fix jailed raw sockets not setting the correct source address by calling in_pcbladdr instead of prison_get_ip4
  Author: Steven Hartland; Date: 2014-04-24; Files: 1; Lines: -7/+7
  MFC after: 1 month
  Notes: svn path=/head/; revision=264879
* netinet code no longer uses IFA_RTSELF.
  Author: Gleb Smirnoff; Date: 2013-11-05; Files: 1; Lines: -4/+0
  Notes: svn path=/head/; revision=257693
* Cleanup in_ifscrub(), which is just an entry to in_scrubprefix().
  Author: Gleb Smirnoff; Date: 2013-11-01; Files: 1; Lines: -2/+2
  Notes: svn path=/head/; revision=257499
* The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare for this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h
  Author: Gleb Smirnoff; Date: 2013-10-26; Files: 1; Lines: -0/+1
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=257176
* Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
  Author: Gleb Smirnoff; Date: 2012-12-05; Files: 1; Lines: -1/+1
  Exceptions:
  - sys/contrib not touched
  - sys/mbuf.h edited manually
  Notes: svn path=/head/; revision=243882
* Do not reduce ip_len by the size of the IP header in ip_input() before passing a packet to protocol input routines.
  Author: Gleb Smirnoff; Date: 2012-10-23; Files: 1; Lines: -6/+4
  For several protocols this means that the protocol now needs to do the subtraction itself, and for the other half this means that we do not need to add the header length back to the packet.
  Make ip_stripoptions() adjust ip_len, since now we enter this function with a packet header whose ip_len does represent the length of the entire packet, not the payload only.
  Notes: svn path=/head/; revision=241923
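After r241923, a protocol that wants only the payload length has to subtract the header length itself. A sketch of that arithmetic, with a hypothetical `payload_len` helper that takes the total length already converted to host order and the header length in 32-bit words (the unit of the IPv4 ip_hl field):

```c
#include <stdint.h>

/*
 * Hypothetical helper: payload length of an IPv4 packet, or -1 if the
 * header fields are inconsistent. ip_hl_words counts 32-bit words,
 * so the header occupies ip_hl_words * 4 bytes (minimum 5 words).
 */
static int
payload_len(uint16_t ip_len_host, uint8_t ip_hl_words)
{
    int hlen = ip_hl_words << 2;

    if (ip_hl_words < 5 || ip_len_host < hlen)
        return (-1);
    return (ip_len_host - hlen);
}
```

For an 84-byte packet with a plain 20-byte header this yields 64 bytes of payload.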
* Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet.
  Author: Gleb Smirnoff; Date: 2012-10-22; Files: 1; Lines: -4/+11
  After this change a packet processed by the stack isn't modified at all[2] except for TTL.
  After this change a network stack hacker doesn't need to scratch his head trying to figure out what the byte order is at a given place in the stack.
  [1] One exception still remains. The raw sockets convert to host byte order before passing a packet to an application. Probably this will remain for ages for compatibility.
  [2] The ip_input() still subtracts the header length from ip->ip_len, but this is planned to be fixed soon.
  Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
  Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
  Notes: svn path=/head/; revision=241913
* Merge the projects/pf/head branch, which was worked on for the last six months, into head. The most significant achievements in the new code:
  Author: Gleb Smirnoff; Date: 2012-09-08; Files: 1; Lines: -3/+0
  o Fine grained locking, thus much better performance.
  o Fixes to many problems in pf that were specific to the FreeBSD port.
  New code doesn't have that many ifdefs and has far fewer OpenBSDisms, thus is more attractive to our developers.
  Those interested in details can browse through the SVN log of the projects/pf/head branch. And for reference, here is the exact list of revisions merged:
  r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.
  I'd like to thank people who participated in early testing:
  Tested by: Florian Smeets <flo freebsd.org>
  Tested by: Chekaluk Vitaly <artemrts ukr.net>
  Tested by: Ben Wilber <ben desync.com>
  Tested by: Ian FREISLICH <ianf cloudseed.co.za>
  Notes: svn path=/head/; revision=240233
* As I came by and noticed, add a comment that inp locking is a bit optimistic (read: non-existent) here and should be fixed.
  Author: Bjoern A. Zeeb; Date: 2012-01-02; Files: 1; Lines: -0/+2
  Notes: svn path=/head/; revision=229265
* Add back the IP header length to the total packet length field on raw IP sockets. It was deducted in ip_input() in preparation for protocols interested only in the payload.
  Author: Andre Oppermann; Date: 2011-10-07; Files: 1; Lines: -0/+7
  On raw sockets the IP header should be delivered as it came in from the network, except for the byte order swaps in some fields.
  This brings us in line with all other OSes that provide raw IP sockets.
  Reported by: Matthew Cini Sarreo <mcins1-at-gmail.com>
  MFC after: 3 days
  Notes: svn path=/head/; revision=226105
* Update packet filter (pf) code to OpenBSD 4.5.
  Author: Bjoern A. Zeeb; Date: 2011-06-28; Files: 1; Lines: -0/+3
  You need to update userland (world and ports) tools to be in sync with the kernel.
  Submitted by: mlaier
  Submitted by: eri
  Notes: svn path=/head/; revision=223637
* Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table.
  Author: Robert Watson; Date: 2011-06-06; Files: 1; Lines: -1/+2
  Connections are assigned to connection groups based on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow).
  Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state.
  Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture.
  Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz).
  Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default.
  Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x.
  Reviewed by: bz
  Sponsored by: Juniper Networks, Inc.
  Notes: svn path=/head/; revision=222748
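The assignment of a connection to a group by hashing its 4-tuple can be sketched like this. All names are hypothetical, and the mixing function is an FNV-1a style placeholder chosen for brevity; it is not the RSS Toeplitz hash that the commit mentions for hardware alignment.

```c
#include <stdint.h>

/* Hypothetical IPv4 connection 4-tuple. */
struct four_tuple {
    uint32_t laddr, faddr;   /* local and foreign address */
    uint16_t lport, fport;   /* local and foreign port */
};

/* Placeholder hash over the tuple bytes (FNV-1a style, NOT Toeplitz). */
static uint32_t
tuple_hash(const struct four_tuple *t)
{
    uint32_t words[3] = {
        t->laddr, t->faddr, ((uint32_t)t->lport << 16) | t->fport
    };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xff;
            h *= 16777619u;
        }
    }
    return (h);
}

/* Map a connection to one of ngroups connection groups (ngroups > 0). */
static unsigned
pcbgroup_index(const struct four_tuple *t, unsigned ngroups)
{
    return (tuple_hash(t) % ngroups);
}
```

Because the index depends only on the tuple, every packet of a connection lands in the same group, which is what lets a group take on CPU affinity when the NIC distributes packets by the same key. Wildcard sockets have no complete tuple, which is why the commit places them in all groups.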
* Decompose the current single inpcbinfo lock into two locks:
  Author: Robert Watson; Date: 2011-05-30; Files: 1; Lines: -19/+10
  - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit).
  - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space.
  Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required.
  A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need to acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. The new lookup flags, supplementing the existing INPLOOKUP_WILDCARD flag, are:
    INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
    INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
  Callers must pass exactly one of these flags (for the time being).
  Some notes:
  - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking.
  - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit.
  - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed.
  - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables.
  - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be.
  - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND).
  - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited.
  - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change.
  - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?).
  This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary.
  Reviewed by: bz
  Sponsored by: Juniper Networks, Inc.
  Notes: svn path=/head/; revision=222488
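The "exactly one of these flags" rule for the lookup routine can be checked with a mask. This is a sketch: the `SK_` constants and their numeric values are invented for illustration and are not the kernel's INPLOOKUP_* definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the lookup flags; values are illustrative. */
#define SK_LOOKUP_WILDCARD  0x1
#define SK_LOOKUP_RLOCKPCB  0x2
#define SK_LOOKUP_WLOCKPCB  0x4

/*
 * Valid when exactly one of the two lock flags is set. The wildcard
 * flag is independent and may accompany either of them.
 */
static bool
lookup_flags_valid(int flags)
{
    int lock = flags & (SK_LOOKUP_RLOCKPCB | SK_LOOKUP_WLOCKPCB);

    return (lock == SK_LOOKUP_RLOCKPCB || lock == SK_LOOKUP_WLOCKPCB);
}
```

In kernel code a condition like this would typically sit in an assertion at the top of the lookup function, so that a caller passing neither flag or both fails early.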
* The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid.
  Author: Qing Li; Date: 2011-05-20; Files: 1; Lines: -1/+8
  Reviewed by: delphij
  MFC after: 5 days
  Notes: svn path=/head/; revision=222143
* MfP4 CH=192004:
  Author: Bjoern A. Zeeb; Date: 2011-04-27; Files: 1; Lines: -0/+5
  Move ip_defttl to raw_ip.c where it is actually used. In an IPv6-only world we do not want to compile in ip_input.c for that, and it is a default shared with INET6.
  Reviewed by: gnn
  Sponsored by: The FreeBSD Foundation
  Sponsored by: iXsystems
  MFC after: 4 days
  Notes: svn path=/head/; revision=221131
* MFp4 CH=191760:
  Author: Bjoern A. Zeeb; Date: 2011-04-20; Files: 1; Lines: -8/+17
  When compiling out INET we still need the initialization routines as well as the tuning and monitoring sysctls shared with IPv6. Move the two send/recvspace variables up from the middle of the file to ease compiling out the INET only code.
  Reviewed by: gnn
  Sponsored by: The FreeBSD Foundation
  Sponsored by: iXsystems
  MFC after: 3 days
  Notes: svn path=/head/; revision=220880
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string. For SYSCTL_PROC instances where I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.
  Author: Matthew D Fleming; Date: 2011-01-18; Files: 1; Lines: -1/+2
  Notes: svn path=/head/; revision=217554
* Adding an address on an interface also requires the loopback route to that address be installed.
  Author: Qing Li; Date: 2010-09-12; Files: 1; Lines: -0/+2
  PR: kern/150481
  Submitted by: Ingo Flaschberger <if at xip.at>
  MFC after: 5 days
  Notes: svn path=/head/; revision=212502
* Ensure a minimum "slop" of 10 extra pcb structures when providing a memory size estimate to userland for pcb list sysctls. The previous behavior of a "slop" of n/8 does not work well for small values of n (e.g. no slop at all if you have fewer than 8 open UDP connections).
  Author: John Baldwin; Date: 2010-08-17; Files: 1; Lines: -2/+2
  Reviewed by: bz
  MFC after: 1 week
  Notes: svn path=/head/; revision=211433
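The slop arithmetic from r211433 in isolation. `pcblist_estimate` is a hypothetical name, and it returns a count of structures; the real handler would multiply such a count by the size of the exported structure to obtain bytes.

```c
/*
 * Number of pcb structures to reserve room for when n are currently
 * open: n plus a slop of n/8, but never fewer than 10 extra.
 */
static int
pcblist_estimate(int n)
{
    int slop = n / 8;

    if (slop < 10)
        slop = 10;
    return (n + slop);
}
```

With 4 open connections the old n/8 rule gave no headroom at all, while this version reserves room for 14; at 80 connections and above the two rules agree.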
* Enhance the historic behaviour of raw sockets and jails in a way that we allow all possible jail IPs as source address rather than forcing the "primary". While IPv6 naturally has source address selection, for legacy IP we do not go through the pain in case IP_HDRINCL was not set. People should bind(2) for that.
  Author: Bjoern A. Zeeb; Date: 2010-04-27; Files: 1; Lines: -5/+18
  This will, for example, allow ping(|6) -S to work correctly for non-primary addresses.
  Reported by: (ten 211.ru)
  Tested by: (ten 211.ru)
  MFC after: 4 days
  Notes: svn path=/head/; revision=207277
* Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout.
  Author: Bjoern A. Zeeb; Date: 2010-03-17; Files: 1; Lines: -3/+12
  Reviewed by: rwatson
  MFC after: 7 days
  Notes: svn path=/head/; revision=205251
* Abstract out initialization of most aspects of struct inpcbinfo from their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy() to do this work in a central spot. As inpcbinfo becomes more complex due to ongoing work to add connection groups, this will reduce code duplication.
  Author: Robert Watson; Date: 2010-03-14; Files: 1; Lines: -17/+3
  MFC after: 1 month
  Reviewed by: bz
  Sponsored by: Juniper Networks
  Notes: svn path=/head/; revision=205157
* Following up on a request from Ermal Luci to make ip_divert work as a client of pf(4), make ip_divert not depend on ipfw.
  Author: Luigi Rizzo; Date: 2010-01-07; Files: 1; Lines: -3/+7
  This is achieved by moving to ip_var.h the struct ipfw_rule_ref (which is part of the mtag for all reinjected packets) and other declarations of global variables, and moving to raw_ip.c global variables for filter and divert hooks.
  Note that names and locations could be made more generic (ipfw_rule_ref is really a generic reference robust to reconfigurations; the packet filter is not necessarily ipfw; filters and their clients are not necessarily limited to ipv4), but _right now_ most of this stuff works on ipfw and ipv4, so I don't feel like doing a gratuitous renaming, at least for the time being.
  Notes: svn path=/head/; revision=201735
* Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling, and thus things will be "denied" or returned with an error.
  Author: Bjoern A. Zeeb; Date: 2009-12-13; Files: 1; Lines: -2/+2
  Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack.
  We cannot change the classic jailed() call to do that, as it is used outside the network stack as well.
  Discussed with: julian, zec, jamie, rwatson (back in Sept)
  MFC after: 5 days
  Notes: svn path=/head/; revision=200473
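The difference between the two predicates can be modelled with a toy credential. This is a sketch only: `struct cred_sketch` and both function names are hypothetical, and the real functions inspect the prison attached to a struct ucred.

```c
#include <stdbool.h>

/* Toy credential: is the process jailed, and does the jail own a vnet? */
struct cred_sketch {
    bool in_jail;
    bool jail_has_own_vnet;
};

/* Classic check: true for any jailed process. */
static bool
jailed_sketch(const struct cred_sketch *c)
{
    return (c->in_jail);
}

/*
 * Network-stack check: true only for jails that share the host's
 * network stack, so a jail with its own vnet is not restricted.
 */
static bool
jailed_without_vnet_sketch(const struct cred_sketch *c)
{
    return (c->in_jail && !c->jail_has_own_vnet);
}
```

Keeping the classic predicate untouched matters because, as the commit notes, it is also used outside the network stack, where owning a vnet should change nothing.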
* Dispatch sockopt calls to ipfw and dummynet using the new option numbers, IP_FW3 and IP_DUMMYNET3. Right now the modules return an error if called with those arguments so there is no danger of unwanted behaviour.
  Author: Luigi Rizzo; Date: 2009-12-02; Files: 1; Lines: -0/+4
  MFC after: 3 days
  Notes: svn path=/head/; revision=200034
* Fix a functional regression in multicast.
  Author: Bruce M Simpson; Date: 2009-11-15; Files: 1; Lines: -8/+26
  Userland daemons need to see IGMP traffic regardless of the group; omit the imo filter check if the proto is IGMP. The kernel part of IGMP will have already filtered appropriately at this point.
  MFC after: ASAP
  Submitted by: Franz Struwig
  Reported by: Ivor Prebeg, Franz Struwig
  Notes: svn path=/head/; revision=199287
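The IGMP exception from r199287 reduces to one early return. In this sketch the function name and the boolean `group_filter_blocks` (standing in for the result of the multicast-options filter check) are assumptions; IPPROTO_IGMP is the standard constant from <netinet/in.h>.

```c
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Decide whether a multicast datagram reaches a raw socket. IGMP
 * bypasses the per-socket group filter because the kernel's IGMP code
 * has already filtered; every other protocol honours the filter.
 */
static bool
deliver_multicast_to_raw(int proto, bool group_filter_blocks)
{
    if (proto == IPPROTO_IGMP)
        return (true);
    return (!group_filter_blocks);
}
```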
* Virtualize the pfil hooks so that different jails may choose different packet filters. Also allows ipfw to be enabled on one jail and disabled on another. In 8.0 it's a global setting.
  Author: Julian Elischer; Date: 2009-10-11; Files: 1; Lines: -6/+6
  Sitting around in tree waiting to commit for: 2 months
  MFC after: 2 months
  Notes: svn path=/head/; revision=197952
* Self pointing routes are installed for configured interface addresses and address aliases. After an interface was brought down and brought back up again, those self pointing routes disappeared. This patch ensures that after an interface is brought back up, the loopback routes are reinstalled properly.
  Author: Qing Li; Date: 2009-09-15; Files: 1; Lines: -0/+1
  Reviewed by: bz
  MFC after: immediately
  Notes: svn path=/head/; revision=197227
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h; we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
  Author: Robert Watson; Date: 2009-08-01; Files: 1; Lines: -1/+0
  Reviewed by: bz
  Approved by: re (vimage blanket)
  Notes: svn path=/head/; revision=196019
* Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references.
  Author: Robert Watson; Date: 2009-07-16; Files: 1; Lines: -2/+2
  Discussed with: bz, julian
  Reviewed by: bz
  Approved by: re (kensmith, kib)
  Notes: svn path=/head/; revision=195727
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
  Author: Robert Watson; Date: 2009-07-14; Files: 1; Lines: -25/+6
  Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker.
  Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
  This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
  Bump __FreeBSD_version and update UPDATING.
  Portions submitted by: bz
  Reviewed by: bz, zec
  Discussed with: gnn, jamie, jeff, jhb, julian, sam
  Suggested by: peter
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=195699
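The "reference copy plus per-vnet copy, reached through an accessor macro" scheme can be imitated in userland. This is a loose sketch under loud assumptions: a plain struct stands in for the contiguous linker-set region, malloc() stands in for the per-vnet allocator, and every name and initial value below is hypothetical rather than taken from the kernel.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the linker-set region holding all virtualized globals. */
struct vnet_region {
    int           ip_defttl;
    unsigned long rip_recvspace;
};

/* Statically initialized reference copy (illustrative values). */
static const struct vnet_region vnet_reference = { 64, 9216 };

/* Create one network stack instance: clone the reference copy. */
static void *
vnet_create_sketch(void)
{
    void *base = malloc(sizeof(vnet_reference));

    if (base != NULL)
        memcpy(base, &vnet_reference, sizeof(vnet_reference));
    return (base);
}

/* Accessor: per-vnet base address + symbol offset -> per-vnet variable. */
#define VNET_SKETCH(base, type, name) \
    (*(type *)((char *)(base) + offsetof(struct vnet_region, name)))
```

Writing `VNET_SKETCH(a, int, ip_defttl) = 32` changes the value for instance `a` only; a second instance still sees the statically initialized 64, which is the per-stack isolation the commit describes.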