linux

Commit Graph

Author	SHA1	Message	Date
Pavel Emelyanov	f68635e627	[IPV6]: Cleanup the addconf_sysctl_register This only includes fixing the space-indented lines and removing one unneeded else after the goto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:26 -08:00
Fred L. Templin	c7dc89c0ac	[IPV6]: Add RFC4214 support This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the "iproute2" utility. The diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This version includes the diff for ./include/linux/if.h which was missing in the v2.4 submission and is needed to make the patch compile. The patch has been installed, compiled and tested in a clean 2.6.24-rc2 kernel build area. Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:09 -08:00
Pavel Emelyanov	f126734735	[IPV6]: Correct the comment concerning inetsw6 table It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c, but one comment was not fixed. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:49 -08:00
Pavel Emelyanov	42a73808ed	[RAW]: Consolidate proc interface. Both ipv6/raw.c and ipv4/raw.c use the seq files to walk through the raw sockets hash and show them. The "walking" code is rather huge, but is identical in both cases. The difference is the hash table to walk over and the protocol family to check (this was not in the first version of the patch, which was noticed by YOSHIFUJI) Make the ->open store the needed hash table and the family on the allocated raw_iter_state and make the start/next/stop callbacks work with it. This removes most of the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:32 -08:00
Pavel Emelyanov	ab70768ec7	[RAW]: Consolidate proto->unhash callback Same as the ->hash one, this is easily consolidated. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	65b4c50b47	[RAW]: Consolidate proto->hash callback Having the raw_hashinfo it's easy to consolidate the raw[46]_hash functions. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	b673e4dfc8	[RAW]: Introduce raw_hashinfo structure The ipv4/raw.c and ipv6/raw.c contain many common code (most of which is proc interface) which can be consolidated. Most of the places to consolidate deal with the raw sockets hashtable, so introduce a struct raw_hashinfo which describes the raw sockets hash. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:30 -08:00
Pavel Emelyanov	69d6da0b0f	[IPv6] RAW: Compact the API for the kernel Same as in the previous patch for ipv4, compact the API and hide hash table and rwlock inside the raw.c file. Plus fix some "bad" places from checkpatch.pl point of view (assignments inside if()). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:29 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnetlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
YOSHIFUJI Hideaki	2a8cc6c890	[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table. Policy table is implemented as an RCU linear list since we do not expect large list nor frequent updates. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:58 -08:00
YOSHIFUJI Hideaki	303065a854	[IPV6] ADDRCONF: Allow address selection policy with ifindex. This patch allows ifindex to be a key for address selection policy table. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:57 -08:00
YOSHIFUJI Hideaki	c1ee656ccb	[IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label(). This patch renames ipv6_saddr_label() to ipv6_addr_label() because address label is used for both of source address and destination address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:56 -08:00
David S. Miller	294b4baf29	[IPSEC]: Kill afinfo->nf_post_routing After changeset: [NETFILTER]: Introduce NF_INET_ hook values It always evaluates to NF_INET_POST_ROUTING. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Patrick McHardy	6e23ae2a48	[NETFILTER]: Introduce NF_INET_ hook values The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Herbert Xu	1bf06cd2e3	[IPSEC]: Add async resume support on input This patch adds support for async resumptions on input. To do so, the transform would return -EINPROGRESS and subsequently invoke the function xfrm_input_resume to resume processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:54 -08:00
Herbert Xu	60d5fcfb19	[IPSEC]: Remove nhoff from xfrm_input The nhoff field isn't actually necessary in xfrm_input. For tunnel mode transforms we now throw away the output IP header so it makes no sense to fill in the nexthdr field. For transport mode we can now let the function transport_finish do the setting and it knows where the nexthdr field is. The only other thing that needs the nexthdr field to be set is the header extraction code. However, we can simply move the protocol extraction out of the generic header extraction. We want to minimise the amount of info we have to carry around between transforms as this simplifies the resumption process for async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:53 -08:00
Herbert Xu	d26f398400	[IPSEC]: Make x->lastused an unsigned long Currently x->lastused is u64 which means that it cannot be read/written atomically on all architectures. David Miller observed that the value stored in it is only an unsigned long which is always atomic. So based on his suggestion this patch changes the internal representation from u64 to unsigned long while the user-interface still refers to it as u64. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:52 -08:00
Herbert Xu	0ebea8ef35	[IPSEC]: Move state lock into x->type->input This patch releases the lock on the state before calling x->type->input. It also adds the lock to the spots where they're currently needed. Most of those places (all except mip6) are expected to disappear with async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:52 -08:00
Herbert Xu	668dc8af31	[IPSEC]: Move integrity stat collection into xfrm_input Similar to the moving out of the replay processing on the output, this patch moves the integrity stat collection from x->type->input into xfrm_input. This would eventually allow transforms such as AH/ESP to be lockless. The error value EBADMSG (currently unused in the crypto layer) is used to indicate a failed integrity check. In future this error can be directly returned by the crypto layer once we switch to aead algorithms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:51 -08:00
Herbert Xu	716062fd4c	[IPSEC]: Merge most of the input path As part of the work on asynchronous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common input code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:50 -08:00
Herbert Xu	862b82c6f9	[IPSEC]: Merge most of the output path As part of the work on asynchronous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common output code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:48 -08:00
Herbert Xu	ef76bc23ef	[IPV6]: Add ip6_local_out Most callers of the LOCAL_OUT chain will set the IP packet length before doing so. They also share the same output function dst_output. This patch creates a new function called ip6_local_out which does all of that and converts the appropriate users over to it. Apart from removing duplicate code, it will also help in merging the IPsec output path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:47 -08:00
Herbert Xu	227620e295	[IPSEC]: Separate inner/outer mode processing on input With inter-family transforms the inner mode differs from the outer mode. Attempting to handle both sides from the same function means that it needs to handle both IPv4 and IPv6 which creates duplication and confusion. This patch separates the two parts on the input path so that each function deals with one family only. In particular, the functions xfrm4_extract_input/xfrm6_extract_input move the pertinent fields from the IPv4/IPv6 IP headers into a neutral format stored in skb->cb. This is then used by the inner mode input functions to modify the inner IP header. In this way the input function no longer has to know about the outer address family. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:46 -08:00
Herbert Xu	36cf9acf93	[IPSEC]: Separate inner/outer mode processing on output With inter-family transforms the inner mode differs from the outer mode. Attempting to handle both sides from the same function means that it needs to handle both IPv4 and IPv6 which creates duplication and confusion. This patch separates the two parts on the output path so that each function deals with one family only. In particular, the functions xfrm4_extract_output/xfrm6_extract_output moves the pertinent fields from the IPv4/IPv6 IP headers into a neutral format stored in skb->cb. This is then used by the outer mode output functions to write the outer IP header. In this way the output function no longer has to know about the inner address family. Since the extract functions are only called by tunnel modes (the only modes that can support inter-family transforms), I've also moved the xfrm*_tunnel_check_size calls into them. This allows the correct ICMP message to be sent as opposed to now where you might call icmp_send with an IPv6 packet and vice versa. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:45 -08:00
Herbert Xu	29bb43b4ec	[INET]: Give outer DSCP directly to ip*_copy_dscp This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so that they directly take the outer DSCP rather than the outer IP header. This will help us to unify the code for inter-family tunnels. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:45 -08:00
Herbert Xu	a2deb6d26f	[IPSEC]: Move x->outer_mode->output out of locked section RO mode is the only one that requires a locked output function. So it's easier to move the lock into that function rather than requiring everyone else to run under the lock. In particular, this allows us to move the size check into the output function without causing a potential dead-lock should the ICMP error somehow hit the same SA on transmission. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:44 -08:00
Herbert Xu	e40b328615	[IPSEC]: Forbid BEET + ipcomp for now While BEET can theoretically work with IPComp the current code can't do that because it tries to construct a BEET mode tunnel type which doesn't (and cannot) exist. In fact as it is it won't even attach a tunnel object at all for BEET which is bogus. To support this fully we'd also need to change the policy checks on input to recognise a plain tunnel as a legal variant of an optional BEET transform. This patch simply fails such constructions for now. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:43 -08:00
Herbert Xu	25ee3286dc	[IPSEC]: Merge common code into xfrm_bundle_create Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are common. This patch extracts that logic and puts it into xfrm_bundle_create. The rest of it are then accessed through afinfo. As a result this fixes the problem with inter-family transforms where we treat every xfrm dst in the bundle as if it belongs to the top family. This patch also fixes a long-standing error-path bug where we may free the xfrm states twice. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:43 -08:00
Herbert Xu	66cdb3ca27	[IPSEC]: Move flow construction into xfrm_dst_lookup This patch moves the flow construction from the callers of xfrm_dst_lookup into that function. It also changes xfrm_dst_lookup so that it takes an xfrm state as its argument instead of explicit addresses. This removes any address-specific logic from the callers of xfrm_dst_lookup which is needed to correctly support inter-family transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:42 -08:00
Herbert Xu	f04e7e8d7f	[IPSEC]: Replace x->type->{local,remote}_addr with flags The functions local_addr and remote_addr are more than what they're needed for. The same thing can be done easily with flags on the type object. This patch does that and simplifies the wrapper functions in xfrm6_policy accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:41 -08:00
Herbert Xu	fff6938880	[IPSEC]: Make sure idev is consistent with dev in xfrm_dst Previously we took the device from the bottom route and idev from the top route. This is bad because idev may well point to a different device. This patch changes it so that we get the idev from the device directly. It also makes it an error if either dev or idev is NULL. This is consistent with the rest of the routing code which also treats these cases as errors. I've removed the err initialisation in xfrm6_policy.c because it achieves no purpose and hid a bug when an initial version of this patch neglected to set err to -ENODEV (fortunately the IPv4 version warned about it). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:40 -08:00
Herbert Xu	45ff5a3f9a	[IPSEC]: Set dst->input to dst_discard The input function should never be invoked on IPsec dst objects. This is because we don't apply IPsec on input until after we've made the routing decision. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:40 -08:00
Herbert Xu	8ce68ceb55	[IPSEC]: Only set neighbour on top xfrm dst The neighbour field is only used by dst_confirm which only ever happens on the top-most xfrm dst. So it's a waste to duplicate for every other xfrm dst. This patch moves its setting out of the loop so that only the top one gets set. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:39 -08:00
Herbert Xu	352e512c32	[NET]: Eliminate duplicate copies of dst_discard We have a number of copies of dst_discard scattered around the place which all do the same thing, namely free a packet on the input or output paths. This patch deletes all of them except dst_discard and points all the users to it. The only non-trivial bit is decnet where it returns an error. However, conceptually this is identical to the blackhole functions used in IPv4 and IPv6 which do not return errors. So they should either all return errors or all return zero. For now I've stuck with the majority and picked zero as the return value. It doesn't really matter in practice since few if any driver would react differently depending on a zero return value or NET_RX_DROP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:37 -08:00
Herbert Xu	b4ce92775c	[IPV6]: Move nfheader_len into rt6_info The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. It also reorders the fields in rt6_info to minimize holes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:37 -08:00
Herbert Xu	0148894223	[IPV6]: Only set nfheader_len for top xfrm dst We only need to set nfheader_len in the top xfrm dst. This is because we only ever read the nfheader_len from the top xfrm dst. It is also easier to count nfheader_len as part of header_len which then lets us remove the ugly wrapper functions for incrementing and decrementing header lengths in xfrm6_policy.c. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Wang Chen	a92aa318b4	[IPV6]: Add raw6 drops counter. Add raw drops counter for IPv6 in /proc/net/raw6 . Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:34 -08:00
Jens Axboe	a0974dd3da	[TCP] splice: add tcp_splice_read() to IPV6 Thanks to YOSHIFUJI Hideaki for the hint! Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:32 -08:00
Rolf Manderscheid	a9e527e3f9	IPoIB: improve IPv4/IPv6 to IB mcast mapping functions An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:37 -08:00
Herbert Xu	f945fa7ad9	[INET]: Fix truesize setting in ip_append_data As it is ip_append_data only counts page fragments to the skb that allocated it. As such it means that the first skb gets hit with a 4K charge even though it might have only used a fraction of it while all subsequent skb's that use the same page gets away with no charge at all. This bug was exposed by the UDP accounting patch. [ The wmem_alloc bumping needs to be moved with the truesize, noticed by Takahiro Yasui. -DaveM ] Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-23 03:11:43 -08:00
Wang Chen	fa95c28322	[IPV6]: RFC 2011 compatibility broken The snmp6 entry name was changed, and it broke compatibility to RFC 2011. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-21 03:05:43 -08:00
Wang Chen	c964ff4ffb	[IPV6]: ICMP6_MIB_OUTMSGS increment duplicated icmpv6_send() calls ip6_push_pending_frames() indirectly. Both ip6_push_pending_frames() and icmpv6_send() increment counter ICMP6_MIB_OUTMSGS. This patch removes the increment from icmpv6_send. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-21 03:05:20 -08:00
YOSHIFUJI Hideaki	398bcbebb6	[IPV6] ROUTE: Make sending algorithm more friendly with RFC 4861. We omit (or delay) sending NSes for known-to-unreachable routers (in NUD_FAILED state) according to RFC 4191 (Default Router Preferences and More-Specific Routes). But this is not fully compatible with RFC 4861 (Neighbor Discovery Protocol for IPv6), which does not remember unreachability of neighbors. So, let's avoid mixing sending algorithm of RFC 4191 and that of RFC 4861, and make the algorithm more friendly with RFC 4861 if RFC 4191 is disabled. Issue was found by IPv6 Ready Logo Core Self_Test 1.5.0b2 (by TAHI Project), and has been tracked down by Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:40 -08:00
Pavel Emelyanov	b3652b2dc5	[IPV6]: Mischecked tw match in __inet6_check_established. When looking for a conflicting connection the !sk->sk_bound_dev_if check is performed only for live sockets, but not for timewait-ed. This is not the case for ipv4, for __inet6_lookup_established in both ipv4 and ipv6 and for other places that check for tw-s. Was this missed accidentally? If so, then this patch fixes it and besides makes use of the dif variable declared in the function. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:36 -08:00
Yasuyuki Kozakai	8f41f01786	[NETFILTER]: ip6t_eui64: Fixes calculation of Universal/Local bit RFC2464 says that the next to lowest order bit of the first octet of the Interface Identifier is formed by complementing the Universal/Local bit of the EUI-64. But ip6t_eui64 uses OR not XOR. Thanks Peter Ivancik for reporting this bug and posting a patch for it. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-10 22:40:39 -08:00
Brian Haley	1ac4f00885	[IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect() Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-08 23:52:21 -08:00
Joe Perches	bea8519547	[IPV6]: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:01:35 -08:00
Wei Yongjun	cf6fc4a924	[IPV6]: Fix the return value of ipv6_getsockopt If CONFIG_NETFILTER is not selected when compiling the kernel source code, ipv6_getsockopt will return an EINVAL error if optname is not supported by the kernel. But if CONFIG_NETFILTER is selected, an ENOPROTOOPT error will be returned. This patch fixes this to always return an ENOPROTOOPT error if the optname argument of ipv6_getsockopt is not supported by the kernel. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-16 13:39:57 -08:00
Thomas Graf	95a02cfd4d	[IPv6] ESP: Discard dummy packets introduced in rfc4303 RFC4303 introduces dummy packets with a nexthdr value of 59 to implement traffic confidentiality. Such packets need to be dropped silently and the payload may not be attempted to be parsed as it consists of random chunk. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-11 02:45:27 -08:00
YOSHIFUJI Hideaki	1df2e44560	[IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx. RTCF_xxx flags (defined in include/linux/in_route.h) are available for IPv4 route (rtable) entries only. Use RTF_xxx flags instead, defined in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-11 02:45:24 -08:00
Mitsuru Chinen	ca46f9c834	[IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable network IPv6 stack doesn't increment OutNoRoutes counter when IP datagrams is being discarded because no route could be found to transmit them to their destination. IPv6 stack should increment the counter. Incidentally, IPv4 stack increments that counter in such situation. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-07 01:06:30 -08:00
Evgeniy Polyakov	d31c7b8fa3	[IPV6]: Restore IPv6 when MTU is big enough Avaid provided test application, so bug got fixed. IPv6 addrconf removes ipv6 inner device from netdev each time mtu changes and new value is less than IPV6_MIN_MTU (1280 bytes). When mtu is changed and new value is greater than IPV6_MIN_MTU, it does not add ipv6 addresses and inner device back. This patch fixes that. Tested with Avaid's application, which works ok now. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-11-30 23:36:08 +11:00
YOSHIFUJI Hideaki	77adefdc98	[IPV6] TCPMD5: Fix deleting key operation. Due to the bug, refcnt for md5sig pool was leaked when a user tries to delete a key if we have more than one key. In addition to the leakage, we returned an incorrect result value to userspace. This fix should close Bug #9418, reported by <ming-baini@163.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-20 17:31:23 -08:00
YOSHIFUJI Hideaki	aacbe8c880	[IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-20 17:30:56 -08:00
Joe Perches	3b6d821c4f	[IPV6]: Add missing "space" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-19 23:47:25 -08:00
Pierre Ynard	dbb2ed2485	[IPV6]: Add ifindex field to ND user option messages. Userland neighbor discovery options are typically heavily involved with the interface on which they are received: add a missing ifindex field to the original struct. Thanks to Rémi Denis-Courmont. Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-12 17:58:35 -08:00
Denis V. Lunev	2994c63863	[INET]: Small possible memory leak in FIB rules This patch fixes a small memory leak. Default fib rules can be deleted by the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by ip rule flush Such a rule will not be freed as the ref-counter has 2 on start and becomes clearly unreachable after removal. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 22:12:03 -08:00
Pavel Emelyanov	03f49f3457	[NET]: Make helper to get dst entry and "use" it There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:28:34 -08:00
Eric Dumazet	230140cffa	[INET]: Remove per bucket rwlock in tcp/dccp ehash table. As done two years ago on IP route cache table (commit `22c047ccbc`), we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for little performance differences. (we hit a different cache line for the rwlock, but then the bucket cache line has a better sharing factor among cpus, since we dirty it less often). For netstat or ss commands that want a full scan of hash table, we perform fewer memory accesses. Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings. This patch provides some locking abstraction that may ease a future work using a different model for TCP/DCCP table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:11 -08:00
Herbert Xu	4999f3621f	[IPSEC]: Fix crypto_alloc_comp error checking The function crypto_alloc_comp returns an errno instead of NULL to indicate error. So it needs to be tested with IS_ERR. This is based on a patch by Vicenç Beltran Querol. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:03 -08:00
Alexey Dobriyan	33120b30cc	[IPV6]: Convert /proc/net/ipv6_route to seq_file interface This removes last proc_net_create() user. Kudos to Benjamin Thery and Stephen Hemminger for comments on previous version. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:18 -08:00
Eric Dumazet	c5a432f1a1	[IPV6]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcpv6,udpv6,udplitev6,rawv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:59 -08:00
Eric Dumazet	286ab3d460	[NET]: Define infrastructure to keep 'inuse' changes in an efficient SMP/NUMA way. "struct proto" currently uses an array stats[NR_CPUS] to track change on 'inuse' sockets per protocol. If NR_CPUS is big, this means we use a big memory area for this. Moreover, all this memory area is located on a single node on NUMA machines, increasing memory pressure on the boot node. In this patch, I tried to : - Keep a fast !CONFIG_SMP implementation - Keep a fast CONFIG_SMP implementation for often used protocols (tcp,udp,raw,...) - Introduce a NUMA efficient implementation Some helper macros are defined in include/net/sock.h These macros take into account CONFIG_SMP If a "struct proto" is declared without using DEFINE_PROTO_INUSE / REF_PROTO_INUSE macros, it will automatically use a default implementation, using a dynamically allocated percpu zone. This default implementation will be NUMA efficient, but might use 32/64 bytes per possible cpu because of current alloc_percpu() implementation. However it still should be better than previous implementation based on stats[NR_CPUS] field. When a "struct proto" is changed to use the new macros, we use a single static "int" percpu variable, lowering the memory and cpu costs, still preserving NUMA efficiency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:57 -08:00
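The per-CPU 'inuse' accounting described in the commit above follows a common pattern: each CPU updates only its own slot, and a reader sums all slots. The Python sketch below only illustrates that pattern; `InuseCounter` and its methods are hypothetical names, and nothing here models the kernel's percpu allocator or NUMA placement.

```python
class InuseCounter:
    """Per-CPU style counter: writers touch one slot, readers sum them.

    Hypothetical illustration; the class and method names are assumptions.
    """

    def __init__(self, n_cpus):
        # One integer per possible CPU.
        self.per_cpu = [0] * n_cpus

    def add(self, cpu, delta):
        # A socket may be created on one CPU and destroyed on another,
        # so an individual slot is allowed to go negative.
        self.per_cpu[cpu] += delta

    def total(self):
        # Only the sum over all slots is meaningful.
        return sum(self.per_cpu)
```

A socket created on CPU 0 and released on CPU 2 leaves the slots at 1 and -1, while `total()` correctly reports 0.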
Mitsuru Chinen	7a0ff716c2	[IPv6] SNMP: Restore Udp6InErrors incrementation As the checksum verification is postponed till user calls recv or poll, the incrementation of Udp6InErrors counter should be also postponed. Currently, it is postponed in non-blocking operation case. However it should be postponed in all cases like the IPv4 code. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:54 -08:00
Pavel Emelyanov	bf138862b1	[IPV6]: Consolidate the ip cork destruction in ip6_output.c The ip6_push_pending_frames and ip6_flush_pending_frames do the same things to flush the sock's cork. Move this into a separate function and save ~100 bytes from the .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:26 -08:00
Jan Engelhardt	0795c65d9f	[NETFILTER]: Clean up Makefile Sort matches and targets in the NF makefiles. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:22 -08:00
Alexey Dobriyan	7351a22a3a	[NETFILTER]: ip{,6}_queue: convert to seq_file interface I plan to kill ->get_info which means killing proc_net_create(). Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:20 -08:00
Jens Axboe	c46f2334c8	[SG] Get rid of __sg_mark_end() sg_mark_end() overwrites the page_link information, but all users want __sg_mark_end() behaviour where we just set the end bit. That is the most natural way to use the sg list, since you'll fill it in and then mark the end point. So change sg_mark_end() to only set the termination bit. Add a sg_magic debug check as well, and clear a chain pointer if it is set. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
Adrian Bunk	87ae9afdca	cleanup asm/scatterlist.h includes Not architecture specific code should not #include <asm/scatterlist.h>. This patch therefore either replaces them with #include <linux/scatterlist.h> or simply removes them if they were unused. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
Pavel Emelyanov	6257ff2177	[NET]: Forget the zero_it argument of sk_alloc() Finally, the zero_it argument can be completely removed from the callers and from the function prototype. Besides, fix the checkpatch.pl warnings about using the assignments inside if-s. This patch is rather big, and it is a part of the previous one. I split it wishing to make the patches more readable. Hope this particular split helped. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:39:31 -07:00
David S. Miller	51c739d1f4	[NET]: Fix incorrect sg_mark_end() calls. This fixes scatterlist corruptions added by commit `68e3f5dd4d` [CRYPTO] users: Fix up scatterlist conversion errors The issue is that the code calls sg_mark_end() which clobbers the sg_page() pointer of the final scatterlist entry. The first part of the fix makes skb_to_sgvec() do __sg_mark_end(). After considering all skb_to_sgvec() call sites the most correct solution is to call __sg_mark_end() in skb_to_sgvec() since that is what all of the callers would end up doing anyways. I suspect this might have fixed some problems in virtio_net which is the sole non-crypto user of skb_to_sgvec(). Other similar sg_mark_end() cases were converted over to __sg_mark_end() as well. Arguably sg_mark_end() is a poorly named function because it doesn't just "mark", it clears out the page pointer as a side effect, which is what led to these bugs in the first place. The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable() and arguably it could be converted to __sg_mark_end() if only so that we can delete this confusing interface from linux/scatterlist.h Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:29:29 -07:00
Daniel Lezcano	1675c7b254	[IPV6]: remove duplicate call to proc_net_remove The file /proc/net/if_inet6 is removed twice. First time in: inet6_exit ->addrconf_cleanup And followed a few lines after by: inet6_exit -> if6_proc_exit Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:16:24 -07:00
Matthias M. Dellweg	b0a713e9e6	[TCP] MD5: Remove some more unnecessary casting. while reviewing the tcp_md5-related code further i came across with another two of these casts which you probably have missed. I don't actually think that they impose a problem by now, but as you said we should remove them. Signed-off-by: Matthias M. Dellweg <2500@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:27 -07:00
YOSHIFUJI Hideaki	ad02ac145d	[IPV6] NDISC: Fix setting base_reachable_time_ms variable. This bug was introduced by the commit `d12af679bc` (sysctl: fix neighbour table sysctls). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:22 -07:00
Herbert Xu	68e3f5dd4d	[CRYPTO] users: Fix up scatterlist conversion errors This patch fixes the errors made in the users of the crypto layer during the sg_init_table conversion. It also adds a few conversions that were missing altogether. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-27 00:52:07 -07:00
Adrian Bunk	72998d8c84	[INET] ESP: Must #include <linux/scatterlist.h> This patch fixes the following compile errors in some configurations: <-- snip --> ... CC net/ipv4/esp4.o /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c: In function 'esp_output': /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c:113: error: implicit declaration of function 'sg_init_table' make[3]: * [net/ipv4/esp4.o] Error 1 ... /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c: In function 'esp6_output': /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c:112: error: implicit declaration of function 'sg_init_table' make[3]: * [net/ipv6/esp6.o] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:53:58 -07:00
Jeff Garzik	18134bed02	[TCP] IPV6: fix softnet build breakage net/ipv6/tcp_ipv6.c: In function 'tcp_v6_rcv': net/ipv6/tcp_ipv6.c:1736: error: implicit declaration of function 'get_softnet_dma' net/ipv6/tcp_ipv6.c:1736: warning: assignment makes pointer from integer without a cast Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:53:14 -07:00
David S. Miller	b4caea8aa8	[TCP]: Add missing I/O AT code to ipv6 side. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 04:20:13 -07:00
David S. Miller	c7da57a183	[TCP]: Fix scatterlist handling in MD5 signature support. Use sg_init_table() and sg_mark_end() as needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 00:41:21 -07:00
David S. Miller	ed0e7e0ca3	[IPSEC]: Add missing sg_init_table() calls to ESP. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 00:38:39 -07:00
Chuck Lever	c2636b4d9e	[NET]: Treat the sign of the result of skb_headroom() consistently In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:55 -07:00
Masahide NAKAMURA	ea2c47b42f	[IPSEC] IPV6: Fix to add tunnel mode SA correctly. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:58 -07:00
Anton Arapov	a25de534f8	[INET]: Justification for local port range robustness. There is a justifying patch for Stephen's patches. Stephen's patches disallow using a port range of one single port and break the meaning of the 'remaining' variable, in some places it has different meaning. My patch gives back the sense of 'remaining' variable. It should mean how many ports are remaining and nothing else. Also my patch allows using a single port. I am sure we must be able to use mentioned port range, this is not restricted by documentation and does not break current behavior. Useful links: Patches posted by Stephen Hemminger http://marc.info/?l=linux-netdev&m=119206106218187&w=2 http://marc.info/?l=linux-netdev&m=119206109918235&w=2 Andrew Morton's comment http://marc.info/?l=linux-kernel&m=119248225007737&w=2 1. Allows using a port range of one single port. 2. Gives back sense of 'remaining' variable. Signed-off-by: Anton Arapov <aarapov@redhat.com> Acked-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 22:00:17 -07:00
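The two points of the commit above (a range of one port is legal, and 'remaining' means exactly the number of ports in the range) can be made concrete with a small sketch. This is a hypothetical Python illustration, not the kernel's port selection code: `pick_local_port`, its parameters, and the `in_use` set are all assumptions.

```python
import random


def pick_local_port(low, high, in_use, rng=random):
    """Return a free port in the inclusive range [low, high], or None.

    Hypothetical sketch; the function and its arguments are assumptions.
    """
    # 'remaining' is the number of ports in the inclusive range and
    # nothing else, so low == high gives remaining == 1, which is valid.
    remaining = high - low + 1
    if remaining <= 0:
        raise ValueError("empty port range")

    # Start at a random offset, then scan each port exactly once.
    rover = low + rng.randrange(remaining)
    for _ in range(remaining):
        if rover not in in_use:
            return rover
        rover += 1
        if rover > high:
            rover = low  # wrap around to the start of the range
    return None
```

With `low == high` the loop runs once, so a single-port range yields that port when it is free and `None` when it is taken.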
Linus Torvalds	a57793651f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits) [IPV6]: Fix again the fl6_sock_lookup() fixed locking [NETFILTER]: nf_conntrack_tcp: fix connection reopening fix [IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels [IPV6]: Lost locking in fl6_sock_lookup [IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list [NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required [NET]: Fix OOPS due to missing check in dev_parse_header(). [TCP]: Remove lost_retrans zero seqno special cases [NET]: fix carrier-on bug? [NET]: Fix uninitialised variable in ip_frag_reasm() [IPSEC]: Rename mode to outer_mode and add inner_mode [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP [IPSEC]: Use the top IPv4 route's peer instead of the bottom [IPSEC]: Store afinfo pointer in xfrm_mode [IPSEC]: Add missing BEET checks [IPSEC]: Move type and mode map into xfrm_state.c [IPSEC]: Fix length check in xfrm_parse_spi [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input ...	2007-10-18 14:40:30 -07:00
Eric W. Biederman	064b5bba0c	sysctl: remove broken netfilter binary sysctls No one has bothered to set strategy routine for the netfilter sysctls that return jiffies to be sysctl_jiffies. So it appears the sys_sysctl path is unused and untested, so this patch removes the binary sysctl numbers. Which fixes the netfilter oops in 2.6.23-rc2-mm2 for me. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	428b367bff	sysctl: ipv6 route flushing (kill binary path) We don't properly support the sysctl binary path for flushing the ipv6 routes. So remove support for a binary path. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Eric W. Biederman	d12af679bc	sysctl: fix neighbour table sysctls. - In ipv6 ndisc_ifinfo_syctl_change so it doesn't depend on binary sysctl names for a function that works with proc. - In neighbour.c reorder the table to put the possibly unused entries at the end so we can remove them by terminating the table early. - In neighbour.c kill the entries with questionable binary sysctl handling behavior. - In neighbour.c if we don't have a strategy routine remove the binary path. So we don't use the default sysctl strategy routine on data that is not ready for it. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Pavel Emelyanov	52f095ee88	[IPV6]: Fix again the fl6_sock_lookup() fixed locking YOSHIFUJI fairly pointed out, that the users increment should be done under the ip6_sk_fl_lock not to give IPV6_FL_A_PUT a chance to put this count to zero and release the flowlabel. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:38:48 -07:00
Pavel Emelyanov	78c2e50253	[IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels In the IPV6_FL_A_GET case the hash is checked for flowlabels with the given label. If it is not found, the lock, protecting the hash, is dropped to be re-get for writing. After this a newly allocated entry is inserted, but no checks are performed to catch a classical SMP race, when the conflicting label may be inserted on another cpu. Use the (currently unused) return value from fl_intern() to return the conflicting entry (if found) and re-check, whether we can reuse it (IPV6_FL_F_EXCL) or return -EEXISTS. Also add the comment, about why not re-lookup the current sock for conflicting flowlabel entry. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:18:56 -07:00
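The race fixed in the commit above is the classic "look up, drop the lock, insert" pattern, and the fix is to have the insert step re-check under the lock and hand back any conflicting entry. The Python sketch below illustrates that shape only; `LabelTable`, `intern`, `get_label`, and the use of `FileExistsError` in place of the kernel's error return are all hypothetical stand-ins.

```python
import threading


class LabelTable:
    """Hypothetical stand-in for a lock-protected hash of labels."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_label = {}

    def lookup(self, label):
        with self._lock:
            return self._by_label.get(label)

    def intern(self, label, entry):
        """Insert entry unless label is present; return the conflict or None."""
        with self._lock:
            existing = self._by_label.get(label)
            if existing is not None:
                # Someone inserted this label after the caller's lookup.
                return existing
            self._by_label[label] = entry
            return None


def get_label(table, label, make_entry, exclusive=False):
    """Find or create the entry for label, handling the insert race."""
    found = table.lookup(label)
    if found is None:
        # The lock was dropped after lookup, so intern() must re-check.
        fresh = make_entry()
        found = table.intern(label, fresh)
        if found is None:
            return fresh  # our entry went in, no conflict
    if exclusive:
        # Assumed analogue of refusing to share an existing label.
        raise FileExistsError(label)
    return found
```

The key design point is that `intern` performs the existence check and the insertion inside one critical section, so the caller never has to trust a lookup result obtained before the lock was released.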
Pavel Emelyanov	bd0bf57700	[IPV6]: Lost locking in fl6_sock_lookup This routine scans the ipv6_fl_list whose update is protected with the socket lock and the ip6_sk_fl_lock. Since the socket lock is not taken in the lookup, use the other one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:15:57 -07:00
Pavel Emelyanov	04028045a1	[IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list The new flowlabels should be inserted into the sock list under the ip6_sk_fl_lock. This was lost in one place. This list is naturally protected with the socket lock, but the fl6_sock_lookup() is called without it, so another protection is required. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:14:58 -07:00
Herbert Xu	13996378e6	[IPSEC]: Rename mode to outer_mode and add inner_mode This patch adds a new field to xfrm states called inner_mode. The existing mode object is renamed to outer_mode. This is the first part of an attempt to fix inter-family transforms. As it is we always use the outer family when determining which mode to use. As a result we may end up shoving IPv4 packets into netfilter6 and vice versa. What we really want is to use the inner family for the first part of outbound processing and the outer family for the second part. For inbound processing we'd use the opposite pairing. I've also added a check to prevent silly combinations such as transport mode with inter-family transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:35:51 -07:00
Herbert Xu	ca68145f16	[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP Combining RO and AH/ESP/IPCOMP does not make sense. So this patch adds a check in the state initialisation function to prevent this. This allows us to safely remove the mode input function of RO since it can never be called anymore. Indeed, if somehow it does get called we'll know about it through an OOPS instead of it slipping past silently. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:35:15 -07:00
Herbert Xu	17c2a42a24	[IPSEC]: Store afinfo pointer in xfrm_mode It is convenient to have a pointer from xfrm_state to address-specific functions such as the output function for a family. Currently the address-specific policy code calls out to the xfrm state code to get those pointers when we could get it in an easier way via the state itself. This patch adds an xfrm_state_afinfo to xfrm_mode (since they're address-specific) and changes the policy code to use it. I've also added an owner field to do reference counting on the module providing the afinfo even though it isn't strictly necessary today since IPv6 can't be unloaded yet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:33:12 -07:00
Herbert Xu	1bfcb10f67	[IPSEC]: Add missing BEET checks Currently BEET mode does not reinject the packet back into the stack like tunnel mode does. Since BEET should behave just like tunnel mode this is incorrect. This patch fixes this by introducing a flags field to xfrm_mode that tells the IPsec code whether it should terminate and reinject the packet back into the stack. It then sets the flag for BEET and tunnel mode. I've also added a number of missing BEET checks elsewhere where we check whether a given mode is a tunnel or not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:31:50 -07:00
Herbert Xu	7aa68cb906	[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi Not every transform needs to zap ip_summed. For example, a pure tunnel mode encapsulation does not affect the hardware checksum at all. In fact, every algorithm (that needs this) other than AH6 already does its own ip_summed zapping. This patch moves the zapping into AH6 which is in line with what IPv4 does. Possible future optimisation: Checksum the data as we copy them in IPComp. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:30:07 -07:00
Herbert Xu	33b5ecb8f6	[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet. This means that we need to fix up the value in case we have a 4-on-6 tunnel. Moving this logic into the caller simplifies things and allows us to merge the code with IPv4. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:29:25 -07:00
Herbert Xu	04663d0b8b	[IPSEC]: Fix pure tunnel modes involving IPv6 I noticed that my recent patch broke 6-on-4 pure IPsec tunnels (the ones that are only used for incompressible IPsec packets). Subsequent reviews show that I broke 6-on-6 pure tunnels more than three years ago and nobody ever noticed. I suppose everyone must be testing 6-on-6 IPComp with large pings which are very compressible :) This patch fixes both cases. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:28:06 -07:00
Pavel Emelyanov	aaf70ec7fd	[IPV6]: Cleanup snmp6_alloc_dev() This function is never called with NULL or not setup argument, so the checks inside are redundant. Also, the return value is always -ENOMEM, so no need in additional variable for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:25:32 -07:00
Pavel Emelyanov	16910b9829	[IPV6]: Fix return type for snmp6_free_dev() This call is essentially void. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:23:43 -07:00
Pavel Emelyanov	c95477090a	[INET]: Consolidate frag queues freeing Since we now allocate the queues in inet_fragment.c, we can safely free it in the same place. The ->destructor callback thus becomes optional for inet_frags. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:48:26 -07:00
Pavel Emelyanov	48d6005638	[INET]: Remove no longer needed ->equal callback Since this callback is used to check for conflicts in hashtable when inserting a newly created frag queue, we can do the same by checking for matching the queue with the argument, used to create one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:47:56 -07:00
Pavel Emelyanov	abd6523d15	[INET]: Consolidate xxx_find() in fragment management Here we need another callback ->match to check whether the entry found in hash matches the key passed. The key used is the same as the creation argument for inet_frag_create. Yet again, this ->match is the same for netfilter and ipv6. Running a few steps forward - this callback will later replace the ->equal one. Since the inet_frag_find() uses the already consolidated inet_frag_create() remove the xxx_frag_create from protocol codes. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:47:21 -07:00
Pavel Emelyanov	c6fda28229	[INET]: Consolidate xxx_frag_create() This one uses the xxx_frag_intern() and xxx_frag_alloc() routines, which are already consolidated, so remove them from protocol code (as promised). The ->constructor callback is used to init the rest of the frag queue and it is the same for netfilter and ipv6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:46:47 -07:00
Pavel Emelyanov	e521db9d79	[INET]: Consolidate xxx_frag_alloc() Just perform the kzalloc() allocation and setup common fields in the inet_frag_queue(). Then return the result to the caller to initialize the rest. The inet_frag_alloc() may return NULL, so check the return value before doing the container_of(). This looks ugly, but the xxx_frag_alloc() will be removed soon. The xxx_expire() timer callbacks are patches, because the argument is now the inet_frag_queue, not the protocol specific queue. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:45:23 -07:00
Pavel Emelyanov	2588fe1d78	[INET]: Consolidate xxx_frag_intern This routine checks for the existence of a given entry in the hash table and inserts the new one if needed. The ->equal callback is used to compare two frag_queue-s together, but this one is temporary and will be removed later. The netfilter code and the ipv6 one use the same routine to compare frags. The inet_frag_intern() always returns non-NULL pointer, so convert the inet_frag_queue into protocol specific one (with the container_of) without any checks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:44:34 -07:00
Pavel Emelyanov	fd9e63544c	[INET]: Omit double hash calculations in xxx_frag_intern Since the hash value is already calculated in xxx_find, we can simply use it later. This is already done in netfilter code, so make the same in ipv4 and ipv6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:43:37 -07:00
Pavel Emelyanov	dc8a82ad28	[IPV6]: Fix memory leak in cleanup_ipv6_mibs() The icmpv6msg mib statistics is not freed. This is almost not critical for current kernel, since ipv6 module is unloadable, but this can happen on load error and will happen every time we stop the network namespace (when we have one, of course). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:30:40 -07:00
Pavel Emelyanov	4acad72ded	[IPV6]: Consolidate the ip6_pol_route_(input\|output) pair The difference in both functions is in the "id" passed to the rt6_select, so just pass it as an extra argument from two outer helpers. This is minus 60 lines of code and 360 bytes of .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 13:02:51 -07:00
Denis V. Lunev	f1673ca52c	[INET]: kmalloc+memset -> kzalloc in frag_alloc_queue kmalloc + memset -> kzalloc in frag_alloc_queue Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:53:13 -07:00
Herbert Xu	e5bbef20e0	[IPV6]: Replace sk_buff ** with sk_buff * in input handlers With all the users of the double pointers removed from the IPv6 input path, this patch converts all occurrences of sk_buff ** to sk_buff * in IPv6 input handlers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:50:28 -07:00
Pavel Emelyanov	762cc40801	[INET]: Consolidate the xxx_put These ones use the generic data types too, so move them in one place. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:43 -07:00
Pavel Emelyanov	4b6cb5d8e3	[INET]: Small cleanup for xxx_put after evictor consolidation After the evictor code is consolidated there is no need in passing the extra pointer to the xxx_put() functions. The only place when it made sense was the evictor code itself. Maybe this change should go with the previous (or with the next) patch, but I try to make them as short as possible to simplify the review (but they are still large anyway), so this change goes in a separate patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:43 -07:00
Pavel Emelyanov	8e7999c44e	[INET]: Consolidate the xxx_evictor The evictors collect some statistics for ipv4 and ipv6, so make it return the number of evicted queues and account them all at once in the caller. The XXX_ADD_STATS_BH() macros are just for this case, but maybe there are places in code, that can make use of them as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:42 -07:00
Pavel Emelyanov	1e4b82873a	[INET]: Consolidate the xxx_frag_destroy To make it possible we need to know the exact frag queue size for inet_frags->mem management and two callbacks: * to destroy the skb (optional, used in conntracks only) * to free the queue itself (mandatory, but later I plan to move the allocation and the destruction of frag_queues into the common place, so this callback will most likely be optional too). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:42 -07:00
Pavel Emelyanov	321a3a99e4	[INET]: Consolidate the xxx_secret_rebuild This code works with the generic data types as well, so move this into inet_fragment.c This move makes it possible to hide the secret_timer management and the secret_rebuild routine completely in the inet_fragment.c Introduce the ->hashfn() callback in inet_frags() to get the hashfun for a given inet_frag_queue() object. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:41 -07:00
Pavel Emelyanov	277e650ddf	[INET]: Consolidate the xxx_frag_kill Since all the xxx_frag_kill functions now work with the generic inet_frag_queue data type, this can be moved into a common place. The xxx_unlink() code is moved as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:41 -07:00
Pavel Emelyanov	04128f233f	[INET]: Collect common frag sysctl variables together Some sysctl variables are used to tune the frag queues management and it will be useful to work with them in a common way in the future, so move them into one structure, moreover they are the same for all the frag management codes. I don't place them in the existing inet_frags object, introduced in the previous patch for two reasons: 1. to keep them in the __read_mostly section; 2. not to export the whole inet_frags objects outside. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:40 -07:00
Pavel Emelyanov	7eb95156d9	[INET]: Collect frag queues management objects together There are some objects that are common in all the places which are used to keep track of frag queues, they are: * hash table * LRU list * rw lock * rnd number for hash function * the number of queues * the amount of memory occupied by queues * secret timer Move all this stuff into one structure (struct inet_frags) to make it possible use them uniformly in the future. Like with the previous patch this mostly consists of hunks like - write_lock(&ipfrag_lock); + write_lock(&ip4_frags.lock); To address the issue with exporting the number of queues and the amount of memory occupied by queues outside the .c file they are declared in, I introduce a couple of helpers. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:39 -07:00
Pavel Emelyanov	5ab11c98d3	[INET]: Move common fields from frag_queues in one place. Introduce the struct inet_frag_queue in include/net/inet_frag.h file and place there all the common fields from three structs: * struct ipq in ipv4/ip_fragment.c * struct nf_ct_frag6_queue in nf_conntrack_reasm.c * struct frag_queue in ipv6/reassembly.c After this, replace these fields on appropriate structures with this structure instance and fix the users to use correct names i.e. hunks like - atomic_dec(&fq->refcnt); + atomic_dec(&fq->q.refcnt); (these occupy most of the patch) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:38 -07:00
Patrick McHardy	ad643a793b	[IPV6]: Uninline netfilter okfns Uninline netfilter okfns for those cases where gcc can generate tail-calls. Before: text data bss dec hex filename 8994153 1016524 524652 10535329 a0c1a1 vmlinux After: text data bss dec hex filename 8992761 1016524 524652 10533937 a0bc31 vmlinux ------------------------------------------------------- -1392 All cases have been verified to generate tail-calls with and without netfilter. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:36 -07:00
Adrian Bunk	1dff92e09e	[IPV6] __inet6_csk_dst_store(): fix check-after-use The Coverity checker spotted that we have already oops'ed if "dst" was NULL. Since "dst" being NULL doesn't seem to be possible at this point this patch removes the NULL check. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:32 -07:00
Herbert Xu	65c8846660	[IPV6]: Avoid skb_copy/pskb_copy/skb_realloc_headroom on input This patch replaces unnecessary uses of skb_copy by pskb_expand_head on the IPv6 input path. This allows us to remove the double pointers later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:31 -07:00
Herbert Xu	f61944efdf	[IPV6]: Make ipv6_frag_rcv return the same packet This patch implements the same change that was done to ip_defrag. It makes ipv6_frag_rcv return the last packet received of a train of fragments rather than the head of that sequence. This allows us to get rid of the sk_buff ** argument later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:30 -07:00
Herbert Xu	3db05fea51	[NETFILTER]: Replace sk_buff ** with sk_buff * With all the users of the double pointers removed, this patch mops up by finally replacing all occurrences of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:29 -07:00
Herbert Xu	2ca7b0ac02	[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom This patch replaces unnecessary uses of skb_copy, pskb_copy and skb_realloc_headroom by functions such as skb_make_writable and pskb_expand_head. This allows us to remove the double pointers later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:28 -07:00
Herbert Xu	37d4187922	[NETFILTER]: Do not copy skb in skb_make_writable Now that all callers of netfilter can guarantee that the skb is not shared, we no longer have to copy the skb in skb_make_writable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:27 -07:00
Brian Haley	4953f0fcc0	[IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2 From RFC 3493, Section 5.2: IPV6_MULTICAST_IF Set the interface to use for outgoing multicast packets. The argument is the index of the interface to use. If the interface index is specified as zero, the system selects the interface (for example, by looking up the address in a routing table and using the resulting interface). This patch adds support for (index == 0) to reset the value to its original state, allowing the system to choose the best interface. IPv4 already behaves this way. Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 14:39:29 -07:00
Pierre Ynard	31910575a9	[IPv6]: Export userland ND options through netlink (RDNSS support) As discussed before, this patch provides userland with a way to access relevant options in Router Advertisements, after they are processed and validated by the kernel. Extra options are processed in a generic way; this patch only exports RDNSS options described in RFC5006, but support to control which options are exported could be easily added. A new rtnetlink message type is defined, to transport Neighbor Discovery options, along with optional context information. At the moment only the address of the router sending an RDNSS option is included, but additional attributes may be later defined, if needed by new use cases. Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:22:05 -07:00
Denis V. Lunev	cd40b7d398	[NET]: make netlink user -> kernel interface synchronous This patch makes processing of netlink user -> kernel messages synchronous. This change was inspired by a talk with Alexey Kuznetsov about current netlink message processing. He says that he was badly wrong when he introduced asynchronous user -> kernel communication. The call netlink_unicast is the only path to send a message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message was attached to the socket queue and sk->sk_data_ready was called. The process was blocked until all pending messages were processed. The bad thing is that this processing may occur in an arbitrary process context. This patch changes the nlk->data_ready callback to get 1 skb and forces packet processing right in netlink_unicast. The kernel -> user path in netlink_unicast remains untouched. EINTR processing in netlink_run_queue was changed. It forces an rtnl_lock drop, but the process remains in the cycle until the message is fully processed. So, there is no need to use these kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:15:29 -07:00
Stephen Hemminger	227b60f510	[INET]: local port range robustness Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 17:30:46 -07:00
Herbert Xu	ceb1eec829	[IPSEC]: Move IP length/checksum setting out of transforms This patch moves the setting of the IP length and checksum fields out of the transforms and into the xfrmX_output functions. This would help future efforts in merging the transforms themselves. It also adds an optimisation to ipcomp due to the fact that the transport offset is guaranteed to be zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:56 -07:00
Herbert Xu	87bdc48d30	[IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since they're identical to the IPv4 versions. Duplicating them would only create problems for ourselves later when we need to add things like extended sequence numbers. I've also added transport header type conversion headers for these types which are now used by the transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:55 -07:00
Herbert Xu	37fedd3aab	[IPSEC]: Use IPv6 calling convention as the convention for x->mode->output The IPv6 calling convention for x->mode->output is more general and could help an eventual protocol-generic x->type->output implementation. This patch adopts it for IPv4 as well and modifies the IPv4 type output functions accordingly. It also rewrites the IPv6 mac/transport header calculation to be based off the network header where practical. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:54 -07:00
Herbert Xu	7b277b1a5f	[IPSEC]: Set skb->data to payload in x->mode->output This patch changes the calling convention so that on entry from x->mode->output and before entry into x->type->output skb->data will point to the payload instead of the IP header. This is essentially a redistribution of skb_push/skb_pull calls with the aim of minimising them on the common path of tunnel + ESP. It'll also let us use the same calling convention between IPv4 and IPv6 with the next patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:54 -07:00
Herbert Xu	bee0b40c06	[IPSEC] beet: Fix extension header support on output The beet output function completely kills any extension headers by replacing them with the IPv6 header. This is because it essentially ignores the result of ip6_find_1stfragopt by simply acting as if there aren't any extension headers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:53 -07:00
Mitsuru Chinen	f24e3d658c	[IPV6]: Defer IPv6 device initialization until a valid qdisc is specified To judge the timing for DAD, netif_carrier_ok() is used. However, there is a possibility that dev->qdisc stays noop_qdisc even if netif_carrier_ok() returns true. In that case, DAD NS is not sent out. We need to defer the IPv6 device initialization until a valid qdisc is specified. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:52 -07:00
Pavel Emelyanov	cf7732e4cc	[NET]: Make core networking code use seq_open_private This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:33 -07:00
Herbert Xu	b7c6538cd8	[IPSEC]: Move state lock into x->type->output This patch releases the lock on the state before calling x->type->output. It also adds the lock to the spots where they're currently needed. Most of those places (all except mip6) are expected to disappear with async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:03 -07:00
Herbert Xu	007f0211a8	[IPSEC]: Store IPv6 nh pointer in mac_header on output Currently the x->mode->output functions store the IPv6 nh pointer in the skb network header. This is inconvenient because the network header then has to be fixed up before the packet can leave the IPsec stack. The mac header field is unused on output so we can use that to store this instead. This patch does that and removes the network header fix-up in xfrm_output. It also uses ipv6_hdr where appropriate in the x->type->output functions. There is also a minor clean-up in esp4 to make it use the same code as esp6 to help any subsequent effort to merge the two. Lastly it kills two redundant skb_set_* statements in BEET that were simply copied over from transport mode. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:00 -07:00
Benjamin Thery	0a8891a0a4	[IPv6]: use container_of() macro in fib6_clean_node() In ip6_fib.c, fib6_clean_node() casts a fib6_walker_t pointer to a fib6_cleaner_t pointer assuming a struct fib6_walker_t (field 'w') is the first field in struct fib6_cleaner_t. To prevent any future problems that may occur if one day a field is inadvertently inserted before the 'w' field in struct fib6_cleaner_t, (and to improve readability), this patch uses the container_of() macro. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:58 -07:00
Herbert Xu	45b17f48ea	[IPSEC]: Move RO-specific output code into xfrm6_mode_ro.c The lastused update check in xfrm_output can be done just as well in the mode output function which is specific to RO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:56 -07:00
Herbert Xu	436a0a4022	[IPSEC]: Move output replay code into xfrm_output The replay counter is one of only two remaining things in the output code that requires a lock on the xfrm state (the other being the crypto). This patch moves it into the generic xfrm_output so we can remove the lock from the transforms themselves. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:54 -07:00
Herbert Xu	406ef77c89	[IPSEC]: Move common output code to xfrm_output Most of the code in xfrm4_output_one and xfrm6_output_one are identical so this patch moves them into a common xfrm_output function which will live in net/xfrm. In fact this would seem to fix a bug as on IPv4 we never reset the network header after a transform which may upset netfilter later on. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:53 -07:00
Herbert Xu	bc31d3b2c7	[IPSEC] ah: Remove keys from ah_data structure The keys are only used during initialisation so we don't need to carry them in ah_data. Since we don't have to allocate them again, there is no need to place a limit on the authentication key length anymore. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:53 -07:00
Herbert Xu	4b7137ff8f	[IPSEC] esp: Remove keys from esp_data structure The keys are only used during initialisation so we don't need to carry them in esp_data. Since we don't have to allocate them again, there is no need to place a limit on the authentication key length anymore. This patch also kills the unused auth.icv member. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:52 -07:00
Stephen Hemminger	cfcabdcc2d	[NET]: sparse warning fixes Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Patrick McHardy	f73e924cdd	[NETFILTER]: ctnetlink: use netlink policy Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:35 -07:00
Patrick McHardy	fdf708322d	[NETFILTER]: nfnetlink: rename functions containing 'nfattr' There is no struct nfattr anymore, rename functions to 'nlattr'. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:32 -07:00
Patrick McHardy	df6fb868d6	[NETFILTER]: nfnetlink: convert to generic netlink attribute functions Get rid of the duplicated rtnetlink macros and use the generic netlink attribute functions. The old duplicated stuff is moved to a new header file that exists just for userspace. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:31 -07:00
Stephen Hemminger	3b04ddde02	[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:52 -07:00
Stephen Hemminger	b95cce3576	[NET]: Wrap hard_header_parse Wrap the hard_header_parse function to simplify next step of header_ops conversion. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:51 -07:00
Stephen Hemminger	0c4e85813d	[NET]: Wrap netdevice hardware header creation. Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:50 -07:00
Eric W. Biederman	2774c7aba6	[NET]: Make the loopback device per network namespace. This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:49 -07:00
Daniel Lezcano	de3cb747ff	[NET]: Dynamically allocate the loopback device, part 1. This patch replaces all occurrences of the static variable loopback_dev with a pointer loopback_dev. That provides the mindless, trivial, uninteresting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:14 -07:00
David L Stevens	14878f75ab	[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2] Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. [new to 2nd revision] 6) support per-interface ICMP stats 7) use common macro for per-device stat macros Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:27 -07:00
Denis V. Lunev	76c72d4f44	[IPV4/IPV6/DECNET]: Small cleanup for fib rules. This patch slightly cleans up the FIB rules framework. rules_list as a pointer on struct fib_rules_ops is useless. It is always assigned a static per-subsystem list in IPv4, IPv6 and DecNet. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:22 -07:00
Milan Kocian	0b69d4bd26	[IPV6]: Remove redundant RTM_DELLINK message. Remove useless message. We get the right message from another subsystem. Signed-off-by: Milan Kocian <milon@wq.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:18 -07:00
Ralf Baechle	10d024c1b2	[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:13 -07:00
Thomas Graf	8f4c1f9b04	[NETLINK]: Introduce nested and byteorder flag to netlink attribute This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, its intended use is to allow automatic byte order conversions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:16 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anything in the core of the network stack that was affected by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundamentally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Requests by clients in other namespaces will get -ECONNREFUSED, as they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	e9dc865340	[NET]: Make device event notification network namespace safe Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	e730c15519	[NET]: Make packet reception network namespace safe This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace. This should ensure that the various network stacks do not receive packets in anything but the initial network namespace until the code has been converted and is ready for them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:08 -07:00
Eric W. Biederman	1b8d7ae42d	[NET]: Make socket creation namespace safe. This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:07 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Micah Gruber	1dfcae7765	[IPV6]: Remove unneeded pointer iph from ipcomp6_input() in net/ipv6/ipcomp6.c This trivial patch removes the unneeded pointer iph, which is never used. Signed-off-by: Micah Gruber <micah.gruber@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:58 -07:00
Masahide NAKAMURA	1e5dc14617	[IPV6] IPSEC: Omit redirect for tunnelled packet. IPv6 IPsec tunnel gateway incorrectly sends redirect to router or sender when network device the IPsec tunnelled packet is arrived is the same as the one the decapsulated packet is sent. With this patch, it omits to send the redirect when the forwarding skbuff carries secpath, since such skbuff should be assumed as a decapsulated packet from IPsec tunnel by own. It may be a rare case for an IPsec security gateway, however it is not rare when the gateway is MIPv6 Home Agent since the another tunnel end-point is Mobile Node and it changes the attached network. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:33 -07:00
Noriaki TAKAMIYA	a47ed4cd8c	[IPV6] XFRM: Fix connected socket to use transformation. When XFRM policy and state are ready after a TCP connection is started, the traffic should be transformed immediately, however it is not on IPv6 TCP. It depends on the dst cache replacement policy of the connected socket. It seems that the replacement is always done for IPv4, however, in the IPv6 case it is done only when the routing cookie is changed. This patch fixes this so that a non-transformation dst can be changed to a transformation one. This behavior is required by MIPv6 and improves IPv6 IPsec. Fixes by Masahide NAKAMURA. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:32 -07:00
Brian Haley	e773e4faa1	[IPV6]: Add v4mapped address inline Add v4mapped address inline to avoid calls to ipv6_addr_type(). Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:32 -07:00
Brian Haley	bf0b48dfc3	[IPv6]: Fix ICMPv6 redirect handling with target multicast address When the ICMPv6 Target address is multicast, Linux processes the redirect instead of dropping it. The problem is in this code in ndisc_redirect_rcv(): if (ipv6_addr_equal(dest, target)) { on_link = 1; } else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) { ND_PRINTK2(KERN_WARNING "ICMPv6 Redirect: target address is not link-local.\n"); return; } This second check will succeed if the Target address is, for example, FF02::1 because it has link-local scope. Instead, it should be checking if it's a unicast link-local address, as stated in RFC 2461/4861 Section 8.1: - The ICMP Target Address is either a link-local address (when redirected to a router) or the same as the ICMP Destination Address (when redirected to the on-link destination). I know this doesn't explicitly say unicast link-local address, but it's implied. This bug is preventing Linux kernels from achieving IPv6 Logo Phase II certification because of a recent error that was found in the TAHI test suite - Neighbor Discovery suite test 206 (v6LC.2.3.6_G) had the multicast address in the Destination field instead of Target field, so we were passing the test. This won't be the case anymore. The patch below fixes this problem, and also fixes ndisc_send_redirect() to not send an invalid redirect with a multicast address in the Target field. I re-ran the TAHI Neighbor Discovery section to make sure Linux passes all 245 tests now. Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-08 00:12:05 -07:00
David S. Miller	f8ab18d2d9	[TCP]: Fix MD5 signature handling on big-endian. Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-28 15:18:35 -07:00
Jiri Kosina	6ae5f983cf	[IPV6]: Fix source address selection. The commit 95c385 broke proper source address selection for cases in which there is an address which is marked 'deprecated'. The commit mistakenly changed ifa->flags to ifa_result->flags (probably copy/paste error from a few lines above) in the 'Rule 3' address selection code. The patch restores the previous RFC-compliant behavior. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-16 14:48:21 -07:00
YOSHIFUJI Hideaki	cd562c9859	[IPV6]: Just increment OutDatagrams once per a datagram. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-14 17:15:01 -07:00
YOSHIFUJI Hideaki	3ef9d943d2	[IPV6]: Fix unbalanced socket reference with MSG_CONFIRM. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-14 16:45:40 -07:00
YOSHIFUJI Hideaki	e1f52208bb	[IPv6]: Fix NULL pointer dereference in ip6_flush_pending_frames Some of skbs in sk->write_queue do not have skb->dst because we do not fill skb->dst when we allocate new skb in append_data(). BTW, I think we may not need to (or we should not) increment some stats when using corking; if 100 sendmsg() (with MSG_MORE) result in 2 packets, how many should we increment? If 100, we should set skb->dst for every queued skbs. If 1 (or 2 (*)), we increment the stats for the first queued skb and we should just skip incrementing OutDiscards for the rest of queued skbs, and we should also implement this semantics in other places; e.g., we should increment other stats just once, not 100 times. *: depends on the place we are discarding the datagram. I guess should just increment by 1 (or 2). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 11:31:43 +02:00
Neil Horman	16fcec35e7	[NETFILTER]: Fix/improve deadlock condition on module removal netfilter So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 5) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 6) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 7) Process C, forked by request_module, processes the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 8) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) There's not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering module's owner to do module reference counting when nf_sockopt calls a module's set/get routine. This prevents the deadlock by preventing step 4 from happening. Part 2) Enhances the modprobe utility so that by default it performs non-blocking remove operations (the same way rmmod does), and add an option to explicitly request blocking operation.
So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicitly try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 11:28:26 +02:00
Denis V. Lunev	9e3be4b343	[IPV6]: Freeing alive inet6 address From: Denis V. Lunev <den@openvz.org> addrconf_dad_failure calls addrconf_dad_stop which takes referenced address and drops the count. So, in6_ifa_put performed at out: is extra. This results in message: "Freeing alive inet6 address" and not released dst entries. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-11 11:04:49 +02:00
Flavio Leitner	a96fb49be3	[NET]: Fix IP_ADD/DROP_MEMBERSHIP to handle only connectionless Fix IP[V6]_ADD_MEMBERSHIP and IP[V6]_DROP_MEMBERSHIP to return -EPROTO for connection oriented sockets. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-26 18:35:35 -07:00
Wei Yongjun	8984e41d18	[IPV6]: Fix kernel panic while send SCTP data with IP fragments If ICMP6 message with "Packet Too Big" is received after send SCTP DATA, kernel panic will occur when SCTP DATA is send again. This is because of a bad dest address when call to skb_copy_bits(). The messages sequence is like this: Endpoint A Endpoint B <------- SCTP DATA (size=1432) ICMP6 message -------> (Packet Too Big pmtu=1280) <------- Resend SCTP DATA (size=1432) ------------kernel panic--------------- printing eip: c05be62a pde = 00000000 Oops: 0002 [#1] SMP Modules linked in: scomm l2cap bluetooth ipv6 dm_mirror dm_mod video output sbs battery lp floppy sg i2c_piix4 i2c_core pcnet32 mii button ac parport_pc parport ide_cd cdrom serio_raw mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 0 EIP: 0060:[<c05be62a>] Not tainted VLI EFLAGS: 00010282 (2.6.23-rc2 #1) EIP is at skb_copy_bits+0x4f/0x1ef eax: 000004d0 ebx: ce12a980 ecx: 00000134 edx: cfd5a880 esi: c8246858 edi: 00000000 ebp: c0759b14 esp: c0759adc ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 Process swapper (pid: 0, ti=c0759000 task=c06d0340 task.ti=c0713000) Stack: c0759b88 c0405867 ce12a980 c8bff838 c789c084 00000000 00000028 cfd5a880 d09f1890 000005dc 0000007b ce12a980 cfd5a880 c8bff838 c0759b88 d09bc521 000004d0 fffff96c 00000200 00000100 c0759b50 cfd5a880 00000246 c0759bd4 Call Trace: [<c0405e1d>] show_trace_log_lvl+0x1a/0x2f [<c0405ecd>] show_stack_log_lvl+0x9b/0xa3 [<c040608d>] show_registers+0x1b8/0x289 [<c0406271>] die+0x113/0x246 [<c0625dbc>] do_page_fault+0x4ad/0x57e [<c0624642>] error_code+0x72/0x78 [<d09bc521>] ip6_output+0x8e5/0xab2 [ipv6] [<d09bcec1>] ip6_xmit+0x2ea/0x3a3 [ipv6] [<d0a3f2ca>] sctp_v6_xmit+0x248/0x253 [sctp] [<d0a3c934>] sctp_packet_transmit+0x53f/0x5ae [sctp] [<d0a34bf8>] sctp_outq_flush+0x555/0x587 [sctp] [<d0a34d3c>] sctp_retransmit+0xf8/0x10f [sctp] [<d0a3d183>] sctp_icmp_frag_needed+0x57/0x5b [sctp] [<d0a3ece2>] 
sctp_v6_err+0xcd/0x148 [sctp] [<d09cf1ce>] icmpv6_notify+0xe6/0x167 [ipv6] [<d09d009a>] icmpv6_rcv+0x7d7/0x849 [ipv6] [<d09be240>] ip6_input+0x1dc/0x310 [ipv6] [<d09be965>] ipv6_rcv+0x294/0x2df [ipv6] [<c05c3789>] netif_receive_skb+0x2d2/0x335 [<c05c5733>] process_backlog+0x7f/0xd0 [<c05c58f6>] net_rx_action+0x96/0x17e [<c042e722>] __do_softirq+0x64/0xcd [<c0406f37>] do_softirq+0x5c/0xac ======================= Code: 00 00 29 ca 89 d0 2b 45 e0 89 55 ec 85 c0 7e 35 39 45 08 8b 55 e4 0f 4e 45 08 8b 75 e0 8b 7d dc 89 c1 c1 e9 02 03 b2 a0 00 00 00 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 45 08 0f 84 7b 01 00 00 01 EIP: [<c05be62a>] skb_copy_bits+0x4f/0x1ef SS:ESP 0068:c0759adc Kernel panic - not syncing: Fatal exception in interrupt Arnaldo says: ==================== Thanks! I'm to blame for this one, problem was introduced in: `b0e380b1d8` @@ -761,7 +762,7 @@ slow_path: / * Copy a block of the IP datagram. */ - if (skb_copy_bits(skb, ptr, frag->h.raw, len)) + if (skb_copy_bits(skb, ptr, skb_transport_header(skb), len)) BUG(); left -= len; ==================== Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-21 20:59:08 -07:00
Ilpo Järvinen	660adc6e60	[IPv6]: Invalid semicolon after if statement A similar fix to netfilter from Eric Dumazet inspired me to look around a bit by using some grep/sed stuff as looking for this kind of bugs seemed easy to automate. This is one of them I found where it looks like this semicolon is not valid. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-15 15:07:30 -07:00
Jesper Juhl	703310e645	[IPV6]: Clean up duplicate includes in net/ipv6/ This patch cleans up duplicate includes in net/ipv6/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:03 -07:00
David S. Miller	3516ffb0fe	[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). As discovered by Evgeniy Polyakov, if we try to sendmsg after a connection reset, we can do incredibly stupid things. The core issue is that inet_sendmsg() tries to autobind the socket, but we should never do that for TCP. Instead we should just go straight into TCP's sendmsg() code which will do all of the necessary state and pending socket error checks. TCP's sendpage already directly vectors to tcp_sendpage(), so this merely brings sendmsg() in line with that. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:28 -07:00
Adrian Bunk	1a3a206f7f	[NETFILTER]: Make nf_ct_ipv6_skip_exthdr() static. nf_ct_ipv6_skip_exthdr() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:26 -07:00
Dave Johnson	c61a7d10ef	[IPV6]: ipv6_addr_type() doesn't know about RFC4193 addresses. ipv6_addr_type() doesn't check for 'Unique Local IPv6 Unicast Addresses' (RFC4193) and returns IPV6_ADDR_RESERVED for that range. SCTP uses this function and will fail bind() and connect() calls that use RFC4193 addresses, SCTP will also ignore inbound connections from RFC4193 addresses if listening on IPV6_ADDR_ANY. There may be other users of ipv6_addr_type() that could also have problems. Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:21 -07:00
Herbert Xu	b217d616a1	[IPV4/IPV6]: Fail registration if inet device construction fails Now that netdev notifications can fail, we can use this to signal errors during registration for IPv4/IPv6. In particular, if we fail to allocate memory for the inet device, we can fail the netdev registration. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:16 -07:00
Simon Arlott	566cfd8f0e	[IPV6]: Don't update ADVMSS on routes where the MTU is not also updated The ADVMSS value was incorrectly updated for ALL routes when the MTU is updated because it's outside the effect of the if statement's condition. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:04 -07:00
Al Viro	704eae1f32	ip6_tunnel - endianness annotations Convert rel_info to host-endian before calling ip6_tnl_err(). The things become much more straightforward that way. The key observation (and the reason why that code actually worked) is that after ip6_tnl_err() we either immediately bailed out or had rel_info set to 0 or had it set to host-endian and guaranteed to hit (rel_type == ICMP_DEST_UNREACH && rel_code == ICMP_FRAG_NEEDED) case. So inconsistent endianness didn't really lead to bugs, but it had been subtle and prone to breakage. New variant is saner and obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:11:56 -07:00
Patrick McHardy	7e2acc7e27	[NETFILTER]: Fix logging regression Loading one of the LOG target fails if a different target has already registered itself as backend for the same family. This can affect the ipt_LOG and ipt_ULOG modules when both are loaded. Reported and tested by: <t.artem@mailcity.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-24 15:29:55 -07:00
YOSHIFUJI Hideaki	ca983cefd9	[TCPv6] MD5SIG: Ensure to reset allocation count to avoid panic. After clearing all passwords for IPv6 peers, we need to set allocation count to zero as well as we free the storage. Otherwise, we panic when a user tries to (re)add a password. Discovered and fixed by MIYAJIMA Mitsuharu <miyajima.mitsuharu@anchor.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-24 15:27:30 -07:00
Al Viro	b77f2fa629	[IPV6]: endianness bug in ip6_tunnel Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-21 19:09:41 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Vlad Yasevich	063ed369c9	[IPV6]: Call inet6addr_chain notifiers on link down Currently if the link is brought down via ip link or ifconfig down, the inet6addr_chain notifiers are not called even though all the addresses are removed from the interface. This caused SCTP to add duplicate addresses to its list. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:16:35 -07:00
Dmitry Butskoy	f13ec93fba	[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets From: Dmitry Butskoy <dmitry@butskoy.name> Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747 Problem Description: It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp and raw sockets, both connected and unconnected. There is a little typo in net/ipv6/icmp.c code, which prevents such messages from being delivered to the errqueue of the corresponding raw socket, when the socket is CONNECTED. The typo is due to swap of local/remote addresses. Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is looked up the usual way, it is something like: sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif); where "daddr" is a destination address of the incoming packet (IOW our local address), "saddr" is a source address of the incoming packet (the remote end). But when the raw socket is looked up for some icmp error report, in net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed fragment of the "bad" packet, i.e. "daddr" is the original destination address of that packet, "saddr" is our local address. Hence, icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr" ... Steps to reproduce: Create some raw socket, connect it to an address, and cause some error situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach. Set IPV6_RECVERR . Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN). You should receive "time exceeded" icmp message (because of "ttl=1"), but the socket does not receive it. If you do not connect your raw socket, you will receive MSG_ERRQUEUE successfully. (The reason is that for unconnected socket there are no actual checks for local/remote addresses). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 23:53:08 -07:00
Patrick McHardy	61075af51f	[NETFILTER]: nf_conntrack: mark protocols __read_mostly Also remove two unnecessary EXPORT_SYMBOLs and move the nf_conntrack_l3proto_ipv4 declaration to the correct file. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 20:48:19 -07:00
Patrick McHardy	a887c1c148	[NETFILTER]: Lower *tables printk severity Lower ip6tables, arptables and ebtables printk severity similar to Dan Aloni's patch for iptables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 20:46:15 -07:00
Yasuyuki Kozakai	e2a3123fbe	[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it nf_ct_get_tuple() requires the offset to transport header and that bothers callers such as icmp[v6] l4proto modules. This introduces new function to simplify them. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 20:45:14 -07:00
Yasuyuki Kozakai	ffc3069048	[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple. But they have to find the offset to transport protocol header before that. Their processings are almost same as prepare() of l3proto modules. This makes prepare() more generic to simplify icmp[v6] l4proto module later. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 20:44:50 -07:00
Yasuyuki Kozakai	d87d8469e2	[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 20:44:23 -07:00
Philippe De Muyter	56b3d975bb	[NET]: Make all initialized struct seq_operations const. Make all initialized struct seq_operations in net/ const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 23:07:31 -07:00
Micah Gruber	dffe4f048b	[IPV6]: Remove unneeded pointer idev from addrconf_cleanup(). This trivial patch removes the unneeded pointer idev returned from __in6_dev_get(), which is never used. The check for NULL can be simply done by if (__in6_dev_get(dev) == NULL). Signed-off-by: Micah Gruber <micah.gruber@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 23:04:19 -07:00
YOSHIFUJI Hideaki	4c752098f5	[IPV6]: Make IPV6_{RECV,2292}RTHDR boolean options. Because reversing RH0 is no longer supported by deprecation of RH0, let's make IPV6_{RECV,2292}RTHDR boolean options. Booleans are more appropriate from a standards point of view. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:56:31 -07:00
YOSHIFUJI Hideaki	bb4dbf9e61	[IPV6]: Do not send RH0 anymore. Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:55:49 -07:00
YOSHIFUJI Hideaki	c382bb9d32	[IPV6]: Restore semantics of Routing Header processing. The "fix" for emerging security threat was overkill and it broke basic semantic of IPv6 routing header processing. We should assume RT0 (or even RT2, depends on configuration) as "unknown" RH type so that we - silently ignore the routing header if segleft == 0 - send ICMPv6 Parameter Problem message back to the sender, otherwise. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:47:58 -07:00
Patrick McHardy	cfbba49d80	[NET]: Avoid copying writable clones in tunnel drivers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:19:05 -07:00
Patrick McHardy	0d53778e81	[NETFILTER]: Convert DEBUGP to pr_debug Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:18:20 -07:00
Patrick McHardy	330f7db5e5	[NETFILTER]: nf_conntrack: remove 'ignore_conntrack' argument from nf_conntrack_find_get All callers pass NULL, this also doesn't seem very useful for modules. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:41 -07:00
Yasuyuki Kozakai	dacd2a1a5c	[NETFILTER]: nf_conntrack: remove old memory allocator of conntrack Now memory space for help and NAT are allocated by extension infrastructure. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:35 -07:00
Patrick McHardy	9f15c5302d	[NETFILTER]: x_tables: mark matches and targets __read_mostly Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:15 -07:00
Jozsef Kadlecsik	ba9dda3ab5	[NETFILTER]: x_tables: add TRACE target The TRACE target can be used to follow IP and IPv6 packets through the ruleset. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:14 -07:00
Jan Engelhardt	7c4e36bc17	[NETFILTER]: Remove redundant parentheses/braces Removes redundant parentheses and braces (And add one pair in a xt_tcpudp.c macro). Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:11 -07:00
Jan Engelhardt	a47362a226	[NETFILTER]: add some consts, remove some casts Make a number of variables const and/or remove unneeded casts. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:01 -07:00
Jan Engelhardt	e1931b784a	[NETFILTER]: x_tables: switch xt_target->checkentry to bool Switch the return type of target checkentry functions to boolean. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:59 -07:00
Jan Engelhardt	ccb79bdce7	[NETFILTER]: x_tables: switch xt_match->checkentry to bool Switch the return type of match functions to boolean Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:58 -07:00
Jan Engelhardt	1d93a9cbad	[NETFILTER]: x_tables: switch xt_match->match to bool Switch the return type of match functions to boolean Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:57 -07:00
Jan Engelhardt	cff533ac12	[NETFILTER]: x_tables: switch hotdrop to bool Switch the "hotdrop" variables to boolean Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:56 -07:00
Stephen Hemminger	d212f87b06	[NET]: IPV6 checksum offloading in network devices The existing model for checksum offload does not correctly handle devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag implies device can do any arbitrary protocol. This patch: * adds NETIF_F_IPV6_CSUM for those devices * fixes bnx2 and tg3 devices that need it * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO) * fixes assumptions about NETIF_F_ALL_CSUM in nat * adjusts bridge union of checksumming computation Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:52 -07:00
Masahide NAKAMURA	d3d6dd3ada	[XFRM]: Add module alias for transformation type. It is clean-up for XFRM type modules and adds aliases with its protocol: ESP, AH, IPCOMP, IPIP and IPv6 for IPsec ROUTING and DSTOPTS for MIPv6 It is almost the same thing as XFRM mode alias, but it is added new defines XFRM_PROTO_XXX for preprocessing since some protocols are defined as enum. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Ingo Oeser <netdev@axxeo.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:43 -07:00
Masahide NAKAMURA	59fbb3a61e	[IPV6] MIP6: Loadable module support for MIPv6. This patch makes MIPv6 a loadable module named "mip6". Here is a modprobe.conf(5) example to load it automatically when a user application uses XFRM state for MIPv6: alias xfrm-type-10-43 mip6 alias xfrm-type-10-60 mip6 Some MIPv6 features are not covered by this module; however, other features such as IPsec or IPv6 should not be affected with or without the patch. We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt separately for future work. Loadable features: * MH receiving check (to send ICMP error back) * RO header parsing and building (i.e. RH2 and HAO in DSTOPTS) * XFRM policy/state database handling for RO These are NOT covered as loadable: * Home Address flags and its rule on source address selection * XFRM sub policy (depends on its own kernel option) * XFRM functions to receive RO as IPv6 extension header * MH sending/receiving through raw socket if user application opens it (since raw socket allows to do so) * RH2 sending as ancillary data * RH2 operation with setsockopt(2) Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:42 -07:00
Masahide NAKAMURA	136ebf08b4	[IPV6] MIP6: Kill unnecessary ifdefs. Kill unnecessary CONFIG_IPV6_MIP6. o It is redundant for RAW socket to keep MH out with the config then it can handle any protocol. o Clean-up at AH. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:41 -07:00
Jay Vosburgh	c2edacf80e	bonding / ipv6: no addrconf for slaves separately from master At present, when a device is enslaved to bonding, if ipv6 is active then addrconf will be initiated on the slave (because it is closed then opened during the enslavement processing). This causes DAD and RS packets to be sent from the slave. These packets in turn can confuse switches that perform ipv6 snooping, causing them to incorrectly update their forwarding tables (if, e.g., the slave being added is an inactive backup that won't be used right away) and direct traffic away from the active slave to a backup slave (where the incoming packets will be dropped). This patch alters the behavior so that addrconf will only run on the master device itself. I believe this is logically correct, as it prevents slaves from having an IPv6 identity independent from the master. This is consistent with the IPv4 behavior for bonding. This is accomplished by (a) having bonding set IFF_SLAVE sooner in the enslavement processing than currently occurs (before open, not after), and (b) having ipv6 addrconf ignore UP and CHANGE events on slave devices. The eql driver also uses the IFF_SLAVE flag. I inspected eql, and I believe this change is reasonable for its usage of IFF_SLAVE, but I did not test it. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-10 12:41:19 -04:00
YOSHIFUJI Hideaki	6d5b78cdd5	[IPV6] NDISC: Fix thinko to control Router Preference support. Bug reported by Haruhito Watanabe <haruhito@sfc.keio.ac.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-22 16:07:04 -07:00
Herbert Xu	74235a25c6	[IPV6] addrconf: Fix IPv6 on tuntap tunnels The recent patch that added ipv6_hwtype is broken on tuntap tunnels. Indeed, it's broken on any device that does not pass the ipv6_hwtype test. The reason is that the original test only applies to autoconfiguration, not IPv6 support. IPv6 support is allowed on any device. In fact, even with the ipv6_hwtype patch applied you can still add IPv6 addresses to any interface that doesn't pass the ipv6_hwtype test provided that they have a sufficiently large MTU. This is a serious problem because come deregistration time these devices won't be cleaned up properly. I've gone back and looked at the rationale for the patch. It appears that the real problem is that we were creating IPv6 devices even if the MTU was too small. So here's a patch which fixes that and reverts the ipv6_hwtype stuff. Thanks to Kanru Chen for reporting this issue. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-14 13:02:55 -07:00
David S. Miller	3d7dbeac58	[TCP]: Disable TSO if MD5SIG is enabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-12 14:36:42 -07:00
David S. Miller	df2bc459a3	[UDP]: Revert 2-pass hashing changes. This reverts changesets: `6aaf47fa48` `b7b5f487ab` `de34ed91c4` `fc038410b4` There are still some correctness issues recently discovered which do not have a known fix that doesn't involve doing a full hash table scan on port bind. So revert for now. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:50 -07:00
Patrick McHardy	3c158f7f57	[NETFILTER]: nf_conntrack: fix helper module unload races When a helper module is unloaded all conntracks referring to it have their helper pointer NULLed out, leading to lots of races. In most places this can be fixed by proper use of RCU (they do already check for != NULL, but in a racy way), additionally nf_conntrack_expect_related needs to bail out when no helper is present. Also remove two paranoid BUG_ONs in nf_conntrack_proto_gre that are racy and not worth fixing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:26 -07:00
Patrick McHardy	ef7c79ed64	[NETLINK]: Mark netlink policies const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:10 -07:00
Bill Nottingham	75202e7689	[NET]: Fix comparisons of unsigned < 0. Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0. Signed-off-by: Bill Nottingham <notting@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:47 -07:00
David S. Miller	8c7fc03e27	[IPV6]: Fix build warning. net/ipv6/ip6_fib.c: In function ‘fib6_add_rt2node’: net/ipv6/ip6_fib.c:661: warning: label ‘out’ defined but not used Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:31 -07:00
Kazunori MIYAZAWA	f282d45cb4	[IPSEC]: Fix panic when using inter address family IPsec on loopback. Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:28 -07:00
YOSHIFUJI Hideaki	7ebba6d14f	[IPV6] ROUTE: No longer handle ::/0 specially. We do not need to handle ::/0 routes specially any longer. This should fix BUG #8349. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Yuji Sekiya <sekiya@wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:26 -07:00
Kazunori MIYAZAWA	144466bdf8	[IPSEC]: Fix IPv6 AH calculation in outbound Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:25 -07:00
David S. Miller	14e50e57ae	[XFRM]: Allow packet drops during larval state resolution. The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN business we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 18:17:54 -07:00
Oliver Hartkopp	bbb711e633	[IPV6]: Ignore ipv6 events on non-IPV6 capable devices. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:44 -07:00
Corey Mutter	ae7bf20a63	[IPV6]: Reverse sense of promisc tests in ip6_mc_input Reverse the sense of the promiscuous-mode tests in ip6_mc_input(). Signed-off-by: Corey Mutter <crm-netdev@mutternet.com> Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-14 03:00:27 -07:00
Patrick McHardy	3c2ad469c3	[NETFILTER]: Clean up table initialization - move arp_tables initial table structure definitions to arp_tables.h similar to ip_tables and ip6_tables - use C99 initializers - use initializer macros where possible Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:43 -07:00
David S. Miller	fc038410b4	[UDP]: Fix AF-specific references in AF-agnostic code. __udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:22 -07:00
YOSHIFUJI Hideaki	9a6bf6fe71	[IPV6] ROUTE: Assign rt6i_idev for ip6_{prohibit,blk_hole}_entry. I think this is less critical, but is also suitable for -stable release. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:46:12 -07:00
YOSHIFUJI Hideaki	e76b2b2567	[IPV6]: Do not rely on skb->dst before it is assigned. Because skb->dst is assigned in ip6_route_input(), it is really bad to use it in hop-by-hop option handler(s). Closes: Bug #8450 (Eric Sesterhenn <snakebyte@gmx.de>) Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:45:58 -07:00
David L Stevens	5bb1ab09e4	[IPV6]: Send ICMPv6 error on scope violations. When an IPv6 router is forwarding a packet with a link-local scope source address off-link, RFC 4007 requires it to send an ICMPv6 destination unreachable with code 2 ("not neighbor"), but Linux doesn't. Fix below. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:45:32 -07:00
David Sterba	3dde6ad8fc	Fix trivial typos in Kconfig* files Fix several typos in help text in Kconfig* files. Signed-off-by: David Sterba <dave@jikos.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:12:20 +02:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Linus Torvalds	15700770ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits) kconfig: fix mconf segmentation fault kbuild: enable use of code from a different dir kconfig: error out if recursive dependencies are found kbuild: scripts/basic/fixdep segfault on pathological string-o-death kconfig: correct minor typo in Kconfig warning message. kconfig: fix path to modules.txt in Kconfig help usr/Kconfig: fix typo kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs' kbuild: be more explicit on missing .config file kbuild: clarify the creation of the LOCALVERSION_AUTO string. kbuild: propagate errors from find in scripts/gen_initramfs_list.sh kconfig: refer to qt3 if we cannot find qt libraries kbuild: handle compressed cpio initramfs-es kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text kbuild: remove stale comment in modpost.c kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE kbuild: fix make mrproper for Documentation/DocBook/man kbuild: remove kconfig binaries during make mrproper kconfig/menuconfig: do not hardcode '.config' kbuild: override build timestamp & version ...	2007-05-06 13:21:57 -07:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Alexander E. Patrakov	39f5fb3035	kconfig: fix path to modules.txt in Kconfig help Documentation/modules.txt doesn't exist, but Documentation/kbuild/modules.txt does. Signed-off-by: Alexander E. Patrakov Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2007-05-02 20:58:11 +02:00
Eric Sesterhenn	d0772b70fa	[IPV6]: Fix slab corruption running ip6sic From: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:26:23 -07:00
Stephen Hemminger	5632c5152a	[IPV6]: Track device renames in snmp6. When network devices are renamed, the IPV6 snmp6 code gets confused. It doesn't track name changes so it will OOPS when network devices are removed. The fix is trivial, just unregister/re-register in notify handler. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:16:39 -07:00
YOSHIFUJI Hideaki	ebbd90a730	[IPV6]: Fix thinko in ipv6_rthdr_rcv() changes. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 02:13:39 -07:00
Milind Arun Choudhary	4ef8d0aeaf	[NET]: SPIN_LOCK_UNLOCKED cleanup in drivers/atm, net SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 01:37:44 -07:00
YOSHIFUJI Hideaki	e1ec7842df	[IPV6] NDISC: Unify main process of sending ND messages. Because ndisc_send_na(), ndisc_send_ns() and ndisc_send_rs() are almost identical, so let's unify their common part. With gcc (GCC) 3.3.5 (Debian 1:3.3.5-13) on i386, Before: text data bss dec hex filename 14689 364 24 15077 3ae5 net/ipv6/ndisc.o After: text data bss dec hex filename 12317 364 24 12705 31a1 net/ipv6/ndisc.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:59 -07:00
YOSHIFUJI Hideaki	c53b3590bb	[IPV6] XFRM: Use ip6addr_any where applicable. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:58 -07:00
YOSHIFUJI Hideaki	df8981dc19	[IPV6]: Export in6addr_any for future use. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:57 -07:00
YOSHIFUJI Hideaki	420fe234ad	[IPV6] SIT: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:54 -07:00
David S. Miller	30041e4af4	[IPV6]: Fix Makefile thinko. obj-$(CONFIG_PROC_FS) --> ipv6-$(CONFIG_PROC_FS) Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:53 -07:00
Herbert Xu	7f7d9a6b96	[IPV6]: Consolidate common SNMP code This patch moves the non-proc SNMP code into addrconf.c and reuses IPv4 SNMP code where applicable. As a result we can skip proc.o if /proc is disabled. Note that I've made a number of functions static since they're only used by addrconf.c for now. If they ever get used elsewhere we can always remove the static. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:52 -07:00
YOSHIFUJI Hideaki	97fc8d0bc5	[IPV6] SNMP: Use put_unaligned() instead of memcpy(). Hint from David Miller <davem@davemloft.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:37 -07:00
YOSHIFUJI Hideaki	952a10be32	[IPV6] SNMP: Fix several warnings without procfs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:36 -07:00
YOSHIFUJI Hideaki	2334e97355	[IPV6] SNMP: Avoid unaligned accesses. Because stats pointer may not be aligned for u64, use memcpy to fill u64 values. Issue reported by David Miller <davem@davemloft.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:35 -07:00
Stephen Hemminger	3ff50b7997	[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:24 -07:00
YOSHIFUJI Hideaki	1370b5a59b	[IPV6] SNMP: Export statistics via netlink without CONFIG_PROC_FS. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:13 -07:00
YOSHIFUJI Hideaki	49ed67a9ee	[IPV6] SNMP: Move some statistic bits to net/ipv6/proc.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:11 -07:00
YOSHIFUJI Hideaki	bf99f1bde3	[IPV6] SNMP: Netlink interface. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:10 -07:00
John Heffner	628a5c5618	[INET]: Add IP(V6)_PMTUDISC_PROBE Add IP(V6)_PMTUDISC_PROBE value for IP(V6)_MTU_DISCOVER. This option forces us not to fragment, but does not make use of the kernel path MTU discovery. That is, it allows for user-mode MTU probing (or, packetization-layer path MTU discovery). This is particularly useful for diagnostic utilities, like traceroute/tracepath. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:10 -07:00
John Heffner	b881ef7603	[IPV6]: MTU discovery check in ip6_fragment() Adds a check in ip6_fragment() mirroring ip_fragment() for packets that we can't fragment, and sends an ICMP Packet Too Big message in response. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:09 -07:00
Patrick McHardy	6313c1e099	[RTNETLINK]: Remove unnecessary locking in dump callbacks Since we're now holding the rtnl during the entire dump operation, we can remove additional locking for rtnl protected data. This patch does that for all simple cases (dev_base_lock for dev_base walking, RCU protection for FIB rule dumping). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:05 -07:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Patrick McHardy	3b5018d676	[NETFILTER]: {eb,ip6,ip}t_LOG: remove remains of LOG target overloading All LOG targets always use their internal logging function nowadays, so remove the incorrect error message and handle real errors (!= -EEXIST) by failing to load. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:00 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00
Herbert Xu	663ead3bb8	[NET]: Use csum_start offset instead of skb_transport_header The skb transport pointer is currently used to specify the start of the checksum region for transmit checksum offload. Unfortunately, the same pointer is also used during receive side processing. This creates a problem when we want to retransmit a received packet with partial checksums since the skb transport pointer would be overwritten. This patch solves this problem by creating a new 16-bit csum_start offset value to replace the skb transport header for the purpose of checksums. This offset is calculated from skb->head so that it does not have to change when skb->data changes. No extra space is required since csum_offset itself fits within a 16-bit word so we can use the other 16 bits for csum_start. For backwards compatibility, just before we push a packet with partial checksums off into the device driver, we set the skb transport header to what it would have been under the old scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:40 -07:00
Patrick McHardy	c5c2523893	[XFRM]: Optimize MTU calculation Replace the probing based MTU estimation, which usually takes 2-3 iterations to find a fitting value and may underestimate the MTU, by an exact calculation. Also fix underestimation of the XFRM trailer_len, which causes unnecessary reallocations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:38 -07:00
Patrick McHardy	557922584d	[XFRM]: esp: fix skb_tail_pointer conversion bug Fix incorrect switch of "trailer" skb by "skb" during skb_tail_pointer conversion: - *(u8 *)(trailer->tail - 1) = top_iph->protocol; + *(skb_tail_pointer(skb) - 1) = top_iph->protocol; - *(u8 *)(trailer->tail - 1) = *skb_network_header(skb); + *(skb_tail_pointer(skb) - 1) = *skb_network_header(skb); Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:37 -07:00
YOSHIFUJI Hideaki	29f6af7712	[IPV6] FIB6RULE: Find source address during looking up route. When looking up route for destination with rules with source address restrictions, we may need to find a source address for the traffic if not given. Based on patch from Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:35 -07:00
Arnaldo Carvalho de Melo	27d7ff46a3	[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} To clearly state the intent of copying to linear sk_buffs, _offset being a overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-04-25 22:28:29 -07:00
Arnaldo Carvalho de Melo	d626f62b11	[SK_BUFF]: Introduce skb_copy_from_linear_data{_offset} To clearly state the intent of copying from linear sk_buffs, _offset being a overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:28:23 -07:00
Herbert Xu	35fc92a9de	[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE Right now Xen has a horrible hack that lets it forward packets with partial checksums. One of the reasons that CHECKSUM_PARTIAL and CHECKSUM_COMPLETE were added is so that we can get rid of this hack (where it creates two extra bits in the skbuff to essentially mirror ip_summed without being destroyed by the forwarding code). I had forgotten that I've already gone through all the device drivers last time around to make sure that they're looking at ip_summed == CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit. In any case, I've now done that again so it should definitely be safe. Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE values on forward so I'm setting that to CHECKSUM_NONE. This should be safe to remove for bridging but I'd like to check that code path first. So here is the patch that lets us get rid of the hack by preserving ip_summed (mostly) on forwarded packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:16 -07:00
David S. Miller	b3da2cf37c	[INET]: Use jhash + random secret for ehash. The days are gone when this was not an issue, there are folks out there with huge bot networks that can be used to attack the established hash tables on remote systems. So just like the routing cache and connection tracking hash, use Jenkins hash with random secret input. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:06 -07:00
Patrick McHardy	e6f689db51	[NETFILTER]: Use setup_timer Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:43 -07:00
Patrick McHardy	1b53d9042c	[NETFILTER]: Remove changelogs and CVS IDs Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:35 -07:00
Thomas Graf	c454673da7	[NET] rules: Unified rules dumping Implements a unified, protocol-independent rules dumping function which is capable of dumping either a specific protocol family or all of them. This speeds up dumping as fewer lookups are required. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:17 -07:00
Thomas Graf	c127ea2c45	[IPv6]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:13 -07:00
Arnaldo Carvalho de Melo	6b88dd966b	[SK_BUFF] ipv6: Use skb_network_offset in some more places So that we reduce the number of direct accesses to skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:26:38 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo	b0e380b1d8	[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:20 -07:00
Arnaldo Carvalho de Melo	cfe1fc7759	[SK_BUFF]: Introduce skb_network_header_len For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len, that is precalculated tho, don't think we need to bloat skb with one more member, so just use this new helper, reducing the number of non-skbuff.h references to the layer headers even more. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:19 -07:00
Arnaldo Carvalho de Melo	bff9b61ce3	[SK_BUFF]: Use the helpers to get the layer header pointer Some more cases... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:18 -07:00
Arnaldo Carvalho de Melo	ddc7b8e32b	[SK_BUFF]: Some more layer header conversions Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:03 -07:00
Arnaldo Carvalho de Melo	d10ba34b00	[SK_BUFF]: More skb_put related skb_reset_transport_header This time we have to set it to skb->tail that is not anymore equal to skb->data, so we either add a new helper or just add the skb->tail - skb->data offset, for now do the latter. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:01 -07:00
Arnaldo Carvalho de Melo	55f79cc0c0	[IPV6]: Reset the network header in ip6_nd_hdr ip6_nd_hdr is always called immediately after a alloc_skb + skb_reserve sequence, i.e. when skb->tail is equal to skb->data, making it correct to use skb_reset_network_header(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:00 -07:00
Yasuyuki Kozakai	e7ac05f340	[NETFILTER]: nf_conntrack: add nf_copy() to safely copy members in skb This unifies the code that copies netfilter-related data. Before copying, nf_copy() puts original members in destination skb. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:55 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	bd82393ca2	[SK_BUFF]: More skb_reset_transport_header conversions These are a bit more subtle, they are of this type: - skb->h.raw = payload; __skb_pull(skb, payload - skb->data); + skb_reset_transport_header(skb); __skb_pull results in: skb->data = skb->data + payload - skb->data; skb->data = payload; So after __skb_pull we have skb->data pointing to payload and we can just call skb_reset_transport_header(skb), that will do: skb->h.raw = payload; The others are similar, allowing us to get rid of some more cases where a pointer was being attributed to the layer headers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:29 -07:00
Arnaldo Carvalho de Melo	39b89160df	[SK_BUFF]: Introduce ipipv6_hdr(), remove skb->h.ipv6h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:28 -07:00
Arnaldo Carvalho de Melo	b0061ce49c	[SK_BUFF]: Introduce ipip_hdr(), remove skb->h.ipiph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:27 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	ab6a5bb6b2	[TCP]: Introduce tcp_hdrlen() and tcp_optlen() The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to avoid the longer, open coded equivalent. Ditched a no-op in bnx2 in the process. I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:24 -07:00
Arnaldo Carvalho de Melo	88c7664f13	[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo	4bedb45203	[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo	cc70ab261c	[ICMP6]: Introduce icmp6_hdr() For consistency with all the other skb->h.raw accessors. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:20 -07:00
Arnaldo Carvalho de Melo	967b05f64e	[SK_BUFF]: Introduce skb_set_transport_header For the cases where the transport header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:17 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	badff6d01a	[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:15 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	c9bdd4b525	[IP]: Introduce ip_hdrlen() For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl*4 (skb->nh.iph)->ihl * sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo	c14d2450cb	[SK_BUFF]: Introduce skb_set_network_header For the cases where the network header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:01 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	bbe735e424	[SK_BUFF]: Introduce skb_network_offset() For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:58 -07:00
Arnaldo Carvalho de Melo	1ced98e81d	[SK_BUFF] ipv6: More skb_reset_network_header conversions related to skb_pull Now related to this form: skb->nh.ipv6h = (struct ipv6hdr *)skb_put(skb, length); That, as the others, is done when skb->tail is still equal to skb->data, making the conversion to skb_reset_network_header possible. Also one more case equivalent to skb->nh.raw = skb->data, of this form: iph = (struct ipv6hdr *)skb->data; <SNIP> skb->nh.ipv6h = iph; Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:54 -07:00
Arnaldo Carvalho de Melo	e2d1bca7e6	[SK_BUFF]: Use skb_reset_network_header in skb_push cases skb_push updates and returns skb->data, so we can just call skb_reset_network_header after the call to skb_push. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:47 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	57effc70a5	[IPV6]: Use skb->nh.ipv6h instead of casting skb->nh.raw nh.ipv6h is there exactly for this reason! Use it while it exists ;-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:45 -07:00
Arnaldo Carvalho de Melo	98e399f82a	[SK_BUFF]: Introduce skb_mac_header() For the places where we need a pointer to the mac header, it is still legal to touch skb->mac.raw directly if just adding to, subtracting from or setting it to another layer header. This one also converts some more cases to skb_reset_mac_header() that my regex missed as it had no spaces before nor after '=', ugh. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:41 -07:00
Arnaldo Carvalho de Melo	39f69c6f92	[SK_BUFF] xfrm: Use skb_set_mac_header in the memmove cases Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:37 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
YOSHIFUJI Hideaki	e5268f12f2	[IPV6]: Ensure to truncate result and return full length for sticky options. Bug noticed by Chris Wright <chrisw@sous-sol.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:17 -07:00
YOSHIFUJI Hideaki	4c6510a738	[IPV6]: Return correct result for sticky options. We returned incorrect result with IPV6_RTHDRDSTOPTS, IPV6_RTHDR and IPV6_DSTOPTS. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:16 -07:00
Stephen Hemminger	add459aa1a	[UDP]: ipv6 style cleanup Fix whitespace around keywords. Eliminate unnecessary ()'s on return statements. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:08 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now that network timestamps use the ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
Herbert Xu	759e5d0064	[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start supporting UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:51 -07:00
Herbert Xu	1ab6eb62b0	[UDP6]: Restore sk_filter optimisation This reverts the changeset [IPV6]: UDPv6 checksum. We always need to check UDPv6 checksum because it is mandatory. The sk_filter optimisation has nothing to do whether we verify the checksum. It simply postpones it to the point when the user calls recv or poll. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:50 -07:00
YOSHIFUJI Hideaki	ca04356939	[IPV6] ADDRCONF: Fix possible inet6_ifaddr leakage with CONFIG_OPTIMISTIC_DAD. The inet6_ifaddr for source address of RS is leaked if the address is not an optimistic address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:44 -07:00
Neil Horman	95c385b4d5	[IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support. Nominally an autoconfigured IPv6 address is added to an interface in the Tentative state (as per RFC 2462). Addresses in this state remain in this state while the Duplicate Address Detection process operates on them to determine their uniqueness on the network. During this period, these tentative addresses may not be used for communication, increasing the time before a node may be able to communicate on a network. Using Optimistic Duplicate Address Detection, autoconfigured addresses may be used immediately for communication on the network, as long as certain rules are followed to avoid conflicts with other nodes during the Duplicate Address Detection process. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:43 -07:00
Yasuyuki Kozakai	502b093569	[IPV6] IP6TUNNEL: Enable to control the handled inner protocol. ip6_tunnel before supporting IPv4/IPv6 tunnel allows only IPPROTO_IPV6 in configurations from userland. This allows userland to set IPPROTO_IPIP and 0(wildcard). ip6_tunnel only handles allowed inner protocols. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:42 -07:00
Yasuyuki Kozakai	3144581cb0	[IPV6] IP6TUNNEL: Rename functions ip6ip6_* to ip6_tnl_*. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:41 -07:00
Yasuyuki Kozakai	c4d3efafcc	[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel. Some notes - Protocol number IPPROTO_IPIP is used for IPv4 over IPv6 packets. - If IP6_TNL_F_USE_ORIG_TCLASS is set, TOS in IPv4 header is copied to Traffic Class in outer IPv6 header on xmit. - IP6_TNL_F_USE_ORIG_FLOWLABEL is ignored on xmit of IPv4 packets, because IPv4 header does not have flow label. - Kernel sends ICMP error if IPv4 packet is too big on xmit, even if DF flag is not set. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:40 -07:00
Yasuyuki Kozakai	61ec2aec28	[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_xmit(). This makes it possible to add IPv4/IPv6 specific handling later. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:39 -07:00
Yasuyuki Kozakai	8359925be8	[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_rcv(). This makes it possible to add IPv4/IPv6 specific handling later. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:38 -07:00
Yasuyuki Kozakai	e490d1d85c	[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_err(). This makes it possible to add IPv4/IPv6 specific error handling later. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:37 -07:00
YOSHIFUJI Hideaki	7159039a12	[IPV6]: Decentralize EXPORT_SYMBOLs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:23:36 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
James Morris	9d729f72dc	[NET]: Convert xtime.tv_sec to get_seconds() Where appropriate, convert references to xtime.tv_sec to the get_seconds() helper function. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:32 -07:00
YOSHIFUJI Hideaki	a23cf14b16	IPv6: fix Routing Header Type 0 handling thinko Oops, thinko. The test for accepting an RH0 was exactly the wrong way around. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 19:26:06 -07:00
YOSHIFUJI Hideaki	0bcbc92629	[IPV6]: Disallow RH0 by default. A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-24 14:58:30 -07:00
YOSHIFUJI Hideaki	612f09e849	[IPV6] SNMP: Fix {In,Out}NoRoutes statistics. A packet which is being discarded because of no routes in the forwarding path should not be counted as OutNoRoutes but as InNoRoutes. Additionally, on this occasion, a packet whose destination is not valid should be counted as InAddrErrors separately. Based on patch from Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-13 16:18:02 -07:00
David S. Miller	161980f4c6	[IPV6]: Revert recent change to rt6_check_dev(). This reverts `a0d78ebf3a` It causes pings to link-local addresses to fail. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-06 11:42:27 -07:00
Mitsuru Chinen	60e5c16641	[IPv6]: Exclude truncated packets from InHdrErrors statistics Incoming truncated packets are counted not only as InTruncatedPkts but also as InHdrErrors. They should be counted as InTruncatedPkts only. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-04 23:54:59 -07:00
YOSHIFUJI Hideaki	b59e139bbd	[IPv6]: Fix incorrect length check in rawv6_sendmsg() In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says: > From: Sridhar Samudrala <sri@us.ibm.com> > Date: Thu, 29 Mar 2007 14:17:28 -0700 > > > The check for length in rawv6_sendmsg() is incorrect. > > As len is an unsigned int, (len < 0) will never be TRUE. > > I think checking for IPV6_MAXPLEN(65535) is better. > > > > Is it possible to send ipv6 jumbo packets using raw > > sockets? If so, we can remove this check. > > I don't see why such a limitation against jumbo would exist, > does anyone else? > > Thanks for catching this Sridhar. A good compiler should simply > fail to compile "if (x < 0)" when 'x' is an unsigned type, don't > you think :-) Dave, we use "int" for returning value, so we should fix this anyway, IMHO; we should not allow len > INT_MAX. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:30:54 -07:00
Herbert Xu	53aadcc909	[IPV6]: Set IF_READY if the device is up and has carrier We still need to set the IF_READY flag in ipv6_add_dev for the case where all addresses (including the link-local) are deleted and then recreated. In that case the IPv6 device too will be destroyed and then recreated. In order to prevent the original problem, we simply ensure that the device is up before setting IF_READY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-27 14:31:52 -07:00
David S. Miller	f11e6659ce	[IPV6]: Fix routing round-robin locking. As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons: 1) The round-robin list manipulation, as coded, goes like this (with read lock held): walk routes finding head and tail spin_lock(); rotate list using head and tail spin_unlock(); While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later on that many people have been hitting. 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way. So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code went in. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:05 -07:00
Thomas Graf	e1701c68c1	[NET]: Fix fib_rules compatibility breakage Based upon a patch from Patrick McHardy. The fib_rules netlink attribute policy introduced in 2.6.19 broke userspace compatibility. When specifying a rule with "from all" or "to all", iproute adds a zero byte long netlink attribute, but the policy requires all addresses to have a size equal to sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a validation error. Check attribute length of FRA_SRC/FRA_DST in the generic framework by letting the family specific rules implementation provide the length of an address. Report an error if address length is non zero but no address attribute is provided. Fix actual bug by checking address length for non-zero instead of relying on availability of attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:00 -07:00
Dave Jones	b6f99a2119	[NET]: fix up misplaced inlines. Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:27:49 -07:00
Masayuki Nakagawa	d35690beda	[IPV6]: ipv6_fl_socklist is inadvertently shared. The ipv6_fl_socklist from listening socket is inadvertently shared with new socket created for connection. This leads to a variety of interesting, but fatal, bugs. For example, removing one of the sockets may lead to the other socket's encountering a page fault when the now freed list is referenced. The fix is to not share the flow label list with the new socket. Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-16 16:14:03 -07:00
Chris Wright	d2b02ed948	[IPV6] fix ipv6_getsockopt_sticky copy_to_user leak User supplied len < 0 can cause leak of kernel memory. Use unsigned compare instead. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-09 16:19:17 -08:00
Olaf Kirch	dfee0a725b	[IPV6]: Fix for ipv6_setsockopt NULL dereference I came across this bug in http://bugzilla.kernel.org/show_bug.cgi?id=8155 Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-09 13:55:38 -08:00
Herbert Xu	c7ababbdc6	[IPV6]: Do not set IF_READY if device is down Now that we add the IPv6 device at registration time we don't need to set IF_READY in ipv6_add_dev anymore because we will always get a NETDEV_UP event later on should the device ever become ready. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:12 -08:00
David S. Miller	286930797d	[IPV6]: Handle np->opt being NULL in ipv6_getsockopt_sticky(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:05 -08:00
Patrick McHardy	dd63006b8f	[NETFILTER]: nf_conntrack_ipv6: fix incorrect classification of IPv6 fragments as ESTABLISHED The individual fragments of a packet reassembled by conntrack have the conntrack reference from the reassembled packet attached, but nfctinfo is not copied. This leaves it initialized to 0, which unfortunately is the value of IP_CT_ESTABLISHED. The result is that all IPv6 fragments are tracked as ESTABLISHED, allowing them to bypass a usual ruleset which accepts ESTABLISHED packets early. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:01 -08:00
Yasuyuki Kozakai	bc5f774347	[NETFILTER]: ip6_route_me_harder should take into account mark Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:27 -08:00
Patrick McHardy	e281db5cdf	[NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK, but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or CONFIG_NF_CONNTRACK_NETLINK for ifdefs. Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:19 -08:00
David Stevens	aa6e4a96e7	[IPV6]: /proc/net/anycast6 unbalanced inet6_dev refcnt Reading /proc/net/anycast6 when there is no anycast address on an interface results in an ever-increasing inet6_dev reference count, as well as a reference to the netdevice you can't get rid of. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 09:42:10 -08:00
Michal Wrobel	2c12a74cc4	[IPV6]: anycast refcnt fix This patch fixes a bug in the Linux IPv6 stack which caused an anycast address to be added to a device before DAD had completed. This led to an incorrect reference count which resulted in an infinite wait for unregister_netdevice completion on interface removal. Signed-off-by: Michal Wrobel <xmxwx@asn.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 09:41:58 -08:00
David S. Miller	7401055b58	[IPV6]: Fix __ipv6_addr_type() export in correct place. It needs to be in net/ipv6/addrconf_core.c Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:57 -08:00
YOSHIFUJI Hideaki	45ba9dd200	[IPV6] ADDRCONF: Register inet6_dev earlier. Allocate inet6_dev earlier to allow users to set up per-interface variables. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-02-26 11:42:55 -08:00
YOSHIFUJI Hideaki	46d480468f	[IPV6] ADDRCONF: Manage prefix route corresponding to address manually added. It is more natural to manage prefix routes corresponding to address which is being added manually. With help from Masafumi Aramoto <aramoto@linux-ipv6.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-02-26 11:42:54 -08:00
Yasuyuki Kozakai	268920584b	[IPV6] IP6TUNNEL: Use update_pmtu() of dst on xmit. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-02-26 11:42:53 -08:00
YOSHIFUJI Hideaki	8c14b7ce22	[IPV6] ADDRCONF: Statically link __ipv6_addr_type() for sunrpc subsystem. Link __ipv6_addr_type() statically for sunrpc code even if IPv6 is built as module. Signed-off-by: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>	2007-02-26 11:42:52 -08:00
Joe Jin	ca17c23345	[IPV6]: Adjust inet6_exit() cleanup sequence against inet6_init() This patch adjusts inet6_exit() to run in the inverse sequence of inet6_init(). At ipv6_init, it first creates the proc_root/net/dev_snmp6 entry by calling ipv6_misc_proc_init(), then calls addrconf_init() to create the corresponding device entry in this directory, but at inet6_exit, ipv6_misc_proc_exit() is called first, then addrconf_init(). Signed-off-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:44 -08:00
Noriaki TAKAMIYA	d3f23dfe8b	[IPSEC]: More fix is needed for __xfrm6_bundle_create(). Fixed to set fl_tunnel.fl6_src correctly in xfrm6_bundle_create(). Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:43 -08:00
Eric W. Biederman	3fbfa98112	[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is a pain for caching, and the proc interface never implemented it. I have done an audit and discovered that none of the current users of register_sysctl care as (except for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancements harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Kazunori MIYAZAWA	73d605d1ab	[IPSEC]: changing API of xfrm6_tunnel_register This patch changes xfrm6_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. There is no device which conflicts with IPv4 over IPv6 IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:55:55 -08:00
Kazunori MIYAZAWA	c73cb5a2d6	[IPSEC]: make sit use the xfrm4_tunnel_register This patch makes sit use xfrm4_tunnel_register instead of inet_add_protocol. It solves conflict of sit device with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:55:25 -08:00
YOSHIFUJI Hideaki	6e1d9d04c4	[IPV6] HASHTABLES: Use appropriate seed for calculating ehash index. Tetsuo Handa <handat@pm.nttdata.co.jp> told me that connect(2) with TCPv6 socket almost always took a few minutes to return when we did not have any ports available in the range of net.ipv4.ip_local_port_range. The reason was that we used an incorrect seed for calculating the hash index when we check established sockets in __inet6_check_established(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 20:26:39 -08:00
Masahide NAKAMURA	138939e066	[NETFILTER]: ip6t_mh: drop piggyback payload packet on MH packets Regarding RFC3775, MH payload proto field should be IPPROTO_NONE. Otherwise it must be discarded (and the receiver should send ICMP error). We assume filter should drop such piggyback everytime to disallow slipping through firewall rules, even the final receiver will discard it. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:16:17 -08:00
Patrick McHardy	a3c941b08d	[NETFILTER]: Kconfig: improve dependency handling Instead of depending on internally needed options and letting users figure out what is needed, select them when needed: - IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select NETFILTER_XTABLES - NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK - NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:15:02 -08:00
Patrick McHardy	c0e912d7ed	[NETFILTER]: nf_conntrack: fix invalid conntrack statistics RCU assumption NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding nf_conntrack_lock. Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:43 -08:00
Patrick McHardy	923f4902fe	[NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:57 -08:00
Patrick McHardy	e92ad99c78	[NETFILTER]: nf_log: minor cleanups - rename nf_logging to nf_loggers since it's an array of registered loggers - rename nf_log_unregister_logger() to nf_log_unregister() to make it symmetrical to nf_log_register() and convert all users Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:11:55 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Linus Torvalds	cb18eccff4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [IPV4]: Restore multipath routing after rt_next changes. [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. [NET]: Reorder fields of struct dst_entry [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer [NET]: Introduce union in struct dst_entry to hold 'next' pointer [DECNET]: fix misannotation of linkinfo_dn [DECNET]: FRA_{DST,SRC} are le16 for decnet [UDP]: UDP can use sk_hash to speedup lookups [NET]: Fix whitespace errors. [NET] XFRM: Fix whitespace errors. [NET] X25: Fix whitespace errors. [NET] WANROUTER: Fix whitespace errors. [NET] UNIX: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] SCHED: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. ...	2007-02-11 11:38:13 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Masahide NAKAMURA	bda390d5c8	[XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. The IPv6 over IPv4 IPsec tunnel patch seems to have missed the RO mode path when it changed the mode check from "xfrm[i]->props.mode != XFRM_MODE_TRANSPORT" to "xfrm[i]->props.mode == XFRM_MODE_TUNNEL" before changing the address. It also left the two inline functions __xfrm6_bundle_addr_{remote,local} used by nobody. This patch fixes it. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:47 -08:00
Eric Dumazet	7cc482634f	[IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer This patch removes the next pointer from 'struct rt6_info.u' union, and renames u.next to u.dst.rt6_next. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:40 -08:00
Eric Dumazet	95f30b336b	[UDP]: UDP can use sk_hash to speedup lookups In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash) to let tcp lookups use one cache line per unmatched entry instead of two. We can also use sk_hash to speed up the UDP part as well. We store in sk_hash the hnum value, and use sk->sk_hash (same cache line as the 'next' pointer), instead of inet->num (different cache line) Note : We still have a false sharing problem for SMP machines, because sock_hold(sock) dirties the cache line containing the 'next' pointer. Not counting the udp_hash_lock rwlock. (did someone mention RCU ? :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:29 -08:00
YOSHIFUJI Hideaki	1ab1457c42	[NET] IPV6: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:42 -08:00
Eric Dumazet	dbca9b2750	[NET]: change layout of ehash table ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is not optimal because, for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halve the table but we keep together related chains to speed up lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we don't need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it uses __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 14:16:46 -08:00
Patrick McHardy	9934e81c8c	[NETFILTER]: ip6_tables: remove redundant structure definitions Move ip6t_standard/ip6t_error_target/ip6t_error definitions to ip6_tables.h instead of defining them in each table individually. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:23 -08:00
Masahide NAKAMURA	a0ca215a73	[NETFILTER]: ip6_tables: support MH match This introduces match for Mobility Header (MH) described by Mobile IPv6 specification (RFC3775). User can specify the MH type or its range to be matched. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: Yasuyuki Kozakai <kozakai@linux-ipv6.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:21 -08:00
Jan Engelhardt	e60a13e030	[NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:20 -08:00
Jan Engelhardt	6709dbbb19	[NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functions Use the x_tables functions directly to make it better visible which parts are shared between ip_tables and ip6_tables. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:19 -08:00
Jan Engelhardt	e1fd0586b0	[NETFILTER]: x_tables: fix return values for LOG/ULOG Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:18 -08:00
Jan Engelhardt	2822b0d926	[NETFILTER]: Remove useless comparisons before assignments Remove unnecessary if() constructs before assignment. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:11 -08:00
Stephen Hemminger	22f8cde5bc	[NET]: unregister_netdevice as void There was no real useful information from the unregister_netdevice() return code, the only error occurred in a situation that was a driver bug. So change it to a void function. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:06 -08:00
Masahide NAKAMURA	f48d5ff1e4	[IPV6] RAW: Add checksum default defines for MH. Add checksum default defines for mobility header(MH) which goes through raw socket. As the result kernel's behavior is to handle MH checksum as default. This patch also removes verifying inbound MH checksum at mip6_mh_filter() since it did not consider user specified checksum offset and was redundant check with raw socket code. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:05 -08:00
Alexey Dobriyan	cc63f70b8b	[IPV4/IPV6] multicast: Check add_grhead() return value add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb from it passed to skb_put() without checking. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:04 -08:00
Miika Komu	4337226228	[IPSEC]: IPv4 over IPv6 IPsec tunnel This is the patch to support IPv4 over IPv6 IPsec. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:02 -08:00
Miika Komu	c82f963efe	[IPSEC]: IPv6 over IPv4 IPsec tunnel This is the patch to support IPv6 over IPv4 IPsec Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:01 -08:00
Miika Komu	cdca72652a	[IPSEC]: exporting xfrm_state_afinfo This patch exports xfrm_state_afinfo. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:00 -08:00
David S. Miller	8eb9086f21	[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:45 -08:00
YOSHIFUJI Hideaki	a0d78ebf3a	[IPV6] ROUTE: Do not route packets to link-local address on other device. With help from Wei Dong <weid@np.css.fujitsu.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:42 -08:00
Patrick McHardy	26932566a4	[NETLINK]: Don't BUG on undersized allocations Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, it's not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:41 -08:00
Li Yewang	29556526b9	[IPV6]: fix BUG of ndisc_send_redirect() When I tested the IPv6 redirect function on kernel 2.6.19.1, I found that the kernel can send redirect packets whose target address is a global address even when the target is not the actual endpoint of communication. RFC2461 defines the target address as follows: Target Address An IP address that is a better first hop to use for the ICMP Destination Address. When the target is the actual endpoint of communication, i.e., the destination is a neighbor, the Target Address field MUST contain the same value as the ICMP Destination Address field. Otherwise the target is a better first-hop router and the Target Address MUST be the router's link-local address so that hosts can uniquely identify routers. According to this definition, when a router sends a redirect to a host, the target address is either the better first-hop router's link-local address or the same as the ICMP Destination Address field. But ndisc_send_redirect() in net/ipv6/ndisc.c does not check the target address correctly. RFC2461 also defines how received Redirect messages are validated: 8.1. Validation of Redirect Messages A host MUST silently discard any received Redirect message that does not satisfy all of the following validity checks: ...... - The ICMP Target Address is either a link-local address (when redirected to a router) or the same as the ICMP Destination Address (when redirected to the on-link destination). ...... The receive function ndisc_redirect_rcv() implements this definition and checks the target address correctly: if (ipv6_addr_equal(dest, target)) { on_link = 1; } else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) { ND_PRINTK2(KERN_WARNING "ICMPv6 Redirect: target address is not link-local.\n"); return; } So, I think the send redirect function must check the target address as well. Signed-off-by: Li Yewang <lyw@nanjing-fnst.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-30 14:33:20 -08:00
Neil Horman	fa03ef38e1	[IPV6]: Fix up some CONFIG typos Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-30 14:30:10 -08:00
David S. Miller	e89862f4c5	[TCP]: Restore SKB socket owner setting in tcp_transmit_skb(). Revert `931731123a` We can't elide the skb_set_owner_w() here because things like certain netfilter targets (such as owner MATCH) need a socket to be set on the SKB for correct operation. Thanks to Jan Engelhardt and other netfilter list members for pointing this out. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:04:55 -08:00
Noriaki TAKAMIYA	6a2b9ce0a3	[IPV6]: Fixed the size of the netlink message notified by inet6_rt_notify(). I think the return value of rt6_nlmsg_size() should include the amount of RTA_METRICS. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 22:09:41 -08:00
YOSHIFUJI Hideaki	d88ae4cc97	[IPV6] MCAST: Fix joining all-node multicast group on device initialization. Join all-node multicast group after assignment of dev->ip6_ptr because it must be assigned when ipv6_dev_mc_inc() is called. This fixes Bug#7817, reported by <gernoth@informatik.uni-erlangen.de>. Closes: 7817 Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 20:25:40 -08:00
Paul Moore	469de9b90f	[INET]: style updates for the inet_sock->is_icsk assignment fix A quick patch to change the inet_sock->is_icsk assignment to better fit with existing kernel coding style. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 14:37:06 -08:00
Patrick McHardy	f9f02cca25	[NETFILTER]: nf_conntrack_ipv6: fix crash when handling fragments When IPv6 connection tracking splits up a defragmented packet into its original fragments, the packets are taken from a list and are passed to the network stack with skb->next still set. This causes dev_hard_start_xmit to treat them as GSO fragments, resulting in a use after free when connection tracking handles the next fragment. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 14:32:41 -08:00
Paul Moore	cbbd7d4f36	[INET]: Fix incorrect "inet_sock->is_icsk" assignment. The inet_create() and inet6_create() functions incorrectly set the inet_sock->is_icsk field. Both functions assume that the is_icsk field is large enough to hold at least a INET_PROTOSW_ICSK value when it is actually only a single bit. This patch corrects the assignment by doing a boolean comparison whose result will safely fit into a single bit field. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 00:29:51 -08:00
David L Stevens	30c4cf577f	[IPV4/IPV6]: Fix inet{,6} device initialization order. It is important that we assign dev->ip{,6}_ptr only after all portions of the inet{,6} device are set up. Otherwise we can receive packets before the multicast spinlocks et al. are initialized. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-04 12:31:14 -08:00
David S. Miller	9b54d5c610	[NETFILTER] IPV6: Fix dependencies. Although the menu dependencies in net/ipv6/netfilter/Kconfig guard the entries in that file from the Kconfig GUI, this does not prevent them from being selected still via "make oldconfig" when IPV6 etc. is disabled. So add explicit dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-17 21:59:18 -08:00
Kim Nordlund	8bce65b95a	[IPV6]: Make fib6_node subtree depend on IPV6_SUBTREES Make fib6_node 'subtree' depend on IPV6_SUBTREES. Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:31 -08:00
Brian Haley	befffe9016	[IPV6]: Fix IPV6_UNICAST_HOPS getsockopt(). > Relevant standard (RFC 3493) notes: > > The IPV6_UNICAST_HOPS option may be used with getsockopt() to > determine the hop limit value that the system will use for subsequent > unicast packets sent via that socket. > > I don't reckon -1 could be the hop limit value. -1 means un-initialized. > IMHO, the value from > case 1 (if socket is connected to some destination), otherwise case 2 > (if bound to a scope interface) or ultimately the default hop limit > ought to be returned instead, as it will be most often correct, while > the current behavior is always wrong, unless setsockopt() has been used > first. I don't know if some people may think doing a route lookup in > getsockopt might be overly expensive, but at least the two other cases > should be ok, particularly the last one. The following patch seems to work for me, but this code has behaved this way for a while, so I don't know if it will break any existing apps. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:25 -08:00
Al Viro	e1b4b9f398	[NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case search for loops If we come to a node we'd already marked as seen and it's not a part of the path (i.e. we don't have a loop right there), we already know that it isn't a part of any loop, so we don't need to revisit it. That speeds things up if some chain is referred to from several places and kills O(exp(table size)) worst-case behaviour (without sleeping, at that, so if you manage to self-LART that way, you are SOL for a long time)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:23 -08:00
Alexey Dobriyan	1f29bcd739	[PATCH] sysctl: remove unused "context" param Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00
Stephen Hemminger	3644f0cee7	[NET]: Convert hh_lock to seqlock. The hard header cache is in the main output path, so using seqlock instead of reader/writer lock should reduce overhead. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-08 17:19:20 -08:00
Linus Torvalds	2685b267bc	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits) [NETFILTER]: Fix non-ANSI func. decl. [TG3]: Identify Serdes devices more clearly. [TG3]: Use msleep. [TG3]: Use netif_msg_*. [TG3]: Allow partial speed advertisement. [TG3]: Add TG3_FLG2_IS_NIC flag. [TG3]: Add 5787F device ID. [TG3]: Fix Phy loopback. [WANROUTER]: Kill kmalloc debugging code. [TCP] inet_twdr_hangman: Delete unnecessary memory barrier(). [NET]: Memory barrier cleanups [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries. audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n audit: Add auditing to ipsec [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n [IrDA]: Incorrect TTP header reservation [IrDA]: PXA FIR code device model conversion [GENETLINK]: Fix misplaced command flags. [NETLIK]: Add a pointer to the Generic Netlink wiki page. [IPV6] RAW: Don't release unlocked sock. ...	2006-12-07 09:05:15 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this: sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	54e6ecb239	[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Alan Stern	a120586873	[PATCH] Allow NULL pointers in percpu_free The patch (as824b) makes percpu_free() ignore NULL arguments, as one would expect for a deallocation routine. (Note that free_percpu is #defined as percpu_free in include/linux/percpu.h.) A few callers are updated to remove now-unneeded tests for NULL. A few other callers already seem to assume that passing a NULL pointer to percpu_free() is okay! The patch also removes an unnecessary NULL check in percpu_depopulate(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Masahide NAKAMURA	4e33fa14fa	[IPV6] RAW: Don't release unlocked sock. When a user builds the IPv6 header and sends it through a raw socket, the kernel tries to release an unlocked sock. (The kernel log shows "BUG: bad unlock balance detected" with the debug option enabled.) The lock is held only for non-hdrincl socks in this function, so this patch makes it leave the lock alone for hdrincl ones. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:09 -08:00
YOSHIFUJI Hideaki	9a217a1c7e	[IPV6]: Repair IPv6 Fragments The commit "[IPV6]: Use kmemdup" (commit-id: `af879cc704`) broke IPv6 fragments. Bug was spotted by Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:08 -08:00
Dmitry Mishin	74c9c0c17d	[NETFILTER]: Fix {ip,ip6,arp}_tables hook validation Commit `590bdf7fd2` introduced a regression in match/target hook validation. mark_source_chains builds a bitmask for each rule representing the hooks it can be reached from, which is then used by the matches and targets to make sure they are only called from valid hooks. The patch moved the match/target specific validation before the mark_source_chains call, at which point the mask is always zero. This patch returns back to the old order and moves the standard checks to mark_source_chains. This allows to get rid of a special case for standard targets as a nice side-effect. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:02 -08:00
Patrick McHardy	a3c479772c	[NETFILTER]: Mark old IPv4-only connection tracking scheduled for removal Also remove the references to "new connection tracking" from Kconfig. After some short stabilization period of the new connection tracking helpers/NAT code the old one will be removed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:11:01 -08:00
Patrick McHardy	bff9a89bca	[NETFILTER]: nf_conntrack: endian annotations Resync with Al Viro's ip_conntrack annotations and fix a missed spot in ip_nat_proto_icmp.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:05:08 -08:00
Andrew Morton	b6332e6cf9	[TCP]: Fix warnings with TCP_MD5SIG disabled. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:52 -08:00
Adrian Bunk	f5b99bcddd	[NET]: Possible cleanups. This patch contains the following possible cleanups: - make the following needlessly global functions static: - ipv4/tcp.c: __tcp_alloc_md5sig_pool() - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup() - ipv4/udplite.c: udplite_rcv() - ipv4/udplite.c: udplite_err() - make the following needlessly global structs static: - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops - net/ipv{4,6}/udplite.c: remove inline's from static functions (gcc should know best when to inline them) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:51 -08:00
Patrick McHardy	76592584be	[NETFILTER]: Fix PROC_FS=n warnings Fix some unused function/variable warnings. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:34 -08:00
Patrick McHardy	baf7b1e112	[NETFILTER]: x_tables: add NFLOG target Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6. Currently we have two (unsupported by userspace) hacks in the LOG and ULOG targets to optionally call to the nflog API. They lack a few features, namely the IPv4 and IPv6 LOG targets can not specify a number of arguments related to nfnetlink_log, while the ULOG target is only available for IPv4. Remove those hacks and add a clean way to use nfnetlink_log. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:31 -08:00
Patrick McHardy	933a41e7e1	[NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:18 -08:00
Patrick McHardy	f8eb24a89a	[NETFILTER]: nf_conntrack: move extern declaration to header files Using extern in a C file is a bad idea because the compiler can't catch type errors. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:16 -08:00
Martin Josefsson	605dcad6c8	[NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets rather confusing with 'nf_conntrack_protocol'. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:09 -08:00
Gerrit Renker	4c0a6cb0db	[UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code This patch consolidates set/getsockopt code between UDP(-Lite) v4 and v6. The justification is that UDP(-Lite) is a transport-layer protocol and therefore the socket option code (at least in theory) should be AF-independent. Furthermore, there is the following code duplication: * do_udp{,v6}_getsockopt is 100% identical between v4 and v6 * do_udp{,v6}_setsockopt is identical up to the following difference --v4 in contrast to v6 additionally allows the experimental encapsulation types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE --the remainder is identical between v4 and v6 I believe that this difference is of little relevance. The advantage lies in not duplicating almost completely identical code. The patch further simplifies the interface of udp{,v6}_push_pending_frames, since for the second argument (struct udp_sock *up) it always holds that up = udp_sk(sk); where sk is the first function argument. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:45 -08:00
Thomas Graf	e3703b3de1	[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similar way, therefore rtnl_put_cacheinfo() is added to reuse code. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:44 -08:00
Ville Nuorvala	107a5fe619	[IPV6]: Improve IPv6 tunnel error reporting Log an error if the remote tunnel endpoint is unable to handle tunneled packets. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:27 -08:00
Ville Nuorvala	6fb32ddeb2	[IPV6]: Don't allocate memory for Tunnel Encapsulation Limit Option Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:26 -08:00
Ville Nuorvala	305d4b3ce8	[IPV6]: Allow link-local tunnel endpoints Allow link-local tunnel endpoints if the underlying link is defined. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:25 -08:00
Ville Nuorvala	09c6bbf090	[IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime Doing the mandatory tunnel endpoint checks when the tunnel is set up isn't enough as interfaces can go up or down and addresses can be added or deleted after this. The checks need to be done realtime when the tunnel is processing a packet. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:24 -08:00
Ville Nuorvala	567131a722	[IPV6]: Fix SIOCCHGTUNNEL bug in IPv6 tunnels A logic bug in tunnel lookup could result in duplicate tunnels when changing an existing device. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:24 -08:00
Al Viro	ff1dcadb1b	[NET]: Split skb->csum ... into anonymous union of __wsum and __u32 (csum and csum_offset resp.) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:18 -08:00
Al Viro	8e5200f540	[NET]: Fix assorted misannotations (from md5 and udplite merges). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:16 -08:00
Adrian Bunk	89c8945815	[IPV6] net/ipv6/sit.c: make 2 functions static This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:26:15 -08:00
Arnaldo Carvalho de Melo	af879cc704	[IPV6]: Use kmemdup Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/ipv6.ko.before /tmp/ipv6.ko.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/ip6_output.c: ip6_output | -52 ip6_append_data | +2 2 functions changed, 2 bytes added, 52 bytes removed /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/addrconf.c: addrconf_sysctl_register | -27 1 function changed, 27 bytes removed /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c: tcp_v6_syn_recv_sock | -32 tcp_v6_parse_md5_keys | -24 2 functions changed, 56 bytes removed /tmp/ipv6.ko.after: 5 functions changed, 2 bytes added, 135 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:58 -08:00
David S. Miller	7d9e9b3df4	[IPV6]: udp.c build fix Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:46 -08:00
Al Viro	f6ab028804	[NET]: Make mangling a checksum (0 -> 0xffff on the wire) explicit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:39 -08:00
Al Viro	b51655b958	[NET]: Annotate __skb_checksum_complete() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:38 -08:00
Al Viro	5f92a7388a	[NET]: Annotate callers of the reset of checksum.h stuff. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:34 -08:00
Al Viro	868c86bcb5	[NET]: annotate csum_ipv6_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:31 -08:00
Al Viro	e69a4adc66	[IPV6]: Misc endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:52 -08:00
Gerrit Renker	ba4e58eca8	[NET]: Supporting UDP-Lite (RFC 3828) in Linux This is a revision of the previously submitted patch, which alters the way files are organized and compiled in the following manner: * UDP and UDP-Lite now use separate object files * source file dependencies resolved via header files net/ipv{4,6}/udp_impl.h * order of inclusion files in udp.c/udplite.c adapted accordingly [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828) This patch adds support for UDP-Lite to the IPv4 stack, provided as an extension to the existing UDPv4 code: * generic routines are all located in net/ipv4/udp.c * UDP-Lite specific routines are in net/ipv4/udplite.c * MIB/statistics support in /proc/net/snmp and /proc/net/udplite * shared API with extensions for partial checksum coverage [NET/IPv6]: Extension for UDP-Lite over IPv6 It extends the existing UDPv6 code base with support for UDP-Lite in the same manner as per UDPv4. In particular, * UDPv6 generic and shared code is in net/ipv6/udp.c * UDP-Litev6 specific extensions are in net/ipv6/udplite.c * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6 * support for IPV6_ADDRFORM * aligned the coding style of protocol initialisation with af_inet6.c * made the error handling in udpv6_queue_rcv_skb consistent; to return `-1' on error on all error cases * consolidation of shared code [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support The UDP-Lite patch further provides * API documentation for UDP-Lite * basic xfrm support * basic netfilter support for IPv4 and IPv6 (LOG target) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:46 -08:00
Thomas Graf	6051e2f4fb	[IPv6] prefix: Convert RTM_NEWPREFIX notifications to use the new netlink api RTM_GETPREFIX is completely unused and is thus removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:45 -08:00
Thomas Graf	04561c1fe7	[IPv6] iflink: Convert IPv6's RTM_GETLINK to use the new netlink api By replacing the current method of exporting the device configuration which included allocating a temporary buffer, copying ipv6_devconf into it and copying that buffer into the message with a method that uses nla_reserve() allowing to copy the device configuration directly into the skb data buffer, a GFP_ATOMIC allocation could be removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:44 -08:00
David S. Miller	a928630a2f	[TCP]: Fix some warning when MD5 is disabled. Just some mis-placed ifdefs: net/ipv4/tcp_minisocks.c: In function ‘tcp_twsk_destructor’: net/ipv4/tcp_minisocks.c:364: warning: unused variable ‘twsk’ net/ipv6/tcp_ipv6.c:1846: warning: ‘tcp_sock_ipv6_specific’ defined but not used net/ipv6/tcp_ipv6.c:1877: warning: ‘tcp_sock_ipv6_mapped_specific’ defined but not used Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:43 -08:00
YOSHIFUJI Hideaki	cfb6eeb4c8	[TCP]: MD5 Signature Option (RFC2385) support. Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:39 -08:00
Gerrit Renker	b9df3cb8cf	[TCP/DCCP]: Introduce net_xmit_eval Throughout the TCP/DCCP (and tunnelling) code, it often happens that the return code of a transmit function needs to be tested against NET_XMIT_CN which is a value that does not indicate a strict error condition. This patch uses a macro for these recurring situations which is consistent with the already existing macro net_xmit_errno, saving on duplicated code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:27 -08:00
Brian Haley	d3a1be9cba	[IPv6]: Only modify checksum for UDP Only change upper-layer checksum from 0 to 0xFFFF for UDP (as RFC 768 states), not for others as RFC 4443 doesn't require it. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:13 -08:00
Thomas Graf	f465e489c4	[IPv6] rules: Remove bogus tos validation check Noticed by Al Viro: (frh->tos & ~IPV6_FLOWINFO_MASK)) where IPV6_FLOWINFO_MASK is htonl(0xfffffff) and frh->tos is u8, which makes no sense here... Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:12 -08:00
Thomas Graf	339bf98ffc	[NETLINK]: Do precise netlink message allocations where possible Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller to calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:11 -08:00
Gerrit Renker	a94f723d59	[TCP]: Remove dead code in init_sequence This removes two redundancies: 1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence() is always true, due to * tcp_v6_conn_request() is the only function calling this one * tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()] 2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:10 -08:00
YOSHIFUJI Hideaki	a11d206d0f	[IPV6]: Per-interface statistics support. For IP MIB (RFC4293). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-12-02 21:22:08 -08:00
YOSHIFUJI Hideaki	7a3025b1b3	[IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}. Otherwise, we will see a lot of casts... Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-12-02 21:22:07 -08:00
YOSHIFUJI Hideaki	40aa7b90a9	[IPV6] ROUTE: Use &rt->u.dst instead of cast. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-12-02 21:22:06 -08:00
YOSHIFUJI Hideaki	33e93c9699	[IPV6] ROUTE: Use macros to format /proc/net/ipv6_route. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-12-02 21:22:05 -08:00
David S. Miller	931731123a	[TCP]: Don't set SKB owner in tcp_transmit_skb(). The data itself is already charged to the SKB, doing the skb_set_owner_w() just generates a lot of noise and extra atomics we don't really need. Lmbench improvements on lat_tcp are minimal: before: TCP latency using localhost: 23.2701 microseconds TCP latency using localhost: 23.1994 microseconds TCP latency using localhost: 23.2257 microseconds after: TCP latency using localhost: 22.8380 microseconds TCP latency using localhost: 22.9465 microseconds TCP latency using localhost: 22.8462 microseconds Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:52 -08:00
David S. Miller	9ec75fe85c	[IPV6] tcp: Fix typo _read_mostly --> __read_mostly. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:46 -08:00
Eric Dumazet	72a3effaf6	[NET]: Size listen hash tables using backlog hint We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending on: - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:44 -08:00
Thomas Graf	1f6c9557e8	[NET] rules: Share common attribute validation policy Move the attribute policy for the non-specific attributes into net/fib_rules.h and include it in the respective protocols. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:41 -08:00
Thomas Graf	b8964ed9fa	[NET] rules: Protocol independant mark selector Move mark selector currently implemented per protocol into the protocol independent part. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:41 -08:00
Thomas Graf	47dcf0cb10	[NET]: Rethink mark field in struct flowi Now that all protocols have been made aware of the mark field it can be moved out of the union thus simplifying its usage. The config options in the IPv4/IPv6/DECnet subsystems to enable respectively disable mark based routing only obfuscate the code with ifdefs, the cost for the additional comparison in the flow key is insignificant, and most distributions have all these options enabled by default anyway. Therefore it makes sense to remove the config options and enable mark based routing by default. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:39 -08:00
Thomas Graf	82e91ffef6	[NET]: Turn nfmark into generic mark nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:38 -08:00
Al Viro	ae08e1f092	[IPV6]: ip6_output annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:26 -08:00
Al Viro	fede70b986	[IPV6]: annotate inet6_csk_search_req() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:22 -08:00
Al Viro	90bcaf7b4a	[IPV6]: flowlabels are net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:21 -08:00
Al Viro	8a74ff7770	[IPV6]: annotate ipv6 mcast Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:15 -08:00
Al Viro	04ce69093f	[IPV6]: 'info' argument of ipv6 ->err_handler() is net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:12 -08:00
Al Viro	8c689a6eae	[XFRM]: misc annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:11 -08:00
Al Viro	d2ecd9ccd0	[IPV6]: annotate inet6_hashtables Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:10 -08:00
David S. Miller	d54a81d341	[IPV6] NDISC: Calculate packet length correctly for allocation. MAX_HEADER does not include the ipv6 header length in it, so we need to add it in explicitly. With help from YOSHIFUJI Hideaki. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:06:31 -08:00
YOSHIFUJI Hideaki	f2776ff047	[IPV6]: Fix address/interface handling in UDP and DCCP, according to the scoping architecture. TCP and RAW do not have this issue. Closes Bug #7432. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-21 17:41:56 -08:00
Yasuyuki Kozakai	53ab61c6d8	[IPV6] IP6TUNNEL: Add missing nf_reset() on input path. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-11-21 16:16:27 -08:00
Yasuyuki Kozakai	b3fdd9f115	[IPV6] IP6TUNNEL: Delete all tunnel device when unloading module. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-11-21 16:16:26 -08:00
YOSHIFUJI Hideaki	ea659e0775	[IPV6] ROUTE: Do not enable router reachability probing in router mode. RFC4191 explicitly states that the procedures are applicable to hosts only. We should not have changed behavior of routers. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-11-21 16:16:25 -08:00
YOSHIFUJI Hideaki	557e92efd4	[IPV6] ROUTE: Prefer reachable nexthop only if the caller requests. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-11-21 16:16:24 -08:00
YOSHIFUJI Hideaki	ea73ee23c4	[IPV6] ROUTE: Try to use router which is not known unreachable. Only routers in "FAILED" state should be considered unreachable. Otherwise, we do not try to use specific routes unless all least specific routers are considered unreachable. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-11-21 16:16:23 -08:00
Patrick McHardy	337dde798d	[NETFILTER]: ip6_tables: use correct nexthdr value in ipv6_find_hdr() nexthdr is NEXTHDR_FRAGMENT, the nexthdr value from the fragment header is hp->nexthdr. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-15 21:18:50 -08:00
Patrick McHardy	d8a585d78e	[NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue Based on patch by James D. Nurmi: I've got some code very dependent on nfnetlink_queue, and turned up a large number of warns coming from skb_trim. While it's quite possibly my code, having not seen it on older kernels made me a bit suspect. Anyhow, based on some googling I turned up this thread: http://lkml.org/lkml/2006/8/13/56 And believe the issue to be related, so attached is a small patch to the kernel -- not sure if this is completely correct, but for anyone else hitting the WARN_ON(1) in skbuff.h, it might be helpful.. Signed-off-by: James D. Nurmi <jdnurmi@gmail.com> Ported to ip6_queue and nfnetlink_queue and added return value checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-15 21:18:48 -08:00
Patrick McHardy	daccff024f	[IPV6]: Give sit driver an appropriate module alias. It would be nice to keep things working even with this built as a module, it took me some time to realize my IPv6 tunnel was broken because of the missing sit module. This module alias fixes things until distributions have added an appropriate alias to modprobe.conf. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-05 15:47:04 -08:00
Dmitry Mishin	36f73d0c3b	[IPV6]: Add ndisc_netdev_notifier unregister. If inet6_init() fails after the ndisc_init() call, or the IPv6 module is unloaded, the ndisc_netdev_notifier call remains in the list and will result in an oops later. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-05 14:11:33 -08:00
Al Viro	5b1225454f	[IPV6]: File the fingerprints off ah6->spi/esp6->spi In theory these are opaque 32bit values. However, we end up allocating them sequentially in host-endian and stick unchanged on the wire. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-01 15:42:35 -08:00
James Morris	1b7c2dbc07	[IPV6]: fix flowlabel seqfile handling There's a bug in the seqfile show operation for flowlabel objects, where each hash chain is traversed cumulatively for each element. The following function is called for each element of each chain: static void ip6fl_fl_seq_show(struct seq_file *seq, struct ip6_flowlabel *fl) { while(fl) { seq_printf... fl = fl->next; } } Thus, objects can appear multiple times when reading /proc/net/ip6_flowlabel, as the above is called for each element in the chain. The solution is to remove the while() loop from the above, and traverse each chain exactly once, per the patch below. This also removes the ip6fl_fl_seq_show() function, which does nothing else. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-31 00:43:44 -08:00
James Morris	c6817e4c32	[IPV6]: return EINVAL for invalid address with flowlabel lease request Currently, when an application requests a lease for a flowlabel via the IPV6_FLOWLABEL_MGR socket option, no error is returned if an invalid type of destination address is supplied as part of the request, leading to a silent failure. This patch ensures that EINVAL is returned to the application in this case. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 18:56:06 -08:00
Dmitry Mishin	590bdf7fd2	[NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables There is a number of issues in parsing user-provided table in translate_table(). Malicious user with CAP_NET_ADMIN may crash system by passing special-crafted table to the _tables. The first issue is that mark_source_chains() function is called before entry content checks. In case of standard target, mark_source_chains() function uses t->verdict field in order to determine new position. But the check, that this field leads no further, than the table end, is in check_entry(), which is called later, than mark_source_chains(). The second issue, that there is no check that target_offset points inside entry. If so, _ITERATE_MATCH macro will follow further, than the entry ends. As a result, we'll have oops or memory disclosure. And the third issue, that there is no check that the target is completely inside entry. Results are the same, as in previous issue. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:44 -08:00
Patrick McHardy	844dc7c880	[NETFILTER]: remove masq/NAT from ip6tables Kconfig help Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:43 -08:00
James Morris	bcd620757d	[IPV6]: fix lockup via /proc/net/ip6_flowlabel There's a bug in the seqfile handling for /proc/net/ip6_flowlabel, where, after finding a flowlabel, the code will loop forever not finding any further flowlabels, first traversing the rest of the hash bucket then just looping. This patch fixes the problem by breaking after the hash bucket has been traversed. Note that this bug can cause lockups and oopses, and is trivially invoked by an unprivileged user. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:42 -08:00
Heiko Carstens	a27b58fed9	[NET]: fix uaccess handling Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:41 -08:00
Patrick McHardy	6d381634d2	[NETFILTER]: Fix ip6_tables extension header bypass bug As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible to a fragmentation attack causing false negatives on extension header matches. When extension headers occur in the non-first fragment after the fragment header (possibly with an incorrect nexthdr value in the fragment header) a rule looking for this extension header will never match. Drop fragments that are at offset 0 and don't contain the final protocol header regardless of the ruleset, since this should not happen normally. Since all extension headers are before the protocol header this makes sure an extension header is either not present or in the first fragment, where we can properly parse it. With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-24 16:15:10 -07:00
Patrick McHardy	51d8b1a652	[NETFILTER]: Fix ip6_tables protocol bypass bug As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible to a fragmentation attack causing false negatives on protocol matches. When the protocol header doesn't follow the fragment header immediately, the fragment header contains the protocol number of the next extension header. When the extension header and the protocol header are sent in a second fragment a rule like "ip6tables .. -p udp -j DROP" will never match. Drop fragments that are at offset 0 and don't contain the final protocol header regardless of the ruleset, since this should not happen normally. With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-24 16:14:04 -07:00
Thomas Graf	375216ad0c	[IPv6] fib: initialize tb6_lock in common place to give lockdep a key Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-21 20:20:54 -07:00
David S. Miller	6723ab549d	[IPV6]: Fix route.c warnings when multiple tables are disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 21:20:57 -07:00
Thomas Graf	9ce8ade015	[IPv6] route: Fix prohibit and blackhole routing decision Lookups resolving to ip6_blk_hole_entry must result in silently discarding the packets whereas an ip6_pkt_prohibit_entry is supposed to cause an ICMPV6_ADM_PROHIBITED message to be sent. Thanks to Kim Nordlund <kim.nordlund@nokia.com> for noticing this bug. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 20:46:54 -07:00
Ville Nuorvala	22e1e4d8dc	[IPV6]: Always copy rt->u.dst.error when copying a rt6_info. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 19:55:30 -07:00
Ville Nuorvala	264e91b68a	[IPV6]: Make IPV6_SUBTREES depend on IPV6_MULTIPLE_TABLES. As IPV6_SUBTREES can't work without IPV6_MULTIPLE_TABLES have IPV6_SUBTREES depend on it. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 19:55:29 -07:00
Ville Nuorvala	e0eda7bbaa	[IPV6]: Clean up BACKTRACK(). The fn check is unnecessary as fn can never be NULL in BACKTRACK(). Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 19:55:28 -07:00
Ville Nuorvala	4251320fa2	[IPV6]: Make sure error handling is done when calling ip6_route_output(). As ip6_route_output() never returns NULL, error checking must be done by looking at dst->error instead of comparing dst against NULL. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 19:55:27 -07:00
Jan Dittmer	39c850863d	[IPV6] sit: Add missing MODULE_LICENSE This is missing the MODULE_LICENSE statements and taints the kernel upon loading. License is obvious from the beginning of the file. Signed-off-by: Jan Dittmer <jdi@l4x.org> Signed-off-by: Joerg Roedel <joro-lkml@zlug.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-15 23:14:21 -07:00
YOSHIFUJI Hideaki	f1a95859a8	[IPV6]: Remove bogus WARN_ON in Proxy-NA handling. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-15 23:14:20 -07:00
Thomas Graf	adaa70bbdf	[IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors Fixes rt6_lookup() to provide the source address in the flow and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in the flow. Avoids unnecessary prefix comparisons by checking for a prefix length first. Fixes the rule logic to not match packets if a source selector has been specified but no source address is available. Thanks to Kim Nordlund <kim.nordlund@nokia.com> for working on this patch with me. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-15 23:14:19 -07:00
YOSHIFUJI Hideaki	9469c7b4aa	[NET]: Use typesafe inet_twsk() inline function instead of cast. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:58 -07:00
YOSHIFUJI Hideaki	4244f8a9f8	[TCP]: Use TCPOLEN_TSTAMP_ALIGNED macro instead of magic number. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:54 -07:00
Joerg Roedel	0be669bb37	[IPV6]: Seperate sit driver to extra module (addrconf.c changes) This patch contains the changes to net/ipv6/addrconf.c to remove sit specific code if the sit driver is not selected. Signed-off-by: Joerg Roedel <joro-lkml@zlug.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:52 -07:00
Joerg Roedel	989e5b96e1	[IPV6]: Seperate sit driver to extra module This patch removes the IPv6-in-IPv4 tunnel driver (sit) from the IPv6 module. It adds an option to Kconfig which makes it possible to compile it as a separate module. Signed-off-by: Joerg Roedel <joro-lkml@zlug.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:50 -07:00
Venkat Yekkirala	5b368e61c2	IPsec: correct semantics for SELinux policy matching Currently when an IPSec policy rule doesn't specify a security context, it is assumed to be "unlabeled" by SELinux, and so the IPSec policy rule fails to match to a flow that it would otherwise match to, unless one has explicitly added an SELinux policy rule allowing the flow to "polmatch" to the "unlabeled" IPSec policy rules. In the absence of such an explicitly added SELinux policy rule, the IPSec policy rule fails to match and so the packet(s) flow in clear text without the otherwise applicable xfrm(s) applied. The above SELinux behavior violates the SELinux security notion of "deny by default" which should actually translate to "encrypt by default" in the above case. This was first reported by Evgeniy Polyakov and the way James Morris was seeing the problem was when connecting via IPsec to a confined service on an SELinux box (vsftpd), which did not have the appropriate SELinux policy permissions to send packets via IPsec. With this patch applied, SELinux "polmatching" of flows Vs. IPSec policy rules will only come into play when there's an explicit context specified for the IPSec policy rule (which also means there's corresponding SELinux policy allowing appropriate domains/flows to polmatch to this context). Secondly, when a security module is loaded (in this case, SELinux), the security_xfrm_policy_lookup() hook can return errors other than access denied, such as -EINVAL. We were not handling that correctly, and in fact inverting the return logic and propagating a false "ok" back up to xfrm_lookup(), which then allowed packets to pass as if they were not associated with an xfrm policy. The solution for this is to first ensure that errno values are correctly propagated all the way back up through the various call chains from security_xfrm_policy_lookup(), and handled correctly. Then, flow_cache_lookup() is modified, so that if the policy resolver fails (typically a permission denied via the security module), the flow cache entry is killed rather than having a null policy assigned (which indicates that the packet can pass freely). This also forces any future lookups for the same flow to consult the security module (e.g. SELinux) for current security policy (rather than, say, caching the error on the flow cache entry). This patch: Fix the selinux side of things. This makes sure SELinux polmatching of flow contexts to IPSec policy rules comes into play only when an explicit context is associated with the IPSec policy rule. Also, this no longer defaults the context of a socket policy to the context of the socket since the "no explicit context" case is now handled properly. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-10-11 23:59:37 -07:00
Linus Torvalds	fefd26b3b8	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/configh * master.kernel.org:/pub/scm/linux/kernel/git/davej/configh: Remove all inclusions of <linux/config.h> Manually resolved trivial path conflicts due to removed files in the sound/oss/ subdirectory.	2006-10-04 09:59:57 -07:00
Dave Jones	038b0a6d8d	Remove all inclusions of <linux/config.h> kbuild explicitly includes this at build time. Signed-off-by: Dave Jones <davej@redhat.com>	2006-10-04 03:38:54 -04:00
Diego Beltrami	0a69452cb4	[XFRM]: BEET mode This patch introduces the BEET mode (Bound End-to-End Tunnel) with as specified by the ietf draft at the following link: http://www.ietf.org/internet-drafts/draft-nikander-esp-beet-mode-06.txt The patch provides only single family support (i.e. inner family = outer family). Signed-off-by: Diego Beltrami <diego.beltrami@gmail.com> Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Abhinav Pathak <abhinav.pathak@hiit.fi> Signed-off-by: Jeff Ahrenholz <ahrenholz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-04 00:31:09 -07:00
Herbert Xu	1e0c14f49d	[UDP]: Fix MSG_PROBE crash UDP tracks corking status through the pending variable. The IP layer also tracks it through the socket write queue. It is possible for the two to get out of sync when MSG_PROBE is used. This patch changes UDP to check the write queue to ensure that the two stay in sync. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-04 00:31:00 -07:00
Herbert Xu	132a55f3c5	[UDP6]: Fix flowi clobbering The udp6_sendmsg function uses a shared buffer to store the flow without taking any locks. This leads to races with SMP. This patch moves the flowi object onto the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-04 00:30:59 -07:00
Herbert Xu	1a9e9ef684	[IPV6]: Disable SG for GSO unless we have checksum Because the system won't turn off the SG flag for us we need to do this manually on the IPv6 path. Otherwise we will throw IPv6 packets with bad checksums at the hardware. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:45 -07:00
Al Viro	a252cc2371	[XFRM]: xrfm_replay_check() annotations seq argument is net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:40 -07:00
Al Viro	6067b2baba	[XFRM]: xfrm_parse_spi() annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:39 -07:00
Al Viro	a94cfd1974	[XFRM]: xfrm_state_lookup() annotations spi argument of xfrm_state_lookup() is net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:37 -07:00
Al Viro	8f83f23e6d	[XFRM]: ports in struct xfrm_selector annotated Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:33 -07:00
Al Viro	82103232ed	[IPV4]: inet_rcv_saddr() annotations inet_rcv_saddr() returns net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:28 -07:00
Al Viro	4f765d842f	[IPV4]: INET_MATCH() annotations INET_MATCH() and friends depend on an interesting set of kludges: * there's a pair of adjacent fields in struct inet_sock - __be16 dport followed by __u16 num. We want to search by pair, so we combine the keys into a single 32bit value and compare with 32bit value read from &...->dport. * on 64bit targets we combine comparisons with pair of adjacent __be32 fields in the same way. Make sure that we don't mix those values with anything else and that pairs we form them from have correct types. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:25 -07:00
Al Viro	fd68322209	[IPV4]: inet_addr_type() annotations argument and inferred net-endian variables in callers annotated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:07 -07:00
Fabio Olive Leite	293b9c4251	[IPV6]: bh_lock_sock_nested on tcp_v6_rcv A while ago Ingo patched tcp_v4_rcv on net/ipv4/tcp_ipv4.c to use bh_lock_sock_nested and silence a lock validator warning. This fixed it for IPv4, but recently I saw a report of the same warning on IPv6. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 17:53:54 -07:00
Noriaki TAKAMIYA	3b9f9a1c39	[IPV6] ADDRCONF: Mobile IPv6 Home Address support. IFA_F_HOMEADDRESS is introduced for Mobile IPv6 Home Addresses on Mobile Node. The IFA_F_HOMEADDRESS flag should be set for Mobile IPv6 Home Addresses for 2 purposes. 1) We need to check this on receipt of Type 2 Routing Header (RFC3775 Section 6.4), 2) We prefer Home Address(es) in source address selection (RFC3484 Section 5 Rule 4). Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:20:29 -07:00
Noriaki TAKAMIYA	55ebaef1d5	[IPV6] ADDRCONF: Allow non-DAD'able addresses. IFA_F_NODAD flag, similar to IN6_IFF_NODAD in BSDs, is introduced to skip DAD. This flag should be set to Mobile IPv6 Home Address(es) on Mobile Node because DAD would fail if we should perform DAD; our Home Agent protects our Home Address(es). Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:20:28 -07:00
YOSHIFUJI Hideaki	fc26d0abd5	[IPV6] NDISC: Fix is_router flag setting. We did not send appropriate IsRouter flag if the forwarding setting is positive even value. Let's give 1/0 value to ndisc_send_na(). Also, existing users of ndisc_send_na() give 0/1 to override, we can omit redundant operation in that function. Bug hinted by Nicolas Dichtel <nicolas.dichtel@6wind.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:20:27 -07:00
YOSHIFUJI Hideaki	8814c4b533	[IPV6] ADDRCONF: Convert addrconf_lock to RCU. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:20:26 -07:00
YOSHIFUJI Hideaki	fbea49e1e2	[IPV6] NDISC: Add proxy_ndp sysctl. We do not always need proxy NDP functionality even when we enable forwarding. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:20:25 -07:00
Ville Nuorvala	62dd93181a	[IPV6] NDISC: Set per-entry is_router flag in Proxy NA. We have sent NA with router flag from the node-wide forwarding configuration. This is not appropriate for proxy NA, and it should be set according to each proxy entry's configuration. This is used by Mobile IPv6 home agent to support physical home link in acting as a proxy router for mobile node which is not a router, for example. Based on MIPL2 kernel patch. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:20:24 -07:00
Ville Nuorvala	5f3e6e9e19	[IPV6] NDISC: Avoid updating neighbor cache for proxied address in receiving NA. This aims at proxying router not updating neighbor cache entry for proxied address when it receives NA because either the proxied node is off link or it has already sent a NA to the proxied router. Based on MIPL2 kernel patch. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:20:23 -07:00
Ville Nuorvala	74553b09dc	[IPV6]: Don't forward packets to proxied link-local address. Proxying router can't forward traffic sent to link-local address, so signal the sender and discard the packet. This behavior is clarified by Mobile IPv6 specification (RFC3775) but might be required for all proxying router. Based on MIPL2 kernel patch. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:20:22 -07:00
Ville Nuorvala	e21e0b5f19	[IPV6] NDISC: Handle NDP messages to proxied addresses. It is required to respond to NDP messages sent directly to the "target" unicast address. Proxying node (router) is required to handle such messages. To achieve this, check if the packet in forwarding path is NDP message. With this patch, the proxy neighbor entries are always looked up in forwarding path. We may want to optimize further. Based on MIPL2 kernel patch. Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:20:21 -07:00
Brian Haley	1192e403e9	[NETFILTER]: make some netfilter globals __read_mostly Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:58 -07:00
Patrick McHardy	7cf73936fe	[NETFILTER]: ip6t_HL: remove write-only variable Noticed by Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:55 -07:00
Dmitry Mishin	90d47db4a0	[NETFILTER]: x_tables: small check_entry & module_refcount cleanup While standard_target has target->me == NULL, module_put() should be called for it as for others, because there were try_module_get() before. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:51 -07:00
Patrick McHardy	9123de2c04	[NETFILTER]: ip6table_mangle: reroute when nfmark changes in NF_IP6_LOCAL_OUT Now that IPv6 supports policy routing we need to reroute in NF_IP6_LOCAL_OUT when the mark value changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:51 -07:00
Patrick McHardy	df0933dcb0	[NETFILTER]: kill listhelp.h Kill listhelp.h and use the list.h functions instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:45 -07:00
Patrick McHardy	a1e59abf82	[XFRM]: Fix wildcard as tunnel source Hashing SAs by source address breaks templates with wildcards as tunnel source since the source address used for hashing/lookup is still 0/0. Move source address lookup to xfrm_tmpl_resolve_one() so we can use the real address in the lookup. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:06 -07:00
Thomas Graf	7198f8cec1	[IPV6] address: Support NLM_F_EXCL when adding addresses iproute2 doesn't provide the NLM_F_CREATE flag when adding addresses, it is assumed to be implied. The existing code issues a check on said flag when the modify operation fails (likely due to ENOENT) before continuing to create it, this leads to a hard to predict result, therefore the NLM_F_CREATE check is removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:02 -07:00
Thomas Graf	680a27a23a	[IPV6] address: Allow address changes while device is administrative down Same behaviour as IPv4, using IFF_UP is a no-no anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:01 -07:00
Thomas Graf	0ab6803bc9	[IPV6] address: Convert address dumping to new netlink api Replaces INET6_IFADDR_RTA_SPACE with a new function calculating the total required message size for all address messages. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:00 -07:00
Thomas Graf	101bb22969	[IPV6] address: Add put_ifaddrmsg() and rt_scope() Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:59 -07:00
Thomas Graf	85486af00b	[IPV6] address: Add put_cacheinfo() to dump struct cacheinfo Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:58 -07:00
Thomas Graf	1b29fc2c8b	[IPV6] address: Convert address lookup to new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:57 -07:00
Thomas Graf	b933f7166b	[IPV6] address: Convert address deletion to new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:56 -07:00
Thomas Graf	461d8837fa	[IPV6] address: Convert address addition to new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:55 -07:00
Brian Haley	94aec08ea4	[NETFILTER]: Change tunables to __read_mostly Change some netfilter tunables to __read_mostly. Also fixed some incorrect file reference comments while I was in there. (this will be my last __read_mostly patch unless someone points out something else that needs it) Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:54 -07:00
Jamal Hadi Salim	eb878e8457	[IPSEC]: output mode to take an xfrm state as input param Expose IPSEC modes output path to take an xfrm state as input param. This makes it consistent with the input mode processing (which already takes the xfrm state as a param). Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:48 -07:00
Dmitry Mishin	fda9ef5d67	[NET]: Fix sk->sk_filter field access Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg needlock = 0, while socket is not locked at that moment. In order to avoid this and similar issues in the future, use rcu for sk->sk_filter field read protection. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Kirill Korotaev <dev@openvz.org>	2006-09-22 15:18:47 -07:00
Masahide NAKAMURA	dc435e6dac	[IPV6] MIP6: Fix to update IP6CB when cloned skbuff is received at HAO. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:46 -07:00
YOSHIFUJI Hideaki	33cc489668	[IPV6] ROUTE: Fix dst reference counting in ip6_pol_route_lookup(). In ip6_pol_route_lookup(), when we finish backtracking at the top-level root entry, we need to hold it. Bug noticed by Mitsuru Chinen <CHINEN@jp.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:26 -07:00
Thomas Graf	5176f91ea8	[NETLINK]: Make use of NLA_STRING/NLA_NUL_STRING attribute validation Converts existing NLA_STRING attributes to use the new validation features, saving a couple of temporary buffers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:25 -07:00
Gerrit Renker	25030a7f9e	[UDP]: Unify UDPv4 and UDPv6 ->get_port() This patch creates one common function which is called by udp_v4_get_port() and udp_v6_get_port(). As a result, * duplicated code is removed * udp_port_rover and local port lookup can now be removed from udp.h * further savings follow since the same function will be used by UDP-Litev4 and UDP-Litev6 In contrast to the patch sent in response to Yoshifujis comments (fixed by this variant), the code below also removes the EXPORT_SYMBOL(udp_port_rover), since udp_port_rover can now remain local to net/ipv4/udp.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:21 -07:00
Alexey Dobriyan	e5d679f339	[NET]: Use SLAB_PANIC Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:19 -07:00
YOSHIFUJI Hideaki	ef047f5e10	[NET]: Use BUILD_BUG_ON() for checking size of skb->cb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:15 -07:00
Patrick McHardy	366e4adc0f	[IPV6]: Fix routing by fwmark Fix mark comparison, also dump the mask to userspace when the mask is zero, but the mark is not (in which case the mark is dumped, so the mask is needed to make sense of it). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:14 -07:00
David S. Miller	267935b197	[IPV6]: Fix build with fwmark disabled. Based upon a patch by Brian Haley. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:09 -07:00
YOSHIFUJI Hideaki	cd9d742622	[IPV6] ROUTE: Add support for fwmask in routing rules. Add support for fwmark masks. A mask of 0xFFFFFFFF is used when a mark value != 0 is sent without a mask. Based on patch for net/ipv4/fib_rules.c by Patrick McHardy <kaber@trash.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:08 -07:00
YOSHIFUJI Hideaki	2613aad5ab	[IPV6] ROUTE: Fix size of fib6_rule_policy. It should not be RTA_MAX+1 but FRA_MAX+1. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:07 -07:00
YOSHIFUJI Hideaki	6c5eb6a507	[IPV6] ROUTE: Fix FWMARK support. - Add missing nla_policy entry. - type of fwmark is u32, not u8. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:06 -07:00
YOSHIFUJI Hideaki	75bff8f023	[IPV6] ROUTE: Routing by FWMARK. Based on patch by Jean Lorchat <lorchat@sfc.wide.ad.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:18:00 -07:00
YOSHIFUJI Hideaki	2cc67cc731	[IPV6] ROUTE: Routing by Traffic Class. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:17:59 -07:00
YOSHIFUJI Hideaki	e731c248ba	[IPV6] MIP6: Several obvious clean-ups. - Remove redundant code. Pointed out by Brian Haley <brian.haley@hp.com>. - Unify code paths with/without CONFIG_IPV6_MIP. - Use NIP6_FMT for IPv6 address textual presentation. - Fold long line. Pointed out by David Miller <davem@davemloft.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:17:58 -07:00
David S. Miller	e4bec827fe	[IPSEC] esp: Defer output IV initialization to first use. First of all, if the xfrm_state only gets used for input packets this entropy is a complete waste. Secondly, it is often the case that a configuration loads many rules (perhaps even dynamically) and they don't all necessarily ever get used. This get_random_bytes() call was showing up in the profiles for xfrm_state inserts which is how I noticed this. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:17:35 -07:00
David S. Miller	9d4a706d85	[XFRM]: Add generation count to xfrm_state and xfrm_dst. Each xfrm_state inserted gets a new generation counter value. When a bundle is created, the xfrm_dst objects get the current generation counter of the xfrm_state they will attach to at dst->xfrm. xfrm_bundle_ok() will return false if it sees an xfrm_dst with a generation count different from the generation count of the xfrm_state that dst points to. This provides a facility by which to passively and cheaply invalidate cached IPSEC routes during SA database changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:42 -07:00
David S. Miller	edcd582152	[XFRM]: Pull xfrm_state_by{spi,src} hash table knowledge out of afinfo. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:39 -07:00
David S. Miller	2770834c9f	[XFRM]: Pull xfrm_state_bydst hash table knowledge out of afinfo. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:38 -07:00
Masahide NAKAMURA	64d9fdda8e	[XFRM] IPV6: Support Mobile IPv6 extension headers sorting. Support Mobile IPv6 extension headers sorting for two transformation policies. Mobile IPv6 extension headers should be placed after IPsec transport mode, but before transport AH when outbound. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:37 -07:00
Masahide NAKAMURA	58c949d1b9	[XFRM] IPV6: Add sort functions to combine templates/states for IPsec. Add sort functions to combine templates/states for IPsec. Think of outbound transformation order we should be careful with transport AH which must be the last of all transport ones. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:36 -07:00
Masahide NAKAMURA	01be8e5d59	[IPV6] MIP6: Ignore to report if mobility headers is rejected. Ignore to report user-space for known mobility headers rejected by destination options header transformation. Mobile IPv6 specification (RFC3775) says that mobility header is used with destination options header carrying home address option only for binding update message. Other type message cannot be used and node must drop it silently (and must not send binding error) if receiving such packet. To achieve it, (1) application should use transformation policy and wild-card states to catch binding update message prior other packets (2) kernel doesn't report the reject to user-space not to send binding error message by application. This patch is for (2). Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:32 -07:00
Masahide NAKAMURA	70182ed23d	[IPV6] MIP6: Report to user-space when home address option is rejected. Report to user-space when home address option is rejected. In receiving this message user-space application will send Mobile IPv6 binding error. It is rate-limited by kernel. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:31 -07:00
Masahide NAKAMURA	2ce4272a69	[IPV6] MIP6: Transformation support mobility header. Transformation support mobility header. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:07:03 -07:00
Masahide NAKAMURA	6e8f4d48b2	[IPV6] MIP6: Add sending mobility header functions through raw socket. Mobility header is built by user-space and sent through raw socket. Kernel just extracts its type to flow. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:07:02 -07:00
Masahide NAKAMURA	7be96f7628	[IPV6] MIP6: Add receiving mobility header functions through raw socket. Like ICMPv6, mobility header is handled through raw socket. In inbound case, check only whether ICMPv6 error should be sent as a reply or not by kernel. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> This patch was also written by: Antti Tuominen <anttit@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:07:01 -07:00
Noriaki TAKAMIYA	3d126890dd	[IPV6] MIP6: Add destination options header transformation. Add destination options header transformation for Mobile IPv6. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:58 -07:00
Noriaki TAKAMIYA	2c8d7ca0f7	[IPV6] MIP6: Add routing header type 2 transformation. Add routing header type 2 transformation for Mobile IPv6. Based on MIPL2 kernel patch. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:57 -07:00
Masahide NAKAMURA	27637df92e	[IPV6] IPSEC: Support sending with Mobile IPv6 extension headers. Mobile IPv6 defines home address option as an option of destination options header. It is placed before fragment header then ip6_find_1stfragopt() is fixed to know about it. Home address option also carries final source address of the flow, then outbound AH calculation should take care of it like routing header case. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:56 -07:00
Masahide NAKAMURA	793832361f	[IPV6] MIP6: Revert address to send ICMPv6 error. IPv6 source address is replaced in receiving packet with home address option carried by destination options header. To send ICMPv6 error back, original address which is received one on wire should be used. This function checks such header is included and reverts them. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:55 -07:00
Masahide NAKAMURA	a831f5bbc8	[IPV6] MIP6: Add inbound interface of home address option. Add inbound function of home address option by registering it to TLV table for destination options header. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:53 -07:00
Masahide NAKAMURA	a80ff03e05	[IPV6]: Allow to replace skbuff by TLV parser. In receiving Mobile IPv6 home address option which is a TLV carried by destination options header, kernel will try to mangle source address of packet. Think of cloned skbuff it is required to replace it by the parser just like routing header case. This is a framework to achieve that to allow TLV parser to replace inbound skbuff pointer. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:51 -07:00
Masahide NAKAMURA	c61a404325	[IPV6]: Find option offset by type. This is a helper to search option offset from extension header which can carry TLV option like destination options header. Mobile IPv6 home address option will use it. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:50 -07:00
Masahide NAKAMURA	280a9d3400	[IPV6] MIP6: Add socket option and ancillary data interface of routing header type 2. Add socket option and ancillary data interface of routing header type 2. Mobile IPv6 application will use this to send binding acknowledgement with the header without relation of confirmed route optimization (binding). Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:49 -07:00
Masahide NAKAMURA	65d4ed9221	[IPV6] MIP6: Add inbound interface of routing header type 2. Add inbound interface of routing header type 2 for Mobile IPv6. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:48 -07:00
Masahide NAKAMURA	ee53826801	[IPV6]: Add Kconfig to enable Mobile IPv6. Add Kconfig to enable Mobile IPv6. Based on MIPL2 kernel patch. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2006-09-22 15:06:46 -07:00
Masahide NAKAMURA	e53820de0f	[XFRM] IPV6: Restrict bundle reusing For outbound transformation, bundle is checked whether it is suitable for current flow to be reused or not. In such IPv6 case as below, transformation may apply incorrect bundle for the flow instead of creating another bundle: - The policy selector has destination prefix length < 128 (Two or more addresses can be matched it) - Its bundle holds dst entry of default route whose prefix length < 128 (Previous traffic was used such route as next hop) - The policy and the bundle were used a transport mode state and this time flow address is not matched the bundled state. This issue is found by Mobile IPv6 usage to protect mobility signaling by IPsec, but it is not a Mobile IPv6 specific. This patch adds strict check to xfrm_bundle_ok() for each state mode and address when prefix length is less than 128. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:44 -07:00
Masahide NAKAMURA	9afaca0579	[XFRM] IPV6: Update outbound state timestamp for each sending. With this patch transformation state is updated last used time for each sending. Xtime is used for it like other state lifetime expiration. Mobile IPv6 enabled nodes will want to know traffic status of each binding (e.g. judgement to request binding refresh by correspondent node, or to keep home/care-of nonce alive by mobile node). The last used timestamp is an important hint about it. Based on MIPL2 kernel patch. This patch was also written by: Henrik Petander <petander@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:43 -07:00
Masahide NAKAMURA	1b5c229987	[XFRM] STATE: Support non-fragment outbound transformation headers. For originated outbound IPv6 packets which will fragment, ip6_append_data() should know length of extension headers before sending them and the length is carried by dst_entry. IPv6 IPsec headers fragment then transformation was designed to place all headers after fragment header. OTOH Mobile IPv6 extension headers do not fragment then it is a good idea to make dst_entry have non-fragment length to tell it to ip6_append_data(). Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:41 -07:00
Masahide NAKAMURA	99505a8436	[XFRM] STATE: Add a hook to obtain local/remote outbound address. Outbound transformation replaces both source and destination address with state's end-point addresses at the same time when IPsec tunnel mode. It is also required to change them for Mobile IPv6 route optimization, but we should care about the following differences: - changing result is not end-point but care-of address - either source or destination is replaced for each state This hook is a common platform to change outbound address. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:41 -07:00
Masahide NAKAMURA	fbd9a5b47e	[XFRM] STATE: Common receive function for route optimization extension headers. XFRM_STATE_WILDRECV flag is introduced; the last resort state is set it and receives packet which is not route optimized but uses such extension headers i.e. Mobile IPv6 signaling (binding update and acknowledgement). A node enabled Mobile IPv6 adds the state. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:39 -07:00
Masahide NAKAMURA	1d71627d69	[XFRM] STATE: Introduce route optimization mode. Route optimization is used with routing header and destination options header for Mobile IPv6. At outbound it makes header space like IPsec transport. At inbound it does nothing because exhdrs.c functions have responsibility to update skbuff information for these headers. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:37 -07:00
Masahide NAKAMURA	aee5adb430	[XFRM] STATE: Add a hook to find offset to be inserted header in outbound. On current kernel, ip6_find_1stfragopt() is used by IPv6 IPsec to find offset to be inserted header in outbound for transport mode. (BTW, no usage may be needed for IPv4 case.) Mobile IPv6 requires another logic for routing header and destination options header respectively. This patch is common platform for the offset and adopts it to IPsec. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:36 -07:00
Masahide NAKAMURA	eb2971b68a	[XFRM] STATE: Search by address using source address list. This is a support to search transformation states by its addresses by using source address list for Mobile IPv6 usage. To use it from user-space, it is also added a message type for source address as a xfrm state option. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:35 -07:00
Masahide NAKAMURA	6c44e6b7ab	[XFRM] STATE: Add source address list. Support source address based searching. Mobile IPv6 will use it. Based on MIPL2 kernel patch. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:06:34 -07:00
Masahide NAKAMURA	7e49e6de30	[XFRM]: Add XFRM_MODE_xxx for future use. Transformation mode is used as either IPsec transport or tunnel. It is required to add two more items, route optimization and inbound trigger for Mobile IPv6. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:05:15 -07:00
YOSHIFUJI Hideaki	77d16f450a	[IPV6] ROUTE: Unify RT6_F_xxx and RT6_SELECT_F_xxx flags Unify RT6_F_xxx and RT6_SELECT_F_xxx flags into RT6_LOOKUP_F_xxx flags, and put them into ip6_route.h Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:56 -07:00
YOSHIFUJI Hideaki	4e96c2b418	[IPV6] KCONFIG: Add subtrees support. This is for developers only. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:55 -07:00
YOSHIFUJI Hideaki	c0bece9f2a	[IPV6] ROUTE: Add credits about subtree fixes. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:55 -07:00
YOSHIFUJI Hideaki	cb15d9c224	[IPV6] NDISC: Search subtrees when backtracking on receipt of redirects. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:54 -07:00
YOSHIFUJI Hideaki	150730d5a5	[IPV6] ROUTE: Purge clones on other trees when deleting a route. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:53 -07:00
YOSHIFUJI Hideaki	982f56f3a9	[IPV6] ROUTE: Search subtree when backtracking. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:52 -07:00
YOSHIFUJI Hideaki	7fc33165a7	[IPV6] ROUTE: Put SUBTREE() as FIB6_SUBTREE() into ip6_fib.h for future use. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:51 -07:00
YOSHIFUJI Hideaki	fefc2a6c20	[IPV6] ROUTE: Allow searching subtree only. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:50 -07:00
YOSHIFUJI Hideaki	825e288ef4	[IPV6] ROUTE: Make sure we do not exceed args in fib6_lookup_1(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:49 -07:00

1602 Commits