linux

Commit Graph

Author	SHA1	Message	Date
Florian Westphal	b65e3d7be0	xfrm: state: add sequence count to detect hash resizes Once xfrm_state_find is lockless we have to cope with a concurrent resize opertion. We use a sequence counter to block in case a resize is in progress and to detect if we might have missed a state that got moved to a new hash table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:24 +02:00
Florian Westphal	df7274eb70	xfrm: state: delay freeing until rcu grace period has elapsed The hash table backend memory and the state structs are free'd via kfree/vfree. Once we only rely on rcu during lookups we have to make sure no other cpu is currently accessing this before doing the free. Free operations already happen from worker so we can use synchronize_rcu to wait until concurrent readers are done. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:23 +02:00
Florian Westphal	02efdff7e2	xfrm: state: use atomic_inc_not_zero to increment refcount Once xfrm_state_lookup_byaddr no longer acquires the state lock another cpu might be freeing the state entry at the same time. To detect this we use atomic_inc_not_zero, we then signal -EAGAIN to caller in case our result was stale. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:23 +02:00
Florian Westphal	ae3fb6d321	xfrm: state: use hlist_for_each_entry_rcu helper This is required once we allow lockless access of bydst/bysrc hash tables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:23 +02:00
Julia Lawall	e45a8a9e60	xfrm: constify xfrm_replay structures The xfrm_replay structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:18:49 +02:00
Tobias Brunner	6916fb3b10	xfrm: Ignore socket policies when rebuilding hash tables Whenever thresholds are changed the hash tables are rebuilt. This is done by enumerating all policies and hashing and inserting them into the right table according to the thresholds and direction. Because socket policies are also contained in net->xfrm.policy_all but no hash tables are defined for their direction (dir + XFRM_POLICY_MAX) this causes a NULL or invalid pointer dereference after returning from policy_hash_bysel() if the rebuild is done while any socket policies are installed. Since the rebuild after changing thresholds is scheduled this crash could even occur if the userland sets thresholds seemingly before installing any socket policies. Fixes: `53c2e285f9` ("xfrm: Do not hash socket policies") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-07-29 10:21:54 +02:00
Vegard Nossum	7677c7560c	xfrm: get rid of another incorrect WARN During fuzzing I regularly run into this WARN(). According to Herbert Xu, this "certainly shouldn't be a WARN, it probably shouldn't print anything either". Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-07-27 13:09:00 +02:00
Vegard Nossum	73efc3245f	xfrm: get rid of incorrect WARN AFAICT this message is just printed whenever input validation fails. This is a normal failure and we shouldn't be dumping the stack over it. Looks like it was originally a printk that was maybe incorrectly upgraded to a WARN: commit `62db5cfd70` Author: stephen hemminger <shemminger@vyatta.com> Date: Wed May 12 06:37:06 2010 +0000 xfrm: add severity to printk Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-07-27 13:07:46 +02:00
Vegard Nossum	1ba5bf993c	xfrm: fix crash in XFRM_MSG_GETSA netlink handler If we hit any of the error conditions inside xfrm_dump_sa(), then xfrm_state_walk_init() never gets called. However, we still call xfrm_state_walk_done() from xfrm_dump_sa_done(), which will crash because the state walk was never initialized properly. We can fix this by setting cb->args[0] only after we've processed the first element and checking this before calling xfrm_state_walk_done(). Fixes: `d3623099d3` ("ipsec: add support of limited SA dump") Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-07-18 09:37:02 +02:00
David S. Miller	e800072c18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 15:59:24 -04:00
David S. Miller	32b583a0cb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2016-05-04 1) The flowcache can hit an OOM condition if too many entries are in the gc_list. Fix this by counting the entries in the gc_list and refuse new allocations if the value is too high. 2) The inner headers are invalid after a xfrm transformation, so reset the skb encapsulation field to ensure nobody tries access the inner headers. Otherwise tunnel devices stacked on top of xfrm may build the outer headers based on wrong informations. 3) Add pmtu handling to vti, we need it to report pmtu informations for local generated packets. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:35:31 -04:00
Nicolas Dichtel	de95c4a46a	xfrm: align nlattr properly when needed Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:25 -04:00
subashab@codeaurora.org	071d36bf21	xfrm: Fix crash observed during device unregistration and decryption A crash is observed when a decrypted packet is processed in receive path. get_rps_cpus() tries to dereference the skb->dev fields but it appears that the device is freed from the poison pattern. [<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0 [<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc [<ffffffc000af6094>] netif_rx+0x74/0x94 [<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0 [<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c [<ffffffc000ba6eb8>] esp_input_done+0x20/0x30 [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc [<ffffffc0000b7324>] worker_thread+0x2f8/0x418 [<ffffffc0000bb40c>] kthread+0xe0/0xec -013\|get_rps_cpu( \| dev = 0xFFFFFFC08B688000, \| skb = 0xFFFFFFC0C76AAC00 -> ( \| dev = 0xFFFFFFC08B688000 -> ( \| name = "...................................................... \| name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev = 0xAAAAAAAAAAA Following are the sequence of events observed - - Encrypted packet in receive path from netdevice is queued - Encrypted packet queued for decryption (asynchronous) - Netdevice brought down and freed - Packet is decrypted and returned through callback in esp_input_done - Packet is queued again for process in network stack using netif_rx Since the device appears to have been freed, the dereference of skb->dev in get_rps_cpus() leads to an unhandled page fault exception. Fix this by holding on to device reference when queueing packets asynchronously and releasing the reference on call back return. v2: Make the change generic to xfrm as mentioned by Steffen and update the title to xfrm Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-24 14:29:36 -04:00
Andy Lutomirski	2bf8c47626	net/xfrm_user: use in_compat_syscall to deny compat syscalls The code wants to prevent compat code from receiving messages. Use in_compat_syscall for this. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-22 15:36:02 -07:00
Steffen Klassert	215276c014	xfrm: Reset encapsulation field of the skb before transformation The inner headers are invalid after a xfrm transformation. So reset the skb encapsulation field to ensure nobody tries to access the inner headers. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-03-17 10:28:44 +01:00
Herbert Xu	17bc197022	ipsec: Use skcipher and ahash when probing algorithms This patch removes the last reference to hash and ablkcipher from IPsec and replaces them with ahash and skcipher respectively. For skcipher there is currently no difference at all, while for ahash the current code is actually buggy and would prevent asynchronous algorithms from being discovered. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>	2016-01-27 20:36:07 +08:00
Konstantin Khlebnikov	9207f9d45b	net: preserve IP control block during GSO segmentation Skb_gso_segment() uses skb control block during segmentation. This patch adds 32-bytes room for previous control block which will be copied into all resulting segments. This patch fixes kernel crash during fragmenting forwarded packets. Fragmentation requires valid IP CB in skb for clearing ip options. Also patch removes custom save/restore in ovs code, now it's redundant. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-15 14:35:24 -05:00
David S. Miller	024f35c552	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-12-22 Just one patch to fix dst_entries_init with multiple namespaces. From Dan Streetman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-22 16:26:31 -05:00
Eric Dumazet	d188ba86dd	xfrm: add rcu protection to sk->sk_policy[] XFRM can deal with SYNACK messages, sent while listener socket is not locked. We add proper rcu protection to __xfrm_sk_clone_policy() and xfrm_sk_policy_lookup() This might serve as the first step to remove xfrm.xfrm_policy_lock use in fast path. Fixes: `fa76ce7328` ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-11 19:22:06 -05:00
Eric Dumazet	56f047305d	xfrm: add rcu grace period in xfrm_policy_destroy() We will soon switch sk->sk_policy[] to RCU protection, as SYNACK packets are sent while listener socket is not locked. This patch simply adds RCU grace period before struct xfrm_policy freeing, and the corresponding rcu_head in struct xfrm_policy. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-11 19:22:06 -05:00
Eric Dumazet	bd5eb35f16	xfrm: take care of request sockets TCP SYNACK messages might now be attached to request sockets. XFRM needs to get back to a listener socket. Adds new helpers that might be used elsewhere : sk_to_full_sk() and sk_const_to_full_sk() Note: We also need to add RCU protection for xfrm lookups, now TCP/DCCP have lockless listener processing. This will be addressed in separate patches. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-07 17:07:33 -05:00
Dan Streetman	a8a572a6b5	xfrm: dst_entries_init() per-net dst_ops Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-11-03 08:42:57 +01:00
David S. Miller	e7b63ff115	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-10-30 1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-30 20:51:56 +09:00
David S. Miller	ba3e2084f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:54:12 -07:00
Steffen Klassert	cb866e3298	xfrm: Increment statistic counter on inner mode error Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter to notify about dropped packets if we fail to fetch a inner mode. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-10-23 07:52:58 +02:00
Sowmini Varadhan	e33d4f13d2	xfrm: Fix unaligned access to stats in copy_to_user_state() On sparc, deleting established SAs (e.g., by restarting ipsec) results in unaligned access messages via xfrm_del_sa -> km_state_notify -> xfrm_send_state_notify(). Even though struct xfrm_usersa_info is aligned on 8-byte boundaries, netlink attributes are fundamentally only 4 byte aligned, and this cannot be changed for nla_data() that is passed up to userspace. As a result, the put_unaligned() macro needs to be used to set up potentially unaligned fields such as the xfrm_stats in copy_to_user_state() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-10-23 06:49:29 +02:00
Eric W. Biederman	ede2059dba	dst: Pass net into dst->output The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:03 -07:00
Eric W. Biederman	cf91a99daa	ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	4ebdfba73c	dst: Pass a sk into .local_out For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:55 -07:00
Eric W. Biederman	13206b6bff	net: Pass net into dst_output and remove dst_output_okfn Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:54 -07:00
Eric W. Biederman	3f5312ae62	xfrm: Only compute net once in xfrm_policy_queue_process Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:53 -07:00
Michael Rossberg	4e077237cf	xfrm: Fix state threshold configuration from userspace Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry timer (XFRMA_ETIMER_THRESH) of a state without having to set other attributes like replay counter and byte lifetime. Changing these other values while traffic flows will break the state. Signed-off-by: Michael Rossberg <michael.rossberg@tu-ilmenau.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-09-29 11:45:55 +02:00
Eric Dumazet	6f9c961546	inet: constify ip_route_output_flow() socket argument Very soon, TCP stack might call inet_csk_route_req(), which calls inet_csk_route_req() with an unlocked listener socket, so we need to make sure ip_route_output_flow() is not trying to change any field from its socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:37 -07:00
Eric W. Biederman	be10de0a32	netfilter: Add blank lines in callers of netfilter hooks In code review it was noticed that I had failed to add some blank lines in places where they are customarily used. Taking a second look at the code I have to agree blank lines would be nice so I have added them here. Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	0c4b51f005	netfilter: Pass net into okfn This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	29a26a5680	netfilter: Pass struct net into the netfilter hooks Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	5a70649e0d	net: Merge dst_output and dst_output_sk Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:32 -07:00
Eric W. Biederman	a6568b2425	xfrm: Remove unused afinfo method init_dst Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:32 -07:00
Linus Torvalds	dd5cdb48ed	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Another merge window, another set of networking changes. I've heard rumblings that the lightweight tunnels infrastructure has been voted networking change of the year. But what do I know? 1) Add conntrack support to openvswitch, from Joe Stringer. 2) Initial support for VRF (Virtual Routing and Forwarding), which allows the segmentation of routing paths without using multiple devices. There are some semantic kinks to work out still, but this is a reasonably strong foundation. From David Ahern. 3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov. 4) Ignore route nexthops with a link down state in ipv6, just like ipv4. From Andy Gospodarek. 5) Remove spinlock from fast path of act_gact and act_mirred, from Eric Dumazet. 6) Document the DSA layer, from Florian Fainelli. 7) Add netconsole support to bcmgenet, systemport, and DSA. Also from Florian Fainelli. 8) Add Mellanox Switch Driver and core infrastructure, from Jiri Pirko. 9) Add support for "light weight tunnels", which allow for encapsulation and decapsulation without bearing the overhead of a full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of others. 10) Add Identifier Locator Addressing support for ipv6, from Tom Herbert. 11) Support fragmented SKBs in iwlwifi, from Johannes Berg. 12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia. 13) Add BQL support to 3c59x driver, from Loganaden Velvindron. 14) Stop using a zero TX queue length to mean that a device shouldn't have a qdisc attached, use an explicit flag instead. From Phil Sutter. 15) Use generic geneve netdevice infrastructure in openvswitch, from Pravin B Shelar. 16) Add infrastructure to avoid re-forwarding a packet in software that was already forwarded by a hardware switch. From Scott Feldman. 17) Allow AF_PACKET fanout function to be implemented in a bpf program, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits) netfilter: nf_conntrack: make nf_ct_zone_dflt built-in netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled net: fec: clear receive interrupts before processing a packet ipv6: fix exthdrs offload registration in out_rt path xen-netback: add support for multicast control bgmac: Update fixed_phy_register() sock, diag: fix panic in sock_diag_put_filterinfo flow_dissector: Use 'const' where possible. flow_dissector: Fix function argument ordering dependency ixgbe: Resolve "initialized field overwritten" warnings ixgbe: Remove bimodal SR-IOV disabling ixgbe: Add support for reporting 2.5G link speed ixgbe: fix bounds checking in ixgbe_setup_tc for 82598 ixgbe: support for ethtool set_rxfh ixgbe: Avoid needless PHY access on copper phys ixgbe: cleanup to use cached mask value ixgbe: Remove second instance of lan_id variable ixgbe: use kzalloc for allocating one thing flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c ixgbe: Remove unused PCI bus types ...	2015-09-03 08:08:17 -07:00
Herbert Xu	de0ded77b9	ipsec: Replace seqniv with seqiv Now that seqniv is identical with seqiv we no longer need it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-08-17 16:53:42 +08:00
David Ahern	42a7b32b73	xfrm: Add oif to dst lookups Rules can be installed that direct route lookups to specific tables based on oif. Plumb the oif through the xfrm lookups so it gets set in the flow struct and passed to the resolver routines. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-08-11 12:41:35 +02:00
Andrzej Hajda	df367561ff	net/xfrm: use kmemdup rather than duplicating its implementation The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-08-11 12:41:35 +02:00
Jakub Wilk	0c199a903d	xfrm: Fix a typo Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:28:36 -07:00
Linus Torvalds	e0456717e4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...	2015-06-24 16:49:49 -07:00
Linus Torvalds	44d21c3f3a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.2: API: - Convert RNG interface to new style. - New AEAD interface with one SG list for AD and plain/cipher text. All external AEAD users have been converted. - New asymmetric key interface (akcipher). Algorithms: - Chacha20, Poly1305 and RFC7539 support. - New RSA implementation. - Jitter RNG. - DRBG is now seeded with both /dev/random and Jitter RNG. If kernel pool isn't ready then DRBG will be reseeded when it is. - DRBG is now the default crypto API RNG, replacing krng. - 842 compression (previously part of powerpc nx driver). Drivers: - Accelerated SHA-512 for arm64. - New Marvell CESA driver that supports DMA and more algorithms. - Updated powerpc nx 842 support. - Added support for SEC1 hardware to talitos" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits) crypto: marvell/cesa - remove COMPILE_TEST dependency crypto: algif_aead - Temporarily disable all AEAD algorithms crypto: af_alg - Forbid the use internal algorithms crypto: echainiv - Only hold RNG during initialisation crypto: seqiv - Add compatibility support without RNG crypto: eseqiv - Offer normal cipher functionality without RNG crypto: chainiv - Offer normal cipher functionality without RNG crypto: user - Add CRYPTO_MSG_DELRNG crypto: user - Move cryptouser.h to uapi crypto: rng - Do not free default RNG when it becomes unused crypto: skcipher - Allow givencrypt to be NULL crypto: sahara - propagate the error on clk_disable_unprepare() failure crypto: rsa - fix invalid select for AKCIPHER crypto: picoxcell - Update to the current clk API crypto: nx - Check for bogus firmware properties crypto: marvell/cesa - add DT bindings documentation crypto: marvell/cesa - add support for Kirkwood and Dove SoCs crypto: marvell/cesa - add support for Orion SoCs crypto: marvell/cesa - add allhwsupport module parameter crypto: marvell/cesa - add support for all armada SoCs ...	2015-06-22 21:04:48 -07:00
Martin Willi	b08b6b7791	xfrm: Define ChaCha20-Poly1305 AEAD XFRM algo for IPsec users Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-06-04 15:04:55 +08:00
David S. Miller	dda922c831	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-01 22:51:30 -07:00
David S. Miller	a74eab639e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-05-28 1) Remove xfrm_queue_purge as this is the same as skb_queue_purge. 2) Optimize policy and state walk. 3) Use a sane return code if afinfo registration fails. 4) Only check fori a acquire state if the state is not valid. 5) Remove a unnecessary NULL check before xfrm_pol_hold as it checks the input for NULL. 6) Return directly if the xfrm hold queue is empty, avoid to take a lock as it is nothing to do in this case. 7) Optimize the inexact policy search and allow for matching of policies with priority ~0U. All from Li RongQing. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-28 20:23:01 -07:00
Alexander Duyck	049f8e2e28	xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input This change makes it so that if a tunnel is defined we just use the mark from the tunnel instead of the mark from the skb header. By doing this we can avoid the need to set skb->mark inside of the tunnel receive functions. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-05-28 06:23:31 +02:00
Herbert Xu	69b0137f61	ipsec: Add IV generator information to xfrm_state This patch adds IV generator information to xfrm_state. This is currently obtained from our own list of algorithm descriptions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-05-28 11:23:20 +08:00
Herbert Xu	165ecc6373	xfrm: Add IV generator information to xfrm_algo_desc This patch adds IV generator information for each AEAD and block cipher to xfrm_algo_desc. This will be used to access the new AEAD interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-05-28 11:23:19 +08:00
Herbert Xu	407d34ef29	xfrm: Always zero high-order sequence number bits As we're now always including the high bits of the sequence number in the IV generation process we need to ensure that they don't contain crap. This patch ensures that the high sequence bits are always zeroed so that we don't leak random data into the IV. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-05-21 06:56:23 +02:00
Li RongQing	8faf491e64	xfrm: optimise to search the inexact policy list The policies are organized into list by priority ascent of policy, so it is unnecessary to continue to loop the policy if the priority of current looped police is larger than or equal priority which is from the policy_bydst list. This allows to match policy with ~0U priority in inexact list too. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-05-18 10:31:56 +02:00
Ying Xue	9449c3cd90	net: make skb_dst_pop routine static As xfrm_output_one() is the only caller of skb_dst_pop(), we should make skb_dst_pop() localized. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 23:19:49 -04:00
Li RongQing	de2ad486cb	xfrm: move the checking for old xfrm_policy hold_queue to beginning if hold_queue of old xfrm_policy is NULL, return directly, then not need to run other codes, especially take the spin lock Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-05-05 09:27:01 +02:00
Li RongQing	586f2eb416	xfrm: remove the unnecessary checking before call xfrm_pol_hold xfrm_pol_hold will check its input with NULL Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-05-05 09:27:00 +02:00
Li RongQing	bdddbf6996	xfrm: fix a race in xfrm_state_lookup_byspi The returned xfrm_state should be hold before unlock xfrm_state_lock, otherwise the returned xfrm_state maybe be released. Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..] Cc: Fan Du <fan.du@intel.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Fan Du <fan.du@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-04-29 13:53:46 +02:00
Li RongQing	dc0565ce6e	xfrm: slightly optimise xfrm_input Check x->km.state with XFRM_STATE_ACQ only when state is not XFRM_STAT_VALID, not everytime Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-04-24 12:17:16 +02:00
Li RongQing	f31e8d4f7b	xfrm: fix the return code when xfrm__register_afinfo failed If xfrm__register_afinfo failed since xfrm_*_afinfo[afinfo->family] had the value, return the -EEXIST, not -ENOBUFS Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-04-23 11:37:52 +02:00
Li RongQing	800777026e	xfrm: optimise the use of walk list header in xfrm_policy/state_walk The walk from input is the list header, and marked as dead, and will be skipped in loop. list_first_entry() can be used to return the true usable value from walk if walk is not empty Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-04-23 11:36:06 +02:00
Li RongQing	1ee5e6676b	xfrm: remove the xfrm_queue_purge definition The task of xfrm_queue_purge is same as skb_queue_purge, so remove it Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-04-23 11:35:47 +02:00
David S. Miller	87ffabb1f0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The dwmac-socfpga.c conflict was a case of a bug fix overlapping changes in net-next to handle an error pointer differently. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-14 15:44:14 -04:00
David S. Miller	9399bdcbb5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-04-09 1) Prohibit the use/abuse of the xfrm netlink interface on 32/64 bit compatibility tasks. We need a full compat layer before we can allow this. From Fan Du. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-09 14:41:47 -04:00
David Miller	7026b1ddb6	netfilter: Pass socket pointer down through okfn(). On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-07 15:25:55 -04:00
Alexey Dobriyan	68c11e98ef	xfrm: fix xfrm_input/xfrm_tunnel_check oops https://bugzilla.kernel.org/show_bug.cgi?id=95211 Commit `70be6c91c8` ("xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer") added check which dereferences ->outer_mode too early but larval SAs don't have this pointer set (yet). So check for tunnel stuff later. Mike Noordermeer reported this bug and patiently applied all the debugging. Technically this is remote-oops-in-interrupt-context type of thing. BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0 ... [<ffffffff81500fc6>] ? xfrm4_esp_rcv+0x36/0x70 [<ffffffff814acc9a>] ? ip_local_deliver_finish+0x9a/0x200 [<ffffffff81471b83>] ? __netif_receive_skb_core+0x6f3/0x8f0 ... RIP [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0 Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-04-07 07:52:27 +02:00
Jiri Benc	15e318bdc6	xfrm: simplify xfrm_address_t use In many places, the a6 field is typecasted to struct in6_addr. As the fields are in union anyway, just add in6_addr type to the union and get rid of the typecasting. Modifying the uapi header is okay, the union has still the same size. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-31 13:58:35 -04:00
David S. Miller	ca00942a81	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-03-16 1) Fix the network header offset in _decode_session6 when multiple IPv6 extension headers are present. From Hajime Tazaki. 2) Fix an interfamily tunnel crash. We set outer mode protocol too early and may dispatch to the wrong address family. Move the setting of the outer mode protocol behind the last accessing of the inner mode to fix the crash. 3) Most callers of xfrm_lookup() expect that dst_orig is released on error. But xfrm_lookup_route() may need dst_orig to handle certain error cases. So introduce a flag that tells what should be done in case of error. From Huaibin Wang. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-16 16:16:49 -04:00
Fan Du	74005991b7	xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host structure like xfrm_usersa_info or xfrm_userpolicy_info has different sizeof when compiled as 32bits and 64bits due to not appending pack attribute in their definition. This will result in broken SA and SP information when user trying to configure them through netlink interface. Inform user land about this situation instead of keeping silent, the upper test scripts would behave accordingly. Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-03-03 10:10:16 +01:00
huaibin Wang	ac37e2515c	xfrm: release dst_orig in case of error in xfrm_lookup() dst_orig should be released on error. Function like __xfrm_route_forward() expects that behavior. Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(), which expects the opposite. Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be done in case of error. Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-02-12 07:10:56 +01:00
Johannes Berg	053c095a82	netlink: make nlmsg_end() and genlmsg_end() void Contrary to common expectations for an "int" return, these functions return only a positive value -- if used correctly they cannot even return 0 because the message header will necessarily be in the skb. This makes the very common pattern of if (genlmsg_end(...) < 0) { ... } be a whole bunch of dead code. Many places also simply do return nlmsg_end(...); and the caller is expected to deal with it. This also commonly (at least for me) causes errors, because it is very common to write if (my_function(...)) /* error condition */ and if my_function() does "return nlmsg_end()" this is of course wrong. Additionally, there's not a single place in the kernel that actually needs the message length returned, and if anyone needs it later then it'll be very easy to just use skb->len there. Remove this, and make the functions void. This removes a bunch of dead code as described above. The patch adds lines because I did - return nlmsg_end(...); + nlmsg_end(...); + return 0; I could have preserved all the function's return values by returning skb->len, but instead I've audited all the places calling the affected functions and found that none cared. A few places actually compared the return value with <= 0 in dump functionality, but that could just be changed to < 0 with no change in behaviour, so I opted for the more efficient version. One instance of the error I've made numerous times now is also present in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't check for <0 or <=0 and thus broke out of the loop every single time. I've preserved this since it will (I think) have caused the messages to userspace to be formatted differently with just a single message for every SKB returned to userspace. It's possible that this isn't needed for the tools that actually use this, but I don't even know what they are so couldn't test that changing this behaviour would be acceptable. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-18 01:03:45 -05:00
Rickard Strandqvist	83400b990c	net: xfrm: xfrm_algo: Remove unused function Remove the function aead_entries() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:50:46 -05:00
David S. Miller	6db70e3e1d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-12-03 1) Fix a set but not used warning. From Fabian Frederick. 2) Currently we make sequence number values available to userspace only if we use ESN. Make the sequence number values also available for non ESN states. From Zhi Ding. 3) Remove socket policy hashing. We don't need it because socket policies are always looked up via a linked list. From Herbert Xu. 4) After removing socket policy hashing, we can use __xfrm_policy_link in xfrm_policy_insert. From Herbert Xu. 5) Add a lookup method for vti6 tunnels with wildcard endpoints. I forgot this when I initially implemented vti6. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 21:30:21 -05:00
Herbert Xu	12bfa8bdba	xfrm: Use __xfrm_policy_link in xfrm_policy_insert For a long time we couldn't actually use __xfrm_policy_link in xfrm_policy_insert because the latter wanted to do hashing at a specific position. Now that __xfrm_policy_link no longer does hashing it can now be safely used in xfrm_policy_insert to kill some duplicate code, finally reuniting general policies with socket policies. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-11-13 11:25:04 +01:00
Herbert Xu	53c2e285f9	xfrm: Do not hash socket policies Back in 2003 when I added policy expiration, I half-heartedly did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink to __xfrm_policy_link/__xfrm_policy_unlink, because the latter could be reused for all policies. I never actually got around to using __xfrm_policy_link for non-socket policies. Later on hashing was added to all xfrm policies, including socket policies. In fact, we don't need hashing on socket policies at all since they're always looked up via a linked list. This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so that it's obvious we're dealing with socket policies. This patch also removes hashing from __xfrm_policy_link as for now it's only used by socket policies which do not need to be hashed. Ironically this will in fact allow us to use this helper for non-socket policies which I shall do later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-11-13 11:25:03 +01:00
dingzhi	f293a5e33e	xfrm: add XFRMA_REPLAY_VAL attribute to SA messages After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay value is defined. Thus sequence number values are always notified to userspace. Signed-off-by: dingzhi <zhi.ding@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-11-03 08:54:52 +01:00
Eric Dumazet	39bb5e6286	net: skb_fclone_busy() needs to detect orphaned skb Some drivers are unable to perform TX completions in a bound time. They instead call skb_orphan() Problem is skb_fclone_busy() has to detect this case, otherwise we block TCP retransmits and can freeze unlucky tcp sessions on mostly idle hosts. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `1f3279ae0c` ("tcp: avoid retransmits of TCP packets hanging in host queues") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 19:58:30 -04:00
Fabian Frederick	5c1e9f2c1f	xfrm: fix set but not used warning in xfrm_policy_queue_process() err was set but unused. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-10-27 07:16:46 +01:00
Florian Westphal	330966e501	net: make skb_gso_segment error handling more robust skb_gso_segment has three possible return values: 1. a pointer to the first segmented skb 2. an errno value (IS_ERR()) 3. NULL. This can happen when GSO is used for header verification. However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL and would oops when NULL is returned. Note that these call sites should never actually see such a NULL return value; all callers mask out the GSO bits in the feature argument. However, there have been issues with some protocol handlers erronously not respecting the specified feature mask in some cases. It is preferable to get 'have to turn off hw offloading, else slow' reports rather than 'kernel crashes'. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:13 -04:00
Eric Dumazet	d0bf4a9e92	net: cleanup and document skb fclone layout Lets use a proper structure to clearly document and implement skb fast clones. Then, we might experiment more easily alternative layouts. This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm, to stop leaking of implementation details. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 16:34:25 -04:00
David S. Miller	f5c7e1a47a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-09-25 1) Remove useless hash_resize_mutex in xfrm_hash_resize(). This mutex is used only there, but xfrm_hash_resize() can't be called concurrently at all. From Ying Xue. 2) Extend policy hashing to prefixed policies based on prefix lenght thresholds. From Christophe Gouault. 3) Make the policy hash table thresholds configurable via netlink. From Christophe Gouault. 4) Remove the maximum authentication length for AH. This was needed to limit stack usage. We switched already to allocate space, so no need to keep the limit. From Herbert Xu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-28 17:19:15 -04:00
David S. Miller	1f6d80358d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-23 12:09:27 -04:00
Herbert Xu	689f1c9de2	ipsec: Remove obsolete MAX_AH_AUTH_LEN While tracking down the MAX_AH_AUTH_LEN crash in an old kernel I thought that this limit was rather arbitrary and we should just get rid of it. In fact it seems that we've already done all the work needed to remove it apart from actually removing it. This limit was there in order to limit stack usage. Since we've already switched over to allocating scratch space using kmalloc, there is no longer any need to limit the authentication length. This patch kills all references to it, including the BUG_ONs that led me here. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-18 10:54:36 +02:00
Steffen Klassert	b8c203b2d2	xfrm: Generate queueing routes only from route lookup functions Currently we genarate a queueing route if we have matching policies but can not resolve the states and the sysctl xfrm_larval_drop is disabled. Here we assume that dst_output() is called to kill the queued packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating queueing routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: `a0073fe18e` ("xfrm: Add a state resolution packet queue") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-16 10:08:49 +02:00
Steffen Klassert	f92ee61982	xfrm: Generate blackhole routes only from route lookup functions Currently we genarate a blackhole route route whenever we have matching policies but can not resolve the states. Here we assume that dst_output() is called to kill the balckholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: `2774c131b1` ("xfrm: Handle blackhole route creation via afinfo.") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-16 10:08:40 +02:00
Florian Westphal	46cfd725c3	net: use kfree_skb_list() helper in more places Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 20:10:45 -07:00
Christophe Gouault	880a6fab8f	xfrm: configure policy hash table thresholds by netlink Enable to specify local and remote prefix length thresholds for the policy hash table via a netlink XFRM_MSG_NEWSPDINFO message. prefix length thresholds are specified by XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH optional attributes (struct xfrmu_spdhthresh). example: struct xfrmu_spdhthresh thresh4 = { .lbits = 0; .rbits = 24; }; struct xfrmu_spdhthresh thresh6 = { .lbits = 0; .rbits = 56; }; struct nlmsghdr hdr; struct nl_msg msg; msg = nlmsg_alloc(); hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRMA_SPD_IPV4_HTHRESH, sizeof(__u32), NLM_F_REQUEST); nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4); nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6); nla_send_auto(sk, msg); The numbers are the policy selector minimum prefix lengths to put a policy in the hash table. - lbits is the local threshold (source address for out policies, destination address for in and fwd policies). - rbits is the remote threshold (destination address for out policies, source address for in and fwd policies). The default values are: XFRMA_SPD_IPV4_HTHRESH: 32 32 XFRMA_SPD_IPV6_HTHRESH: 128 128 Dynamic re-building of the SPD is performed when the thresholds values are changed. The current thresholds can be read via a XFRM_MSG_GETSPDINFO request: the kernel replies to XFRM_MSG_GETSPDINFO requests by an XFRM_MSG_NEWSPDINFO message, with both attributes XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH. Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-02 13:37:56 +02:00
Christophe Gouault	b58555f176	xfrm: hash prefixed policies based on preflen thresholds The idea is an extension of the current policy hashing. Today only non-prefixed policies are stored in a hash table. This patch relaxes the constraints, and hashes policies whose prefix lengths are greater or equal to a configurable threshold. Each hash table (one per direction) maintains its own set of IPv4 and IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32, 128, 128). Example, if the output hash table is configured with values (16, 24, 56, 64): ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ... => hashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ... => hashed ip xfrm policy add dir out \ src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ... => unhashed The high order bits of the addresses (up to the threshold) are used to compute the hash key. Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-09-02 13:29:44 +02:00
Ying Xue	0244790c8a	xfrm: remove useless hash_resize_mutex locks In xfrm_state.c, hash_resize_mutex is defined as a local variable and only used in xfrm_hash_resize() which is declared as a work handler of xfrm.state_hash_work. But when the xfrm.state_hash_work work is put in the global workqueue(system_wq) with schedule_work(), the work will be really inserted in the global workqueue if it was not already queued, otherwise, it is still left in the same position on the the global workqueue. This means the xfrm_hash_resize() work handler is only executed once at any time no matter how many times its work is scheduled, that is, xfrm_hash_resize() is not called concurrently at all, so hash_resize_mutex is redundant for us. Cc: Christophe Gouault <christophe.gouault@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-08-29 11:40:03 +02:00
Ken Helias	1d023284c3	list: fix order of arguments for hlist_add_after(_rcu) All other add functions for lists have the new item as first argument and the position where it is added as second argument. This was changed for no good reason in this function and makes using it unnecessary confusing. The name was changed to hlist_add_behind() to cause unconverted code to generate a compile error instead of using the wrong parameter order. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Ken Helias <kenhelias@firemail.de> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits] Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:24 -07:00
Tobias Brunner	a0e5ef53aa	xfrm: Fix installation of AH IPsec SAs The SPI check introduced in `ea9884b3ac` was intended for IPComp SAs but actually prevented AH SAs from getting installed (depending on the SPI). Fixes: `ea9884b3ac` ("xfrm: check user specified spi for IPComp") Cc: Fan Du <fan.du@windriver.com> Signed-off-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-06-30 07:42:12 +02:00
Steffen Klassert	b7eea4545e	xfrm: Fix refcount imbalance in xfrm_lookup xfrm_lookup must return a dst_entry with a refcount for the caller. Git commit `1a1ccc96ab` ("xfrm: Remove caching of xfrm_policy_sk_bundles") removed this refcount for the socket policy case accidentally. This patch restores it and sets DST_NOCACHE flag to make sure that the dst_entry is freed when the refcount becomes null. Fixes: `1a1ccc96ab` ("xfrm: Remove caching of xfrm_policy_sk_bundles") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-06-26 07:52:42 +02:00
David S. Miller	c99f7abf0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 23:32:12 -07:00
Michal Kubecek	21ee543edc	xfrm: fix race between netns cleanup and state expire notification The xfrm_user module registers its pernet init/exit after xfrm itself so that its net exit function xfrm_user_net_exit() is executed before xfrm_net_exit() which calls xfrm_state_fini() to cleanup the SA's (xfrm states). This opens a window between zeroing net->xfrm.nlsk pointer and deleting all xfrm_state instances which may access it (via the timer). If an xfrm state expires in this window, xfrm_exp_state_notify() will pass null pointer as socket to nlmsg_multicast(). As the notifications are called inside rcu_read_lock() block, it is sufficient to retrieve the nlsk socket with rcu_dereference() and check the it for null. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 16:07:44 -07:00
David S. Miller	65db611a5c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-05-22 This is the last ipsec pull request before I leave for a three weeks vacation tomorrow. David, can you please take urgent ipsec patches directly into net/net-next during this time? I'll continue to run the ipsec/ipsec-next trees as soon as I'm back. 1) Simplify the xfrm audit handling, from Tetsuo Handa. 2) Codingstyle cleanup for xfrm_output, from abian Frederick. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 16:00:00 -04:00
Fabian Frederick	fc68086ce8	net/xfrm/xfrm_output.c: move EXPORT_SYMBOL Fix checkpatch warning: "WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable" Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-05-13 12:44:28 +02:00
David S. Miller	5f013c9bc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 13:19:14 -04:00
WANG Cong	698365fa18	net: clean up snmp stats code commit `8f0ea0fe3a` (snmp: reduce percpu needs by 50%) reduced snmp array size to 1, so technically it doesn't have to be an array any more. What's more, after the following commit: commit `933393f58f` Date: Thu Dec 22 11:58:51 2011 -0600 percpu: Remove irqsafe_cpu_xxx variants We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after almost 3 years, no one complains. So, just convert the array to a single pointer and remove snmp_mib_init() and snmp_mib_free() as well. Cc: Christoph Lameter <cl@linux.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 16:06:05 -04:00
Eric W. Biederman	90f62cf30a	net: Use netlink_ns_capable to verify the permisions of netlink messages It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Tetsuo Handa	2e71029e2c	xfrm: Remove useless xfrm_audit struct. Commit `f1370cc4` "xfrm: Remove useless secid field from xfrm_audit." changed "struct xfrm_audit" to have either { audit_get_loginuid(current) / audit_get_sessionid(current) } or { INVALID_UID / -1 } pair. This means that we can represent "struct xfrm_audit" as "bool". This patch replaces "struct xfrm_audit" argument with "bool". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-23 08:21:04 +02:00
Tetsuo Handa	f1370cc4a0	xfrm: Remove useless secid field from xfrm_audit. It seems to me that commit `ab5f5e8b` "[XFRM]: xfrm audit calls" is doing something strange at xfrm_audit_helper_usrinfo(). If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls audit_log_task_context() which basically does secid != 0 && security_secid_to_secctx(secid) == 0 case except that secid is obtained from current thread's context. Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was obtained from other thread's context? It might audit current thread's context rather than other thread's context if security_secid_to_secctx() in xfrm_audit_helper_usrinfo() failed for some reason. Then, are all the caller of xfrm_audit_helper_usrinfo() passing either secid obtained from current thread's context or secid == 0? It seems to me that they are. If I didn't miss something, we don't need to pass secid to xfrm_audit_helper_usrinfo() because audit_log_task_context() will obtain secid from current thread's context. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-22 10:47:53 +02:00
Eric Dumazet	aad88724c9	ipv4: add a sock pointer to dst->output() path. In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: `8f646c922d` ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 13:47:15 -04:00
David S. Miller	04f58c8854	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/devicetree/bindings/net/micrel-ks8851.txt net/core/netpoll.c The net/core/netpoll.c conflict is a bug fix in 'net' happening to code which is completely removed in 'net-next'. In micrel-ks8851.txt we simply have overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-25 20:29:20 -04:00
David S. Miller	995dca4ce9	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== One patch to rename a newly introduced struct. The rest is the rework of the IPsec virtual tunnel interface for ipv6 to support inter address family tunneling and namespace crossing. 1) Rename the newly introduced struct xfrm_filter to avoid a conflict with iproute2. From Nicolas Dichtel. 2) Introduce xfrm_input_afinfo to access the address family dependent tunnel callback functions properly. 3) Add and use a IPsec protocol multiplexer for ipv6. 4) Remove dst_entry caching. vti can lookup multiple different dst entries, dependent of the configured xfrm states. Therefore it does not make to cache a dst_entry. 5) Remove caching of flow informations. vti6 does not use the the tunnel endpoint addresses to do route and xfrm lookups. 6) Update the vti6 to use its own receive hook. 7) Remove the now unused xfrm_tunnel_notifier. This was used from vti and is replaced by the IPsec protocol multiplexer hooks. 8) Support inter address family tunneling for vti6. 9) Check if the tunnel endpoints of the xfrm state and the vti interface are matching and return an error otherwise. 10) Enable namespace crossing for vti devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-18 14:09:07 -04:00
Steffen Klassert	2f32b51b60	xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly IPv6 can be build as a module, so we need mechanism to access the address family dependent callback functions properly. Therefore we introduce xfrm_input_afinfo, similar to that what we have for the address family dependent part of policies and states. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-03-14 07:28:07 +01:00
Steffen Klassert	4a93f5095a	flowcache: Fix resource leaks on namespace exit. We leak an active timer, the hotcpu notifier and all allocated resources when we exit a namespace. Fix this by introducing a flow_cache_fini() function where we release the resources before we exit. Fixes: `ca925cf153` ("flowcache: Make flow cache name space aware") Reported-by: Jakub Kicinski <moorray3@wp.pl> Tested-by: Jakub Kicinski <moorray3@wp.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:31:18 -04:00
Nikolay Aleksandrov	52a4c6404f	selinux: add gfp argument to security_xfrm_policy_alloc and fix callers security_xfrm_policy_alloc can be called in atomic context so the allocation should be done with GFP_ATOMIC. Add an argument to let the callers choose the appropriate way. In order to do so a gfp argument needs to be added to the method xfrm_policy_alloc_security in struct security_operations and to the internal function selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic callers and leave GFP_KERNEL as before for the rest. The path that needed the gfp argument addition is: security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security -> all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) -> selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only) Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also add it to security_context_to_sid which is used inside and prior to this patch did only GFP_KERNEL allocation. So add gfp argument to security_context_to_sid and adjust all of its callers as well. CC: Paul Moore <paul@paul-moore.com> CC: Dave Jones <davej@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Fan Du <fan.du@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: LSM list <linux-security-module@vger.kernel.org> CC: SELinux list <selinux@tycho.nsa.gov> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-03-10 08:30:02 +01:00
Nicolas Dichtel	870a2df4ca	xfrm: rename struct xfrm_filter iproute2 already defines a structure with that name, let's use another one to avoid any conflict. CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-03-07 08:12:37 +01:00
David S. Miller	67ddc87f16	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-05 20:32:02 -05:00
Steffen Klassert	3a9016f97f	xfrm: Fix unlink race when policies are deleted. When a policy is unlinked from the lists in thread context, the xfrm timer can fire before we can mark this policy as dead. So reinitialize the bydst hlist, then hlist_unhashed() will notice that this policy is not linked and will avoid a doulble unlink of that policy. Reported-by: Xianpeng Zhao <673321875@qq.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-26 09:52:02 +01:00
Steffen Klassert	70be6c91c8	xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer IPsec vti_rcv needs to remind the tunnel pointer to check it later at the vti_rcv_cb callback. So add this pointer to the IPsec common buffer, initialize it and check it to avoid transport state matching of a tunneled packet. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-25 07:04:17 +01:00
Steffen Klassert	3328715e6c	xfrm4: Add IPsec protocol multiplexer This patch add an IPsec protocol multiplexer. With this it is possible to add alternative protocol handlers as needed for IPsec virtual tunnel interfaces. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-25 07:04:16 +01:00
Steffen Klassert	cc9ab60e57	xfrm: Cleanup error handling of xfrm_state_clone The error pointer passed to xfrm_state_clone() is unchecked, so remove it and indicate an error by returning a null pointer. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-21 07:53:28 +01:00
Steffen Klassert	ee5c23176f	xfrm: Clone states properly on migration We loose a lot of information of the original state if we clone it with xfrm_state_clone(). In particular, there is no crypto algorithm attached if the original state uses an aead algorithm. This patch add the missing information to the clone state. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:30:10 +01:00
Steffen Klassert	8c0cba22e1	xfrm: Take xfrm_state_lock in xfrm_migrate_state_find A comment on xfrm_migrate_state_find() says that xfrm_state_lock is held. This is apparently not the case, but we need it to traverse through the state lists. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:30:04 +01:00
Steffen Klassert	35ea790d78	xfrm: Fix NULL pointer dereference on sub policy usage xfrm_state_sort() takes the unsorted states from the src array and stores them into the dst array. We try to get the namespace from the dst array which is empty at this time, so take the namespace from the src array instead. Fixes: `283bc9f35b` ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:29:58 +01:00
Steffen Klassert	1a1ccc96ab	xfrm: Remove caching of xfrm_policy_sk_bundles We currently cache socket policy bundles at xfrm_policy_sk_bundles. These cached bundles are never used. Instead we create and cache a new one whenever xfrm_lookup() is called on a socket policy. Most protocols cache the used routes to the socket, so let's remove the unused caching of socket policy bundles in xfrm. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-19 10:35:43 +01:00
Nicolas Dichtel	d3623099d3	ipsec: add support of limited SA dump The goal of this patch is to allow userland to dump only a part of SA by specifying a filter during the dump. The kernel is in charge to filter SA, this avoids to generate useless netlink traffic (it save also some cpu cycles). This is particularly useful when there is a big number of SA set on the system. Note that I removed the union in struct xfrm_state_walk to fix a problem on arm. struct netlink_callback->args is defined as a array of 6 long and the first long is used in xfrm code to flag the cb as initialized. Hence, we must have: sizeof(struct xfrm_state_walk) <= sizeof(long) * 5. With the union, it was false on arm (sizeof(struct xfrm_state_walk) was sizeof(long) * 7), due to the padding. In fact, whatever the arch is, this union seems useless, there will be always padding after it. Removing it will not increase the size of this struct (and reduce it on arm). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-17 07:18:19 +01:00
Horia Geanta	0f24558e91	xfrm: avoid creating temporary SA when there are no listeners In the case when KMs have no listeners, km_query() will fail and temporary SAs are garbage collected immediately after their allocation. This causes strain on memory allocation, leading even to OOM since temporary SA alloc/free cycle is performed for every packet and garbage collection does not keep up the pace. The sane thing to do is to make sure we have audience before temporary SA allocation. Signed-off-by: Horia Geanta <horia.geanta@freescale.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-13 07:40:30 +01:00
Fan Du	ca925cf153	flowcache: Make flow cache name space aware Inserting a entry into flowcache, or flushing flowcache should be based on per net scope. The reason to do so is flushing operation from fat netns crammed with flow entries will also making the slim netns with only a few flow cache entries go away in original implementation. Since flowcache is tightly coupled with IPsec, so it would be easier to put flow cache global parameters into xfrm namespace part. And one last thing needs to do is bumping flow cache genid, and flush flow cache should also be made in per net style. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-12 07:02:11 +01:00
Fan Du	01714109ea	xfrm: Don't prohibit AH from using ESN feature Clear checking when user try to use ESN through netlink keymgr for AH. As only ESP and AH support ESN feature according to RFC. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-12 07:02:11 +01:00
Linus Torvalds	4ba9920e5e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) BPF debugger and asm tool by Daniel Borkmann. 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann. 3) Correct reciprocal_divide and update users, from Hannes Frederic Sowa and Daniel Borkmann. 4) Currently we only have a "set" operation for the hw timestamp socket ioctl, add a "get" operation to match. From Ben Hutchings. 5) Add better trace events for debugging driver datapath problems, also from Ben Hutchings. 6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we have a small send and a previous packet is already in the qdisc or device queue, defer until TX completion or we get more data. 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko. 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel Borkmann. 9) Share IP header compression code between Bluetooth and IEEE802154 layers, from Jukka Rissanen. 10) Fix ipv6 router reachability probing, from Jiri Benc. 11) Allow packets to be captured on macvtap devices, from Vlad Yasevich. 12) Support tunneling in GRO layer, from Jerry Chu. 13) Allow bonding to be configured fully using netlink, from Scott Feldman. 14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can already get the TCI. From Atzm Watanabe. 15) New "Heavy Hitter" qdisc, from Terry Lam. 16) Significantly improve the IPSEC support in pktgen, from Fan Du. 17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom Herbert. 18) Add Proportional Integral Enhanced packet scheduler, from Vijay Subramanian. 19) Allow openvswitch to mmap'd netlink, from Thomas Graf. 20) Key TCP metrics blobs also by source address, not just destination address. From Christoph Paasch. 21) Support 10G in generic phylib. From Andy Fleming. 22) Try to short-circuit GRO flow compares using device provided RX hash, if provided. From Tom Herbert. The wireless and netfilter folks have been busy little bees too. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits) net/cxgb4: Fix referencing freed adapter ipv6: reallocate addrconf router for ipv6 address when lo device up fib_frontend: fix possible NULL pointer dereference rtnetlink: remove IFLA_BOND_SLAVE definition rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info qlcnic: update version to 5.3.55 qlcnic: Enhance logic to calculate msix vectors. qlcnic: Refactor interrupt coalescing code for all adapters. qlcnic: Update poll controller code path qlcnic: Interrupt code cleanup qlcnic: Enhance Tx timeout debugging. qlcnic: Use bool for rx_mac_learn. bonding: fix u64 division rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC sfc: Use the correct maximum TX DMA ring size for SFC9100 Add Shradha Shah as the sfc driver maintainer. net/vxlan: Share RX skb de-marking and checksum checks with ovs tulip: cleanup by using ARRAY_SIZE() ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called net/cxgb4: Don't retrieve stats during recovery ...	2014-01-25 11:17:34 -08:00
Linus Torvalds	6dd9158ae8	Merge git://git.infradead.org/users/eparis/audit Pull audit update from Eric Paris: "Again we stayed pretty well contained inside the audit system. Venturing out was fixing a couple of function prototypes which were inconsistent (didn't hurt anything, but we used the same value as an int, uint, u32, and I think even a long in a couple of places). We also made a couple of minor changes to when a couple of LSMs called the audit system. We hoped to add aarch64 audit support this go round, but it wasn't ready. I'm disappearing on vacation on Thursday. I should have internet access, but it'll be spotty. If anything goes wrong please be sure to cc rgb@redhat.com. He'll make fixing things his top priority" * git://git.infradead.org/users/eparis/audit: (50 commits) audit: whitespace fix in kernel-parameters.txt audit: fix location of __net_initdata for audit_net_ops audit: remove pr_info for every network namespace audit: Modify a set of system calls in audit class definitions audit: Convert int limit uses to u32 audit: Use more current logging style audit: Use hex_byte_pack_upper audit: correct a type mismatch in audit_syscall_exit() audit: reorder AUDIT_TTY_SET arguments audit: rework AUDIT_TTY_SET to only grab spin_lock once audit: remove needless switch in AUDIT_SET audit: use define's for audit version audit: documentation of audit= kernel parameter audit: wait_for_auditd rework for readability audit: update MAINTAINERS audit: log task info on feature change audit: fix incorrect set of audit_sock audit: print error message when fail to create audit socket audit: fix dangling keywords in audit_log_set_loginuid() output audit: log on errors from filter user rules ...	2014-01-23 18:08:10 -08:00
Aruna-Hewapathirane	63862b5bef	net: replace macros net_random and net_srandom with direct calls to prandom This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 15:15:25 -08:00
David S. Miller	aef2b45fe4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: net/xfrm/xfrm_policy.c Steffen Klassert says: ==================== This pull request has a merge conflict between commits `be7928d20b` ("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and `da7c224b1b` ("net: xfrm: xfrm_policy: silence compiler warning") from the net-next tree and commit `2f3ea9a95c` ("xfrm: checkpatch erros with inline keyword position") from the ipsec-next tree. The version from net-next can be used, like it is done in linux-next. 1) Checkpatch cleanups, from Weilong Chen. 2) Fix lockdep complaints when pktgen is used with IPsec, from Fan Du. 3) Update pktgen to allow any combination of IPsec transport/tunnel mode and AH/ESP/IPcomp type, from Fan Du. 4) Make pktgen_dst_metrics static, Fengguang Wu. 5) Compile fix for pktgen when CONFIG_XFRM is not set, from Fan Du. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 23:14:25 -08:00
Eric Paris	4440e85481	audit: convert all sessionid declaration to unsigned int Right now the sessionid value in the kernel is a combination of u32, int, and unsigned int. Just use unsigned int throughout. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:46 -05:00
Ying Xue	da7c224b1b	net: xfrm: xfrm_policy: silence compiler warning Fix below compiler warning: net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function] Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 22:45:26 -05:00
Daniel Borkmann	be7928d20b	net: xfrm: xfrm_policy: fix inline not at beginning of declaration Fix three warnings related to: net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] Just removing the inline keyword is sufficient as the compiler will decide on its own about inlining or not. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:34:00 -05:00
Fan Du	c454997e68	{pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen Introduce xfrm_state_lookup_byspi to find user specified by custom from "pgset spi xxx". Using this scheme, any flow regardless its saddr/daddr could be transform by SA specified with configurable spi. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:12 +01:00
Fan Du	4ae770bf58	{pktgen, xfrm} Correct xfrm_state_lock usage in xfrm_stateonly_find Acquiring xfrm_state_lock in process context is expected to turn BH off, as this lock is also used in BH context, namely xfrm state timer handler. Otherwise it surprises LOCKDEP with below messages. [ 81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74 [ 81.725194] [ 81.725211] ========================================================= [ 81.725212] [ INFO: possible irq lock inversion dependency detected ] [ 81.725215] 3.13.0-rc2+ #92 Not tainted [ 81.725216] --------------------------------------------------------- [ 81.725218] kpktgend_0/2780 just changed the state of lock: [ 81.725220] (xfrm_state_lock){+.+...}, at: [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past: [ 81.725232] (&(&x->lock)->rlock){+.-...} [ 81.725232] [ 81.725232] and interrupts could create inverse lock ordering between them. [ 81.725232] [ 81.725235] [ 81.725235] other info that might help us debug this: [ 81.725237] Possible interrupt unsafe locking scenario: [ 81.725237] [ 81.725238] CPU0 CPU1 [ 81.725240] ---- ---- [ 81.725241] lock(xfrm_state_lock); [ 81.725243] local_irq_disable(); [ 81.725244] lock(&(&x->lock)->rlock); [ 81.725246] lock(xfrm_state_lock); [ 81.725248] <Interrupt> [ 81.725249] lock(&(&x->lock)->rlock); [ 81.725251] [ 81.725251] * DEADLOCK * [ 81.725251] [ 81.725254] no locks held by kpktgend_0/2780. [ 81.725255] [ 81.725255] the shortest dependencies between 2nd lock and 1st lock: [ 81.725269] -> (&(&x->lock)->rlock){+.-...} ops: 8 { [ 81.725274] HARDIRQ-ON-W at: [ 81.725276] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70 [ 81.725282] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725284] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725289] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 81.725292] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 81.725300] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 81.725303] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 81.725305] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 81.725308] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 81.725313] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 81.725316] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 81.725329] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 81.725333] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0 [ 81.725338] IN-SOFTIRQ-W at: [ 81.725340] [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70 [ 81.725342] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725344] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725347] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 81.725349] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 81.725352] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 81.725355] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 81.725358] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 81.725360] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 81.725363] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 81.725365] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 81.725368] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 81.725370] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0 [ 81.725373] INITIAL USE at: [ 81.725375] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70 [ 81.725385] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725388] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725390] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 81.725394] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 81.725398] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 81.725401] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 81.725404] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 81.725407] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 81.725409] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 81.725412] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 81.725415] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 81.725417] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0 [ 81.725420] } [ 81.725421] ... key at: [<ffffffff8295b9c8>] __key.46349+0x0/0x8 [ 81.725445] ... acquired at: [ 81.725446] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725449] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725452] [<ffffffff816dc057>] __xfrm_state_delete+0x37/0x140 [ 81.725454] [<ffffffff816dc18c>] xfrm_state_delete+0x2c/0x50 [ 81.725456] [<ffffffff816dc277>] xfrm_state_flush+0xc7/0x1b0 [ 81.725458] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key] [ 81.725465] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key] [ 81.725468] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key] [ 81.725471] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0 [ 81.725476] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130 [ 81.725479] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10 [ 81.725482] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b [ 81.725484] [ 81.725486] -> (xfrm_state_lock){+.+...} ops: 11 { [ 81.725490] HARDIRQ-ON-W at: [ 81.725493] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70 [ 81.725504] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725507] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70 [ 81.725510] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0 [ 81.725513] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key] [ 81.725516] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key] [ 81.725519] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key] [ 81.725522] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0 [ 81.725525] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130 [ 81.725527] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10 [ 81.725530] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b [ 81.725533] SOFTIRQ-ON-W at: [ 81.725534] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 81.725537] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725539] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725541] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725544] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen] [ 81.725547] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen] [ 81.725550] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 81.725555] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 81.725565] INITIAL USE at: [ 81.725567] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70 [ 81.725569] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725572] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70 [ 81.725574] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0 [ 81.725576] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key] [ 81.725580] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key] [ 81.725583] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key] [ 81.725586] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0 [ 81.725589] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130 [ 81.725594] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10 [ 81.725597] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b [ 81.725599] } [ 81.725600] ... key at: [<ffffffff81cadef8>] xfrm_state_lock+0x18/0x50 [ 81.725606] ... acquired at: [ 81.725607] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150 [ 81.725609] [<ffffffff81099e96>] mark_lock+0x196/0x2f0 [ 81.725611] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 81.725614] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725616] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725627] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725629] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen] [ 81.725632] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen] [ 81.725635] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 81.725637] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 81.725640] [ 81.725641] [ 81.725641] stack backtrace: [ 81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92 [ 81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006 [ 81.725649] ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007 [ 81.725652] ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80 [ 81.725655] ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8 [ 81.725659] Call Trace: [ 81.725664] [<ffffffff8176af37>] dump_stack+0x46/0x58 [ 81.725667] [<ffffffff81099358>] print_irq_inversion_bug.part.42+0x1e8/0x1f0 [ 81.725670] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150 [ 81.725672] [<ffffffff81099e96>] mark_lock+0x196/0x2f0 [ 81.725675] [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150 [ 81.725685] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 81.725691] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90 [ 81.725694] [<ffffffff81089b38>] ? sched_clock_cpu+0xa8/0x120 [ 81.725697] [<ffffffff8109a31a>] ? __lock_acquire+0x32a/0x1d70 [ 81.725699] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0 [ 81.725702] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725704] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0 [ 81.725707] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90 [ 81.725710] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725712] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0 [ 81.725715] [<ffffffff810971ec>] ? lock_release_holdtime.part.26+0x1c/0x1a0 [ 81.725717] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725721] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen] [ 81.725724] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen] [ 81.725727] [<ffffffffa008ba71>] ? pktgen_thread_worker+0xb11/0x1880 [pktgen] [ 81.725729] [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10 [ 81.725733] [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40 [ 81.725745] [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0 [ 81.725751] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60 [ 81.725753] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60 [ 81.725757] [<ffffffffa008af60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen] [ 81.725759] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 81.725762] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170 [ 81.725765] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 81.725768] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170 Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:11 +01:00
Weilong Chen	2f3ea9a95c	xfrm: checkpatch erros with inline keyword position Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:51 +01:00
Weilong Chen	42054569f9	xfrm: fix checkpatch error Fix that "else should follow close brace '}'". Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:50 +01:00
Weilong Chen	02d0892f98	xfrm: checkpatch erros with space prohibited Fix checkpatch error "space prohibited xxx". Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:50 +01:00
Weilong Chen	3e94c2dcfd	xfrm: checkpatch errors with foo * bar This patch clean up some checkpatch errors like this: ERROR: "foo * bar" should be "foo bar" ERROR: "(foo)" should be "(foo *)" Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:49 +01:00
Weilong Chen	9b7a787d0d	xfrm: checkpatch errors with space This patch cleanup some space errors. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:48 +01:00
Fan Du	776e9dd90c	xfrm: export verify_userspi_info for pkfey and netlink interface In order to check against valid IPcomp spi range, export verify_userspi_info for both pfkey and netlink interface. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-12-16 12:54:02 +01:00
Fan Du	ea9884b3ac	xfrm: check user specified spi for IPComp IPComp connection between two hosts is broken if given spi bigger than 0xffff. OUTSPI=0x87 INSPI=0x11112 ip xfrm policy update dst 192.168.1.101 src 192.168.1.109 dir out action allow \ tmpl dst 192.168.1.101 src 192.168.1.109 proto comp spi $OUTSPI ip xfrm policy update src 192.168.1.101 dst 192.168.1.109 dir in action allow \ tmpl src 192.168.1.101 dst 192.168.1.109 proto comp spi $INSPI ip xfrm state add src 192.168.1.101 dst 192.168.1.109 proto comp spi $INSPI \ comp deflate ip xfrm state add dst 192.168.1.101 src 192.168.1.109 proto comp spi $OUTSPI \ comp deflate tcpdump can capture outbound ping packet, but inbound packet is dropped with XfrmOutNoStates errors. It looks like spi value used for IPComp is expected to be 16bits wide only. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-12-16 12:54:00 +01:00
Steffen Klassert	5b8ef3415a	xfrm: Remove ancient sleeping when the SA is in acquire state We now queue packets to the policy if the states are not yet resolved, this replaces the ancient sleeping code. Also the sleeping can cause indefinite task hangs if the needed state does not get resolved. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-12-06 07:24:31 +01:00
Fan Du	283bc9f35b	xfrm: Namespacify xfrm state/policy locks By semantics, xfrm layer is fully name space aware, so will the locks, e.g. xfrm_state/pocliy_lock. Ensure exclusive access into state/policy link list for different name space with one global lock is not right in terms of semantics aspect at first place, as they are indeed mutually independent with each other, but also more seriously causes scalability problem. One practical scenario is on a Open Network Stack, more than hundreds of lxc tenants acts as routers within one host, a global xfrm_state/policy_lock becomes the bottleneck. But onces those locks are decoupled in a per-namespace fashion, locks contend is just with in specific name space scope, without causing additional SPD/SAD access delay for other name space. Also this patch improve scalability while as without changing original xfrm behavior. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-12-06 06:45:06 +01:00
Fan Du	8d549c4f5d	xfrm: Using the right namespace to migrate key info because the home agent could surely be run on a different net namespace other than init_net. The original behavior could lead into inconsistent of key info. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-12-06 06:45:05 +01:00
Fan Du	e682adf021	xfrm: Try to honor policy index if it's supplied by user xfrm code always searches for unused policy index for newly created policy regardless whether or not user space policy index hint supplied. This patch enables such feature so that using "ip xfrm ... index=xxx" can be used by user to set specific policy index. Currently this beahvior is broken, so this patch make it happen as expected. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-12-06 06:45:05 +01:00
Mathias Krause	0c7ddf36c2	net: move pskb_put() to core code This function has usage beside IPsec so move it to the core skbuff code. While doing so, give it some documentation and change its return type to 'unsigned char *' to be in line with skb_put(). Signed-off-by: Mathias Krause <mathias.krause@secunet.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-07 19:28:58 -05:00
David S. Miller	394efd19d5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 13:48:30 -05:00
David S. Miller	296c10639a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: net/xfrm/xfrm_policy.c Minor merge conflict in xfrm_policy.c, consisting of overlapping changes which were trivial to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-02 02:13:48 -04:00
David S. Miller	c3fa32b976	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:49:34 -04:00
Steffen Klassert	4d53eff48b	xfrm: Don't queue retransmitted packets if the original is still on the host It does not make sense to queue retransmitted packets if the original packet is still in some queue of this host. So add a check to xdst_queue_output() and drop the packet if the original packet is not yet sent. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com>	2013-10-21 09:45:20 +02:00
Eric Dumazet	5cf4eb54c2	xfrm: use vmalloc_node() for percpu scratches scratches are per cpu, we can use vmalloc_node() for proper NUMA affinity. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-10-21 09:38:24 +02:00
Joe Perches	c1b1203d65	net: misc: Remove extern from function prototypes There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-19 19:12:11 -04:00
Michal Kubecek	12e3594698	xfrm: prevent ipcomp scratch buffer race condition In ipcomp_compress(), sortirq is enabled too early, allowing the per-cpu scratch buffer to be rewritten by ipcomp_decompress() (called on the same CPU in softirq context) between populating the buffer and copying the compressed data to the skb. v2: as pointed out by Steffen Klassert, if we also move the local_bh_disable() before reading the per-cpu pointers, we can get rid of get_cpu()/put_cpu(). v3: removed ipcomp_decompress part (as explained by Herbert Xu, it cannot be called from process context), get rid of cpu variable (thanks to Eric Dumazet) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-10-18 10:00:00 +02:00
Steffen Klassert	2bb53e2557	xfrm: check for a vaild skb in xfrm_policy_queue_process We might dreference a NULL pointer if the hold_queue is empty, so add a check to avoid this. Bug was introduced with git commit `a0073fe18` ("xfrm: Add a state resolution packet queue") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-10-08 10:49:51 +02:00
Steffen Klassert	e7d8f6cb2f	xfrm: Add refcount handling to queued policies We need to ensure that policies can't go away as long as the hold timer is armed, so take a refcont when we arm the timer and drop one if we delete it. Bug was introduced with git commit `a0073fe18` ("xfrm: Add a state resolution packet queue") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-10-08 10:49:45 +02:00

1 2 3 4 5 ...

978 Commits