linux

Commit Graph

Author	SHA1	Message	Date
Johannes Berg	8cb081746c	netlink: make validation more configurable for future strictness We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be everything. The current _strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to _parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:21 -04:00
Florian Westphal	bb9cd077e2	xfrm: remove unneeded export_symbols None of them have any external callers, make them static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-23 07:42:20 +02:00
Florian Westphal	c53ac41e37	xfrm: remove decode_session indirection from afinfo_policy No external dependencies, might as well handle this directly. xfrm_afinfo_policy is now 40 bytes on x86_64. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-23 07:42:20 +02:00
Florian Westphal	2e8b4aa816	xfrm: remove init_path indirection from afinfo_policy handle this directly, its only used by ipv6. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-23 07:42:20 +02:00
Florian Westphal	f24ea52873	xfrm: remove tos indirection from afinfo_policy Only used by ipv4, we can read the fl4 tos value directly instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-23 07:42:20 +02:00
Florian Westphal	e54d152765	xfrm: kconfig: make xfrm depend on inet when CONFIG_INET is not enabled: net/xfrm/xfrm_output.c: In function ‘xfrm4_tunnel_encap_add’: net/xfrm/xfrm_output.c:234:2: error: implicit declaration of function ‘ip_select_ident’ [-Werror=implicit-function-declaration] ip_select_ident(dev_net(dst->dev), skb, NULL); XFRM only supports ipv4 and ipv6 so change dependency to INET and place user-visible options (pfkey sockets, migrate support and the like) under 'if INET' guard as well. Fixes: `1de7083006` ("xfrm: remove output2 indirection from xfrm_mode") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-15 11:09:20 +02:00
Florian Westphal	c9500d7b7d	xfrm: store xfrm_mode directly, not its address This structure is now only 4 bytes, so its more efficient to cache a copy rather than its address. No significant size difference in allmodconfig vmlinux. With non-modular kernel that has all XFRM options enabled, this series reduces vmlinux image size by ~11kb. All xfrm_mode indirections are gone and all modes are built-in. before (ipsec-next master): text data bss dec filename 21071494 7233140 11104324 39408958 vmlinux.master after this series: 21066448 7226772 11104324 39397544 vmlinux.patched With allmodconfig kernel, the size increase is only 362 bytes, even all the xfrm config options removed in this series are modular. before: text data bss dec filename 15731286 6936912 4046908 26715106 vmlinux.master after this series: 15731492 6937068 4046908 26715468 vmlinux Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:28 +02:00
Florian Westphal	4c145dce26	xfrm: make xfrm modes builtin after previous changes, xfrm_mode contains no function pointers anymore and all modules defining such struct contain no code except an init/exit functions to register the xfrm_mode struct with the xfrm core. Just place the xfrm modes core and remove the modules, the run-time xfrm_mode register/unregister functionality is removed. Before: text data bss dec filename 7523 200 2364 10087 net/xfrm/xfrm_input.o 40003 628 440 41071 net/xfrm/xfrm_state.o 15730338 6937080 4046908 26714326 vmlinux 7389 200 2364 9953 net/xfrm/xfrm_input.o 40574 656 440 41670 net/xfrm/xfrm_state.o 15730084 6937068 4046908 26714060 vmlinux The xfrm_mode_{transport,tunnel,beet} modules are gone. v2: replace CONFIG_INET6_XFRM_MODE_ IS_ENABLED guards with CONFIG_IPV6 ones rather than removing them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:17 +02:00
Florian Westphal	733a5fac2f	xfrm: remove afinfo pointer from xfrm_mode Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from ipv6 in case of CONFIG_IPV6=m. This change has virtually no effect on vmlinux size, but it reduces afinfo size and allows followup patch to make xfrm modes const. v2: mark if (afinfo) tests as likely (Sabrina) re-fetch afinfo according to inner_mode in xfrm_prepare_input(). Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:09 +02:00
Florian Westphal	1de7083006	xfrm: remove output2 indirection from xfrm_mode similar to previous patch: no external module dependencies, so we can avoid the indirection by placing this in the core. This change removes the last indirection from xfrm_mode and the xfrm4\|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore. Before: text data bss dec hex filename 3957 136 0 4093 ffd net/xfrm/xfrm_output.o 587 44 0 631 277 net/ipv4/xfrm4_mode_beet.o 649 32 0 681 2a9 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 29d net/ipv6/xfrm6_mode_beet.o 599 32 0 631 277 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec hex filename 5359 184 0 5543 15a7 net/xfrm/xfrm_output.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_beet.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_tunnel.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_beet.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_tunnel.o v2: fold the encap_add functions into xfrm_prepare_output preserve (move) output2 comment (Sabrina) use x->outer_mode->encap, not inner fix a build breakage on ppc (kbuild robot) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:02 +02:00
Florian Westphal	b3284df1c8	xfrm: remove input2 indirection from xfrm_mode No external dependencies on any module, place this in the core. Increase is about 1800 byte for xfrm_input.o. The beet helpers get added to internal header, as they can be reused from xfrm_output.c in the next patch (kernel contains several copies of them in the xfrm{4,6}_mode_beet.c files). Before: text data bss dec filename 5578 176 2364 8118 net/xfrm/xfrm_input.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec filename 7373 200 2364 9937 net/xfrm/xfrm_input.o 587 44 0 631 net/ipv4/xfrm4_mode_beet.o 171 32 0 203 net/ipv4/xfrm4_mode_transport.o 649 32 0 681 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 net/ipv6/xfrm6_mode_beet.o 172 32 0 204 net/ipv6/xfrm6_mode_ro.o 172 32 0 204 net/ipv6/xfrm6_mode_transport.o 599 32 0 631 net/ipv6/xfrm6_mode_tunnel.o v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix AF_UNSPEC selector breakage (bisected by Benedict Wong) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:14:55 +02:00
Florian Westphal	303c5fab12	xfrm: remove xmit indirection from xfrm_mode There are only two versions (tunnel and transport). The ip/ipv6 versions are only differ in sizeof(iphdr) vs ipv6hdr. Place this in the core and use x->outer_mode->encap type to call the correct adjustment helper. Before: text data bss dec filename 15730311 `6937008` 4046908 26714227 vmlinux After: 15730428 `6937008` 4046908 26714344 vmlinux (about 117 byte increase) v2: use family from x->outer_mode, not inner Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:14:34 +02:00
Florian Westphal	0c620e97b3	xfrm: remove output indirection from xfrm_mode Same is input indirection. Only exception: we need to export xfrm_outer_mode_output for pktgen. Increases size of vmlinux by about 163 byte: Before: text data bss dec filename 15730208 6936948 4046908 26714064 vmlinux After: 15730311 `6937008` 4046908 26714227 vmlinux xfrm_inner_extract_output has no more external callers, make it static. v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca) add WARN_ON_ONCE for 'call AF_INET6 related output function, but CONFIG_IPV6=n' case. make xfrm_inner_extract_output static Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:14:28 +02:00
Florian Westphal	c2d305e510	xfrm: remove input indirection from xfrm_mode No need for any indirection or abstraction here, both functions are pretty much the same and quite small, they also have no external dependencies. xfrm_prepare_input can then be made static. With allmodconfig build, size increase of vmlinux is 25 byte: Before: text data bss dec filename 15730207 6936924 4046908 26714039 vmlinux After: 15730208 6936948 4046908 26714064 vmlinux v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca) change copied comment to refer to transport and network header, not skb->{h,nh}, which don't exist anymore. (Sabrina) make xfrm_prepare_input static (Eyal Birger) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:14:21 +02:00
Florian Westphal	b45714b164	xfrm: prefer family stored in xfrm_mode struct Now that we have the family available directly in the xfrm_mode struct, we can use that and avoid one extra dereference. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:14:12 +02:00
Florian Westphal	b262a69582	xfrm: place af number into xfrm_mode struct This will be useful to know if we're supposed to decode ipv4 or ipv6. While at it, make the unregister function return void, all module_exit functions did just BUG(); there is never a point in doing error checks if there is no way to handle such error. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:13:46 +02:00
Martin Willi	025c65e119	xfrm: Honor original L3 slave device in xfrmi policy lookup If an xfrmi is associated to a vrf layer 3 master device, xfrm_policy_check() fails after traffic decapsulation. The input interface is replaced by the layer 3 master device, and hence xfrmi_decode_session() can't match the xfrmi anymore to satisfy policy checking. Extend ingress xfrmi lookup to honor the original layer 3 slave device, allowing xfrm interfaces to operate within a vrf domain. Fixes: `f203b76d78` ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-27 16:14:05 +01:00
Cong Wang	dbb2483b2a	xfrm: clean up xfrm protocol checks In commit `6a53b75932` ("xfrm: check id proto in validate_tmpl()") I introduced a check for xfrm protocol, but according to Herbert IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so it should be removed from validate_tmpl(). And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific protocols, this is why xfrm_state_flush() could still miss IPPROTO_ROUTING, which leads that those entries are left in net->xfrm.state_all before exit net. Fix this by replacing IPSEC_PROTO_ANY with zero. This patch also extracts the check from validate_tmpl() to xfrm_id_proto_valid() and uses it in parse_ipsecrequest(). With this, no other protocols should be added into xfrm. Fixes: `6a53b75932` ("xfrm: check id proto in validate_tmpl()") Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-26 08:35:36 +01:00
Boris Pismenny	65fd2c2afa	xfrm: gso partial offload support This patch introduces support for gso partial ESP offload. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-24 09:48:38 +01:00
Thomas Gleixner	671422b220	xfrm: Replace hrtimer tasklet with softirq hrtimer Switch the timer to HRTIMER_MODE_SOFT, which executed the timer callback in softirq context and remove the hrtimer_tasklet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lkml.kernel.org/r/20190301224821.29843-3-bigeasy@linutronix.de	2019-03-22 14:35:31 +01:00
Paolo Abeni	4bd97d51a5	net: dev: rename queue selection helpers. With the following patches, we are going to use __netdev_pick_tx() in many modules. Rename it to netdev_pick_tx(), to make it clear is a public API. Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(), to avoid name clashes. Suggested-by: Eric Dumazet <edumazet@google.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 11:18:54 -07:00
Steffen Klassert	bfc01ddff2	Revert "net: xfrm: Add '_rcu' tag for rcu protected pointer in netns_xfrm" This reverts commit `f10e0010fa`. This commit was just wrong. It caused a lot of syzbot warnings, so just revert it. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-20 17:56:13 +01:00
Su Yanjun	f10e0010fa	net: xfrm: Add '_rcu' tag for rcu protected pointer in netns_xfrm For rcu protected pointers, we'd better add '__rcu' for them. Once added '__rcu' tag for rcu protected pointer, the sparse tool reports warnings. net/xfrm/xfrm_user.c:1198:39: sparse: expected struct sock sk net/xfrm/xfrm_user.c:1198:39: sparse: got struct sock [noderef] <asn:4> nlsk [...] So introduce a new wrapper function of nlmsg_unicast to handle type conversions. This patch also fixes a direct access of a rcu protected socket. Fixes: be33690d8fcf("[XFRM]: Fix aevent related crash") Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-08 13:17:31 +01:00
YueHaibing	b805d78d30	xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink UBSAN report this: UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24 index 6 is out of range for type 'unsigned int [6]' CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 0000000000000000 1466cf39b41b23c9 ffff8801f6b07a58 ffffffff81cb35f4 0000000041b58ab3 ffffffff83230f9c ffffffff81cb34e0 ffff8801f6b07a80 ffff8801f6b07a20 1466cf39b41b23c9 ffffffff851706e0 ffff8801f6b07ae8 Call Trace: <IRQ> [<ffffffff81cb35f4>] __dump_stack lib/dump_stack.c:15 [inline] <IRQ> [<ffffffff81cb35f4>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51 [<ffffffff81d94225>] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164 [<ffffffff81d954db>] __ubsan_handle_out_of_bounds+0x16e/0x1b2 lib/ubsan.c:382 [<ffffffff82a25acd>] __xfrm_policy_unlink+0x3dd/0x5b0 net/xfrm/xfrm_policy.c:1289 [<ffffffff82a2e572>] xfrm_policy_delete+0x52/0xb0 net/xfrm/xfrm_policy.c:1309 [<ffffffff82a3319b>] xfrm_policy_timer+0x30b/0x590 net/xfrm/xfrm_policy.c:243 [<ffffffff813d3927>] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144 [<ffffffff813d8e7e>] __run_timers kernel/time/timer.c:1218 [inline] [<ffffffff813d8e7e>] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401 [<ffffffff8120d6f9>] __do_softirq+0x299/0xe10 kernel/softirq.c:273 [<ffffffff8120e676>] invoke_softirq kernel/softirq.c:350 [inline] [<ffffffff8120e676>] irq_exit+0x216/0x2c0 kernel/softirq.c:391 [<ffffffff82c5edab>] exiting_irq arch/x86/include/asm/apic.h:652 [inline] [<ffffffff82c5edab>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926 [<ffffffff82c5c985>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:735 <EOI> [<ffffffff81188096>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:52 [<ffffffff810834d7>] arch_safe_halt arch/x86/include/asm/paravirt.h:111 [inline] [<ffffffff810834d7>] default_idle+0x27/0x430 arch/x86/kernel/process.c:446 [<ffffffff81085f05>] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437 [<ffffffff8132abc3>] default_idle_call+0x53/0x90 kernel/sched/idle.c:92 [<ffffffff8132b32d>] cpuidle_idle_call kernel/sched/idle.c:156 [inline] [<ffffffff8132b32d>] cpu_idle_loop kernel/sched/idle.c:251 [inline] [<ffffffff8132b32d>] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299 [<ffffffff8113e119>] start_secondary+0x3c9/0x560 arch/x86/kernel/smpboot.c:245 The issue is triggered as this: xfrm_add_policy -->verify_newpolicy_info //check the index provided by user with XFRM_POLICY_MAX //In my case, the index is 0x6E6BB6, so it pass the check. -->xfrm_policy_construct //copy the user's policy and set xfrm_policy_timer -->xfrm_policy_insert --> __xfrm_policy_link //use the orgin dir, in my case is 2 --> xfrm_gen_index //generate policy index, there is 0x6E6BB6 then xfrm_policy_timer be fired xfrm_policy_timer --> xfrm_policy_id2dir //get dir from (policy index & 7), in my case is 6 --> xfrm_policy_delete --> __xfrm_policy_unlink //access policy_count[dir], trigger out of range access Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the computed dir is valid, to fix the issue. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: `e682adf021` ("xfrm: Try to honor policy index if it's supplied by user") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-01 12:17:41 +01:00
Tobias Brunner	660899ddf0	xfrm: Fix inbound traffic via XFRM interfaces across network namespaces After moving an XFRM interface to another namespace it stays associated with the original namespace (net in `struct xfrm_if` and the list keyed with `xfrmi_net_id`), allowing processes in the new namespace to use SAs/policies that were created in the original namespace. For instance, this allows a keying daemon in one namespace to establish IPsec SAs for other namespaces without processes there having access to the keys or IKE credentials. This worked fine for outbound traffic, however, for inbound traffic the lookup for the interfaces and the policies used the incorrect namespace (the one the XFRM interface was moved to). Fixes: `f203b76d78` ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-02-18 10:58:54 +01:00
Cong Wang	f75a2804da	xfrm: destroy xfrm_state synchronously on net exit path xfrm_state_put() moves struct xfrm_state to the GC list and schedules the GC work to clean it up. On net exit call path, xfrm_state_flush() is called to clean up and xfrm_flush_gc() is called to wait for the GC work to complete before exit. However, this doesn't work because one of the ->destructor(), ipcomp_destroy(), schedules the same GC work again inside the GC work. It is hard to wait for such a nested async callback. This is also why syzbot still reports the following warning: WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351 ... ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153 cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551 process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153 worker_thread+0x143/0x14a0 kernel/workqueue.c:2296 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 In fact, it is perfectly fine to bypass GC and destroy xfrm_state synchronously on net exit call path, because it is in process context and doesn't need a work struct to do any blocking work. This patch introduces xfrm_state_put_sync() which simply bypasses GC, and lets its callers to decide whether to use this synchronous version. On net exit path, xfrm_state_fini() and xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is blocking, it can use xfrm_state_put_sync() directly too. Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to reflect this change. Fixes: `b48c05ab5d` ("xfrm: Fix warning in xfrm6_tunnel_net_exit.") Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-02-05 06:29:20 +01:00
Benedict Wong	e2612cd496	xfrm: Make set-mark default behavior backward compatible Fixes `9b42c1f179`, which changed the default route lookup behavior for tunnel mode SAs in the outbound direction to use the skb mark, whereas previously mark=0 was used if the output mark was unspecified. In mark-based routing schemes such as Android’s, this change in default behavior causes routing loops or lookup failures. This patch restores the default behavior of using a 0 mark while still incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is specified. Tested with additions to Android's kernel unit test suite: https://android-review.googlesource.com/c/kernel/tests/+/860150 Fixes: `9b42c1f179` ("xfrm: Extend the output_mark to support input direction and masking") Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-16 13:10:55 +01:00
Florian Westphal	35e6103861	xfrm: refine validation of template and selector families The check assumes that in transport mode, the first templates family must match the address family of the policy selector. Syzkaller managed to build a template using MODE_ROUTEOPTIMIZATION, with ipv4-in-ipv6 chain, leading to following splat: BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x1db/0x1854 Read of size 4 at addr ffff888063e57aa0 by task a.out/2050 xfrm_state_find+0x1db/0x1854 xfrm_tmpl_resolve+0x100/0x1d0 xfrm_resolve_and_create_bundle+0x108/0x1000 [..] Problem is that addresses point into flowi4 struct, but xfrm_state_find treats them as being ipv6 because it uses templ->encap_family is used (AF_INET6 in case of reproducer) rather than family (AF_INET). This patch inverts the logic: Enforce 'template family must match selector' EXCEPT for tunnel and BEET mode. In BEET and Tunnel mode, xfrm_tmpl_resolve_one will have remote/local address pointers changed to point at the addresses found in the template, rather than the flowi ones, so no oob read will occur. Reported-by: 3ntr0py1337@gmail.com Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-10 09:12:48 +01:00
Florian Westphal	12750abad5	xfrm: policy: fix infinite loop when merging src-nodes With very small change to test script we can trigger softlockup due to bogus assignment of 'p' (policy to be examined) on restart. Previously the two to-be-merged nodes had same address/prefixlength pair, so no erase/reinsert was necessary, we only had to append the list from node a to b. If prefix lengths are different, the node has to be deleted and re-inserted into the tree, with the updated prefix length. This was broken; due to bogus update to 'p' this loops forever. Add a 'restart' label and use that instead. While at it, don't perform the unneeded reinserts of the policies that are already sorted into the 'new' node. A previous patch in this series made xfrm_policy_inexact_list_reinsert() use the relative position indicator to sort policies according to age in case priorities are identical. Fixes: `6ac098b2a9` ("xfrm: policy: add 2nd-level saddr trees for inexact policies") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-09 13:58:23 +01:00
Florian Westphal	1d38900cb8	xfrm: policy: fix reinsertion on node merge "newpos" has wrong scope. It must be NULL on each iteration of the loop. Otherwise, when policy is to be inserted at the start, we would instead insert at point found by the previous loop-iteration instead. Also, we need to unlink the policy before we reinsert it to the new node, else we can get next-points-to-self loops. Because policies are only ordered by priority it is irrelevant which policy is "more recent" except when two policies have same priority. (the more recent one is placed after the older one). In these cases, we can use the ->pos id number to know which one is the 'older': the higher the id, the more recent the policy. So we only need to unlink all policies from the node that is about to be removed, and insert them to the replacement node. Fixes: `9cf545ebd5` ("xfrm: policy: store inexact policies in a tree ordered by destination address") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-09 13:58:23 +01:00
Florian Westphal	1548bc4e05	xfrm: policy: delete inexact policies from inexact list on hash rebuild An xfrm hash rebuild has to reset the inexact policy list before the policies get re-inserted: A change of hash thresholds will result in policies to get moved from inexact tree to the policy hash table. If the thresholds are increased again later, they get moved from hash table to inexact tree. We must unlink all policies from the inexact tree before re-insertion. Otherwise 'migrate' may find policies that are in main hash table a second time, when it searches the inexact lists. Furthermore, re-insertion without deletion can cause elements ->next to point back to itself, causing soft lockups or double-frees. Reported-by: syzbot+9d971dd21eb26567036b@syzkaller.appspotmail.com Fixes: `9cf545ebd5` ("xfrm: policy: store inexact policies in a tree ordered by destination address") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-09 13:58:23 +01:00
Florian Westphal	7a474c3658	xfrm: policy: increment xfrm_hash_generation on hash rebuild Hash rebuild will re-set all the inexact entries, then re-insert them. Lookups that can occur in parallel will therefore not find any policies. This was safe when lookups were still guarded by rwlock. After rcu-ification, lookups check the hash_generation seqcount to detect when a hash resize takes place. Hash rebuild missed the needed increment. Hash resizes and hash rebuilds cannot occur in parallel (both acquire hash_resize_mutex), so just increment xfrm_hash_generation, like resize. Fixes: `a7c44247f7` ("xfrm: policy: make xfrm_policy_lookup_bytype lockless") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-09 13:58:23 +01:00
Florian Westphal	355b00d1e1	xfrm: policy: use hlist rcu variants on inexact insert, part 2 This function was modeled on the 'exact' insert one, which did not use the rcu variant either. When I fixed the 'exact' insert I forgot to propagate this to my development tree, so the inexact variant retained the bug. Fixes: `9cf545ebd5` ("xfrm: policy: store inexact policies in a tree ordered by destination address") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-01-09 13:58:23 +01:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
David S. Miller	ac68a3d3c3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-12-20 Two last patches for this release cycle: 1) Remove an unused variable in xfrm_policy_lookup_bytype(). From YueHaibing. 2) Fix possible infinite loop in __xfrm6_tunnel_alloc_spi(). Also from YueHaibing. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:21:47 -08:00
Florian Westphal	4165079ba3	net: switch secpath to use skb extension infrastructure Remove skb->sp and allocate secpath storage via extension infrastructure. This also reduces sk_buff by 8 bytes on x86_64. Total size of allyesconfig kernel is reduced slightly, as there is less inlined code (one conditional atomic op instead of two on skb_clone). No differences in throughput in following ipsec performance tests: - transport mode with aes on 10GB link - tunnel mode between two network namespaces with aes and null cipher Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:38 -08:00
Florian Westphal	a84e3f5333	xfrm: prefer secpath_set over secpath_dup secpath_set is a wrapper for secpath_dup that will not perform an allocation if the secpath attached to the skb has a reference count of one, i.e., it doesn't need to be COW'ed. Also, secpath_dup doesn't attach the secpath to the skb, it leaves this to the caller. Use secpath_set in places that immediately assign the return value to skb. This allows to remove skb->sp without touching these spots again. secpath_dup can eventually be removed in followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:38 -08:00
Florian Westphal	26912e3756	xfrm: use secpath_exist where applicable Will reduce noise when skb->sp is removed later in this series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	2294be0f11	net: use skb_sec_path helper in more places skb_sec_path gains 'const' qualifier to avoid xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type same reasoning as previous conversions: Won't need to touch these spots anymore when skb->sp is removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	0ca64da128	xfrm: change secpath_set to return secpath struct, not error value It can only return 0 (success) or -ENOMEM. Change return value to a pointer to secpath struct. This avoids direct access to skb->sp: err = secpath_set(skb); if (!err) .. skb->sp-> ... Becomes: sp = secpath_set(skb) if (!sp) .. sp-> .. This reduces noise in followup patch which is going to remove skb->sp. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
YueHaibing	cc4acb1b6a	xfrm: policy: remove set but not used variable 'priority' Fixes gcc '-Wunused-but-set-variable' warning: net/xfrm/xfrm_policy.c: In function 'xfrm_policy_lookup_bytype': net/xfrm/xfrm_policy.c:2079:6: warning: variable 'priority' set but not used [-Wunused-but-set-variable] It not used since commit `6be3b0db6d` ("xfrm: policy: add inexact policy search tree infrastructure") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-12-19 12:24:43 +01:00
David S. Miller	fde9cd69a5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-12-18 1) Fix error return code in xfrm_output_one() when no dst_entry is attached to the skb. From Wei Yongjun. 2) The xfrm state hash bucket count reported to userspace is off by one. Fix from Benjamin Poirier. 3) Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry. 4) Fix freeing of xfrm states on acquire. We use a dedicated slab cache for the xfrm states now, so free it properly with kmem_cache_free. From Mathias Krause. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-18 11:43:26 -08:00
Florian Westphal	88584c30e3	xfrm: policy: fix policy hash rebuild Dan Carpenter reports following static checker warning: net/xfrm/xfrm_policy.c:1316 xfrm_hash_rebuild() warn: 'dir' is out of bounds '3' vs '2' \| 1280 /* reset the bydst and inexact table in all directions / \| 1281 xfrm_hash_reset_inexact_table(net); \| 1282 \| 1283 for (dir = 0; dir < XFRM_POLICY_MAX; dir++) { \| ^^^^^^^^^^^^^^^^^^^^^ \|dir == XFRM_POLICY_MAX at the end of this loop. \| 1304 / re-insert all policies by order of creation / \| 1305 list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) { [..] \| 1314 xfrm_policy_id2dir(policy->index)); \| 1315 if (!chain) { \| 1316 void p = xfrm_policy_inexact_insert(policy, dir, 0); Fix this by updating 'dir' based on current policy. Otherwise, the inexact policies won't be found anymore during lookup, as they get hashed to a bogus bin. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `cc1bb845ad` ("xfrm: policy: return NULL when inexact search needed") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-28 07:05:48 +01:00
Mathias Krause	4a135e5389	xfrm_user: fix freeing of xfrm states on acquire Commit `565f0fa902` ("xfrm: use a dedicated slab cache for struct xfrm_state") moved xfrm state objects to use their own slab cache. However, it missed to adapt xfrm_user to use this new cache when freeing xfrm states. Fix this by introducing and make use of a new helper for freeing xfrm_state objects. Fixes: `565f0fa902` ("xfrm: use a dedicated slab cache for struct xfrm_state") Reported-by: Pan Bian <bianpan2016@163.com> Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-23 07:51:32 +01:00
Steffen Klassert	0152eee6fc	xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry. Since commit `222d7dbd25` ("net: prevent dst uses after free") skb_dst_force() might clear the dst_entry attached to the skb. The xfrm code doesn't expect this to happen, so we crash with a NULL pointer dereference in this case. Fix it by checking skb_dst(skb) for NULL after skb_dst_force() and drop the packet in case the dst_entry was cleared. We also move the skb_dst_force() to a codepath that is not used when the transformation was offloaded, because in this case we don't have a dst_entry attached to the skb. The output and forwarding path was already fixed by commit `9e14379378` ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") Fixes: `222d7dbd25` ("net: prevent dst uses after free") Reported-by: Jean-Philippe Menil <jpmenil@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-22 10:09:39 +01:00
Florian Westphal	39aa6928d4	xfrm: policy: fix netlink/pf_key policy lookups Colin Ian King says: Static analysis with CoverityScan found a potential issue [..] It seems that pointer pol is set to NULL and then a check to see if it is non-null is used to set pol to tmp; howeverm this check is always going to be false because pol is always NULL. Fix this and update test script to catch this. Updated script only: ./xfrm_policy.sh ; echo $? RTNETLINK answers: No such file or directory FAIL: ip -net ns3 xfrm policy get src 10.0.1.0/24 dst 10.0.2.0/24 dir out RTNETLINK answers: No such file or directory [..] PASS: policy before exception matches PASS: ping to .254 bypassed ipsec tunnel PASS: direct policy matches PASS: policy matches 1 Fixes: `6be3b0db6d` ("xfrm: policy: add inexact policy search tree infrastructure") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-15 18:09:32 +01:00
Colin Ian King	7759d6a837	xfrm: policy: add missing indentation There is a missing indentation before the goto statement. Add it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-15 18:09:32 +01:00
Florian Westphal	6ac098b2a9	xfrm: policy: add 2nd-level saddr trees for inexact policies This adds the fourth and final search class, containing policies where both saddr and daddr have prefix lengths (i.e., not wildcards). Inexact policies now end up in one of the following four search classes: 1. "Any:Any" list, containing policies where both saddr and daddr are wildcards or have very coarse prefixes, e.g. 10.0.0.0/8 and the like. 2. "saddr:any" list, containing policies with a fixed saddr/prefixlen, but without destination restrictions. These lists are stored in rbtree nodes; each node contains those policies matching saddr/prefixlen. 3. "Any:daddr" list. Similar to 2), except for policies where only the destinations are specified. 4. "saddr:daddr" lists, containing only those policies that match the given source/destination network. The root of the saddr/daddr nodes gets stored in the nodes of the 'daddr' tree. This diagram illustrates the list classes, and their placement in the lookup hierarchy: xfrm_pol_inexact_bin = hash(dir,type,family,if_id); \| +---- root_d: sorted by daddr:prefix \| \| \| xfrm_pol_inexact_node \| \| \| +- root: sorted by saddr/prefix \| \| \| \| \| xfrm_pol_inexact_node \| \| \| \| \| + root: unused \| \| \| \| \| + hhead: saddr:daddr policies \| \| \| +- coarse policies and all any:daddr policies \| +---- root_s: sorted by saddr:prefix \| \| \| xfrm_pol_inexact_node \| \| \| + root: unused \| \| \| + hhead: saddr:any policies \| +---- coarse policies and all any:any policies lookup for an inexact policy returns pointers to the four relevant list classes, after which each of the lists needs to be searched for the policy with the higher priority. This will only speed up lookups in case we have many policies and a sizeable portion of these have disjunct saddr/daddr addresses. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:58:27 +01:00
Florian Westphal	64a09a7bfe	xfrm: policy: store inexact policies in a tree ordered by source address This adds the 'saddr:any' search class. It contains all policies that have a fixed saddr/prefixlen, but 'any' destination. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:58:19 +01:00
Florian Westphal	e901cbc293	xfrm: policy: check reinserted policies match their node validate the re-inserted policies match the lookup node. Policies that fail this test won't be returned in the candidate set. This is enabled by default for now, it should not cause noticeable reinsert slow down. Such reinserts are needed when we have to merge an existing node (e.g. for 10.0.0.0/28 because a overlapping subnet was added (e.g. 10.0.0.0/24), so whenever this happens existing policies have to be placed on the list of the new node. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:58:13 +01:00
Florian Westphal	9cf545ebd5	xfrm: policy: store inexact policies in a tree ordered by destination address This adds inexact lists per destination network, stored in a search tree. Inexact lookups now return two 'candidate lists', the 'any' policies ('any' destionations), and a list of policies that share same daddr/prefix. Next patch will add a second search tree for 'saddr:any' policies so we can avoid placing those on the 'any:any' list too. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:58:07 +01:00
Florian Westphal	6be3b0db6d	xfrm: policy: add inexact policy search tree infrastructure At this time inexact policies are all searched in-order until the first match is found. After removal of the flow cache, this resolution has to be performed for every packetm resulting in major slowdown when number of inexact policies is high. This adds infrastructure to later sort inexact policies into a tree. This only introduces a single class: any:any. Next patch will add a search tree to pre-sort policies that have a fixed daddr/prefixlen, so in this patch the any:any class will still be used for all policies. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:57:59 +01:00
Florian Westphal	b5fe22e233	xfrm: policy: consider if_id when hashing inexact policy This avoids searches of polices that cannot match in the first place due to different interface id by placing them in different bins. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:57:53 +01:00
Florian Westphal	24969facd7	xfrm: policy: store inexact policies in an rhashtable Switch packet-path lookups for inexact policies to rhashtable. In this initial version, we now no longer need to search policies with non-matching address family and type. Next patch will add the if_id as well so lookups from the xfrm interface driver only need to search inexact policies for that device. Future patches will augment the hlist in each rhash bucket with a tree and pre-sort policies according to daddr/prefix. A single rhashtable is used. In order to avoid a full rhashtable walk on netns exit, the bins get placed on a pernet list, i.e. we add almost no cost for network namespaces that had no xfrm policies. The inexact lists are kept in place, and policies are added to both the per-rhash-inexact list and a pernet one. The latter is needed for the control plane to handle migrate -- these requests do not consider the if_id, so if we'd remove the inexact_list now we would have to search all hash buckets and then figure out which matching policy candidate is the most recent one -- this appears a bit harder than just keeping the 'old' inexact list for this purpose. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:57:47 +01:00
Florian Westphal	cc1bb845ad	xfrm: policy: return NULL when inexact search needed currently policy_hash_bysel() returns the hash bucket list (for exact policies), or the inexact list (when policy uses a prefix). Searching this inexact list is slow, so it might be better to pre-sort inexact lists into a tree or another data structure for faster searching. However, due to 'any' policies, that need to be searched in any case, doing so will require that 'inexact' policies need to be handled specially to decide the best search strategy. So change hash_bysel() and return NULL if the policy can't be handled via the policy hash table. Right now, we simply use the inexact list when this happens, but future patch can then implement a different strategy. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:57:38 +01:00
Florian Westphal	a927d6af53	xfrm: policy: split list insertion into a helper ... so we can reuse this later without code duplication when we add policy to a second inexact list. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:57:30 +01:00
Florian Westphal	ceb159e30a	xfrm: security: iterate all, not inexact lists currently all non-socket policies are either hashed in the dst table, or placed on the 'inexact list'. When flushing, we first walk the table, then the (per-direction) inexact lists. When we try and get rid of the inexact lists to having "n" inexact lists (e.g. per-af inexact lists, or sorted into a tree), this walk would become more complicated. Simplify this: walk the 'all' list and skip socket policies during traversal so we don't need to handle exact and inexact policies separately anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-09 11:57:22 +01:00
Benjamin Poirier	ca92e173ab	xfrm: Fix bucket count reported to userspace sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the hash mask. Fixes: `28d8909bc7` ("[XFRM]: Export SAD info.") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-06 12:25:47 +01:00
Linus Torvalds	601a88077c	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A number of fixes and some late updates: - make in_compat_syscall() behavior on x86-32 similar to other platforms, this touches a number of generic files but is not intended to impact non-x86 platforms. - objtool fixes - PAT preemption fix - paravirt fixes/cleanups - cpufeatures updates for new instructions - earlyprintk quirk - make microcode version in sysfs world-readable (it is already world-readable in procfs) - minor cleanups and fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Cleanup in_compat_syscall() callers x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT objtool: Support GCC 9 cold subfunction naming scheme x86/numa_emulation: Fix uniform-split numa emulation x86/paravirt: Remove unused _paravirt_ident_32 x86/mm/pat: Disable preemption around __flush_tlb_all() x86/paravirt: Remove GPL from pv_ops export x86/traps: Use format string with panic() call x86: Clean up 'sizeof x' => 'sizeof(x)' x86/cpufeatures: Enumerate MOVDIR64B instruction x86/cpufeatures: Enumerate MOVDIRI instruction x86/earlyprintk: Add a force option for pciserial device objtool: Support per-function rodata sections x86/microcode: Make revision and processor flags world-readable	2018-11-03 18:25:17 -07:00
Ingo Molnar	23a12ddee1	Merge branch 'core/urgent' into x86/urgent, to pick up objtool fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-03 23:42:16 +01:00
Linus Torvalds	82aa467151	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) BPF verifier fixes from Daniel Borkmann. 2) HNS driver fixes from Huazhong Tan. 3) FDB only works for ethernet devices, reject attempts to install FDB rules for others. From Ido Schimmel. 4) Fix spectre V1 in vhost, from Jason Wang. 5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2 driver, from Marc Zyngier. 6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) openvswitch: Fix push/pop ethernet validation net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules bpf: test make sure to run unpriv test cases in test_verifier bpf: add various test cases to test_verifier bpf: don't set id on after map lookup with ptr_to_map_val return bpf: fix partial copy of map_ptr when dst is scalar libbpf: Fix compile error in libbpf_attach_type_by_name kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists selftests: mlxsw: qos_mc_aware: Add a test for UC awareness selftests: mlxsw: qos_mc_aware: Tweak for min shaper mlxsw: spectrum: Set minimum shaper on MC TCs mlxsw: reg: QEEC: Add minimum shaper fields net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset() net: hns3: bugfix for rtnl_lock's range in the hclge_reset() net: hns3: bugfix for handling mailbox while the command queue reinitialized net: hns3: fix incorrect return value/type of some functions net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read net: hns3: bugfix for is_valid_csq_clean_head() net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring() net: hns3: bugfix for the initialization of command queue's spin lock ...	2018-11-01 09:16:01 -07:00
Dmitry Safonov	98f76206b3	compat: Cleanup in_compat_syscall() callers Now that in_compat_syscall() is consistent on all architectures and does not longer report true on native i686, the workarounds (ifdeffery and helpers) can be removed. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andy Lutomirsky <luto@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-efi@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com	2018-11-01 13:02:21 +01:00
Jeff Kirsher	48e01e001d	ixgbe/ixgbevf: fix XFRM_ALGO dependency Based on the original work from Arnd Bergmann. When XFRM_ALGO is not enabled, the new ixgbe IPsec code produces a link error: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa': ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname' Simply selecting XFRM_ALGO from here causes circular dependencies, so to fix it, we probably want this slightly more complex solution that is similar to what other drivers with XFRM offload do: A separate Kconfig symbol now controls whether we include the IPsec offload code. To keep the old behavior, this is left as 'default y'. The dependency in XFRM_OFFLOAD still causes a circular dependency but is not actually needed because this symbol is not user visible, so removing that dependency on top makes it all work. CC: Arnd Bergmann <arnd@arndb.de> CC: Shannon Nelson <shannon.nelson@oracle.com> Fixes: `eda0333ac2` ("ixgbe: add VF IPsec management") Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>	2018-10-31 10:53:15 -07:00
Mike Rapoport	57c8a661d9	mm: remove include/linux/bootmem.h Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-10-31 08:54:16 -07:00
Wei Yongjun	533555e5cb	xfrm: Fix error return code in xfrm_output_one() xfrm_output_one() does not return a error code when there is no dst_entry attached to the skb, it is still possible crash with a NULL pointer dereference in xfrm_output_resume(). Fix it by return error code -EHOSTUNREACH. Fixes: `9e14379378` ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-10-28 11:00:26 +01:00
David S. Miller	2e2d6f0342	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-19 11:03:06 -07:00
David S. Miller	8f18da4721	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-10-18 1) Remove an unnecessary dev->tstats check in xfrmi_get_stats64. From Li RongQing. 2) We currently do a sizeof(element) instead of a sizeof(array) check when initializing the ovec array of the secpath. Currently this array can have only one element, so code is OK but error-prone. Change this to do a sizeof(array) check so that we can add more elements in future. From Li RongQing. 3) Improve xfrm IPv6 address hashing by using the complete IPv6 addresses for a hash. From Michal Kubecek. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-18 09:57:42 -07:00
Michal Kubecek	8d4b6bce25	xfrm: use complete IPv6 addresses for hash In some environments it is common that many hosts share the same lower half of their IPv6 addresses (in particular ::1). As __xfrm6_addr_hash() and __xfrm6_daddr_saddr_hash() calculate the hash only from the lower halves, as much as 1/3 of the hosts ends up in one hashtable chain which harms the performance. Use complete IPv6 addresses when calculating the hashes. Rather than just adding two more words to the xor, use jhash2() for consistency with __xfrm6_pref_hash() and __xfrm6_dpref_spref_hash(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-10-15 10:09:18 +02:00
Florian Westphal	9dffff200f	xfrm: policy: use hlist rcu variants on insert bydst table/list lookups use rcu, so insertions must use rcu versions. Fixes: `a7c44247f7` ("xfrm: policy: make xfrm_policy_lookup_bytype lockless") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-10-11 13:24:46 +02:00
David Ahern	dac9c9790e	net: Add extack to nlmsg_parse Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-08 10:39:04 -07:00
Li RongQing	f1193e9157	xfrm: use correct size to initialise sp->ovec This place should want to initialize array, not a element, so it should be sizeof(array) instead of sizeof(element) but now this array only has one element, so no error in this condition that XFRM_MAX_OFFLOAD_DEPTH is 1 Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-10-08 08:15:55 +02:00
Li RongQing	b7138fddd6	xfrm: remove unnecessary check in xfrmi_get_stats64 if tstats of a device is not allocated, this device is not registered correctly and can not be used. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-10-08 08:15:55 +02:00
David S. Miller	6f41617bf2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net' overlapped the renaming of a netlink attribute in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-03 21:00:17 -07:00
Li RongQing	4da402597c	xfrm: fix gro_cells leak when remove virtual xfrm interfaces The device gro_cells has been initialized, it should be freed, otherwise it will be leaked Fixes: `f203b76d78` ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-10-02 08:11:45 +02:00
David S. Miller	2240c12d7d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-10-01 1) Make xfrmi_get_link_net() static to silence a sparse warning. From Wei Yongjun. 2) Remove a unused esph pointer definition in esp_input(). From Haishuang Yan. 3) Allow the NIC driver to quietly refuse xfrm offload in case it does not support it, the SA is created without offload in this case. From Shannon Nelson. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 22:31:17 -07:00
David S. Miller	ee0b6f4834	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-10-01 1) Validate address prefix lengths in the xfrm selector, otherwise we may hit undefined behaviour in the address matching functions if the prefix is too big for the given address family. 2) Fix skb leak on local message size errors. From Thadeu Lima de Souza Cascardo. 3) We currently reset the transport header back to the network header after a transport mode transformation is applied. This leads to an incorrect transport header when multiple transport mode transformations are applied. Reset the transport header only after all transformations are already applied to fix this. From Sowmini Varadhan. 4) We only support one offloaded xfrm, so reset crypto_done after the first transformation in xfrm_input(). Otherwise we may call the wrong input method for subsequent transformations. From Sowmini Varadhan. 5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry. skb_dst_force does not really force a dst refcount anymore, it might clear it instead. xfrm code did not expect this, add a check to not dereference skb_dst() if it was cleared by skb_dst_force. 6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds read in xfrm_state_find. From Sean Tranchetti. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 22:29:25 -07:00
Maciej Żenczykowski	1042caa79e	net-ipv4: remove 2 always zero parameters from ipv4_redirect() (the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:30:55 -07:00
Maciej Żenczykowski	d888f39666	net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu() (the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:30:55 -07:00
Sean Tranchetti	32bf94fb5c	xfrm: validate template mode XFRM mode parameters passed as part of the user templates in the IP_XFRM_POLICY are never properly validated. Passing values other than valid XFRM modes can cause stack-out-of-bounds reads to occur later in the XFRM processing: [ 140.535608] ================================================================ [ 140.543058] BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x17e4/0x1cc4 [ 140.550306] Read of size 4 at addr ffffffc0238a7a58 by task repro/5148 [ 140.557369] [ 140.558927] Call trace: [ 140.558936] dump_backtrace+0x0/0x388 [ 140.558940] show_stack+0x24/0x30 [ 140.558946] __dump_stack+0x24/0x2c [ 140.558949] dump_stack+0x8c/0xd0 [ 140.558956] print_address_description+0x74/0x234 [ 140.558960] kasan_report+0x240/0x264 [ 140.558963] __asan_report_load4_noabort+0x2c/0x38 [ 140.558967] xfrm_state_find+0x17e4/0x1cc4 [ 140.558971] xfrm_resolve_and_create_bundle+0x40c/0x1fb8 [ 140.558975] xfrm_lookup+0x238/0x1444 [ 140.558977] xfrm_lookup_route+0x48/0x11c [ 140.558984] ip_route_output_flow+0x88/0xc4 [ 140.558991] raw_sendmsg+0xa74/0x266c [ 140.558996] inet_sendmsg+0x258/0x3b0 [ 140.559002] sock_sendmsg+0xbc/0xec [ 140.559005] SyS_sendto+0x3a8/0x5a8 [ 140.559008] el0_svc_naked+0x34/0x38 [ 140.559009] [ 140.592245] page dumped because: kasan: bad access detected [ 140.597981] page_owner info is not active (free page?) [ 140.603267] [ 140.653503] ================================================================ Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-09-20 08:30:42 +02:00
Steffen Klassert	9e14379378	xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry. Since commit `222d7dbd25` ("net: prevent dst uses after free") skb_dst_force() might clear the dst_entry attached to the skb. The xfrm code don't expect this to happen, so we crash with a NULL pointer dereference in this case. Fix it by checking skb_dst(skb) for NULL after skb_dst_force() and drop the packet in cast the dst_entry was cleared. Fixes: `222d7dbd25` ("net: prevent dst uses after free") Reported-by: Tobias Hommel <netdev-list@genoetigt.de> Reported-by: Kristian Evensen <kristian.evensen@gmail.com> Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-09-11 11:28:25 +02:00
David S. Miller	a8305bff68	net: Add and use skb_mark_not_on_list(). An SKB is not on a list if skb->next is NULL. Codify this convention into a helper function and use it where we are dequeueing an SKB and need to mark it as such. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-10 10:06:54 -07:00
Sowmini Varadhan	782710e333	xfrm: reset crypto_done when iterating over multiple input xfrms We only support one offloaded xfrm (we do not have devices that can handle more than one offload), so reset crypto_done in xfrm_input() when iterating over multiple transforms in xfrm_input, so that we can invoke the appropriate x->type->input for the non-offloaded transforms Fixes: `d77e38e612` ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-09-04 10:27:07 +02:00
Shannon Nelson	4a132095dd	xfrm: allow driver to quietly refuse offload If the "offload" attribute is used to create an IPsec SA and the .xdo_dev_state_add() fails, the SA creation fails. However, if the "offload" attribute is used on a device that doesn't offer it, the attribute is quietly ignored and the SA is created without an offload. Along the same line of that second case, it would be good to have a way for the device to refuse to offload an SA without failing the whole SA creation. This patch adds that feature by allowing the driver to return -EOPNOTSUPP as a signal that the SA may be fine, it just can't be offloaded. This allows the user a little more flexibility in requesting offloads and not needing to know every detail at all times about each specific NIC when trying to create SAs. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-08-29 08:04:44 +02:00
Wei Yongjun	211d6f2dc8	xfrm: Make function xfrmi_get_link_net() static Fixes the following sparse warning: net/xfrm/xfrm_interface.c:745:12: warning: symbol 'xfrmi_get_link_net' was not declared. Should it be static? Fixes: `f203b76d78` ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-08-29 07:56:36 +02:00
Steffen Klassert	07bf790895	xfrm: Validate address prefix lengths in the xfrm selector. We don't validate the address prefix lengths in the xfrm selector we got from userspace. This can lead to undefined behaviour in the address matching functions if the prefix is too big for the given address family. Fix this by checking the prefixes and refuse SA/policy insertation when a prefix is invalid. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Air Icy <icytxw@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-08-03 10:23:07 +02:00
David S. Miller	89b1698c93	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-02 10:55:32 -07:00
David S. Miller	7a49d3d4ea	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-07-27 1) Extend the output_mark to also support the input direction and masking the mark values before applying to the skb. 2) Add a new lookup key for the upcomming xfrm interfaces. 3) Extend the xfrm lookups to match xfrm interface IDs. 4) Add virtual xfrm interfaces. The purpose of these interfaces is to overcome the design limitations that the existing VTI devices have. The main limitations that we see with the current VTI are the following: VTI interfaces are L3 tunnels with configurable endpoints. For xfrm, the tunnel endpoint are already determined by the SA. So the VTI tunnel endpoints must be either the same as on the SA or wildcards. In case VTI tunnel endpoints are same as on the SA, we get a one to one correlation between the SA and the tunnel. So each SA needs its own tunnel interface. On the other hand, we can have only one VTI tunnel with wildcard src/dst tunnel endpoints in the system because the lookup is based on the tunnel endpoints. The existing tunnel lookup won't work with multiple tunnels with wildcard tunnel endpoints. Some usecases require more than on VTI tunnel of this type, for example if somebody has multiple namespaces and every namespace requires such a VTI. VTI needs separate interfaces for IPv4 and IPv6 tunnels. So when routing to a VTI, we have to know to which address family this traffic class is going to be encapsulated. This is a lmitation because it makes routing more complex and it is not always possible to know what happens behind the VTI, e.g. when the VTI is move to some namespace. VTI works just with tunnel mode SAs. We need generic interfaces that ensures transfomation, regardless of the xfrm mode and the encapsulated address family. VTI is configured with a combination GRE keys and xfrm marks. With this we have to deal with some extra cases in the generic tunnel lookup because the GRE keys on the VTI are actually not GRE keys, the GRE keys were just reused for something else. All extensions to the VTI interfaces would require to add even more complexity to the generic tunnel lookup. So to overcome this, we developed xfrm interfaces with the following design goal: It should be possible to tunnel IPv4 and IPv6 through the same interface. No limitation on xfrm mode (tunnel, transport and beet). Should be a generic virtual interface that ensures IPsec transformation, no need to know what happens behind the interface. Interfaces should be configured with a new key that must match a new policy/SA lookup key. The lookup logic should stay in the xfrm codebase, no need to change or extend generic routing and tunnel lookups. Should be possible to use IPsec hardware offloads of the underlying interface. 5) Remove xfrm pcpu policy cache. This was added after the flowcache removal, but it turned out to make things even worse. From Florian Westphal. 6) Allow to update the set mark on SA updates. From Nathan Harold. 7) Convert some timestamps to time64_t. From Arnd Bergmann. 8) Don't check the offload_handle in xfrm code, it is an opaque data cookie for the driver. From Shannon Nelson. 9) Remove xfrmi interface ID from flowi. After this pach no generic code is touched anymore to do xfrm interface lookups. From Benedict Wong. 10) Allow to update the xfrm interface ID on SA updates. From Nathan Harold. 11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle. From YueHaibing. 12) Return more detailed errors on xfrm interface creation. From Benedict Wong. 13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR. From the kbuild test robot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-27 09:33:37 -07:00
kbuild test robot	c6f5e017df	xfrm: fix ptr_ret.cocci warnings net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: `44e2b838c2` ("xfrm: Return detailed errors from xfrmi_newlink") CC: Benedict Wong <benedictwong@google.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-27 06:47:28 +02:00
Benedict Wong	44e2b838c2	xfrm: Return detailed errors from xfrmi_newlink Currently all failure modes of xfrm interface creation return EEXIST. This change improves the granularity of errnos provided by also returning ENODEV or EINVAL if failures happen in looking up the underlying interface, or a required parameter is not provided. This change has been tested against the Android Kernel Networking Tests, with additional xfrmi_newlink tests here: https://android-review.googlesource.com/c/kernel/tests/+/715755 Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-26 07:17:26 +02:00
YueHaibing	934ffce134	xfrm: fix 'passing zero to ERR_PTR()' warning Fix a static code checker warning: net/xfrm/xfrm_policy.c:1836 xfrm_resolve_and_create_bundle() warn: passing zero to 'ERR_PTR' xfrm_tmpl_resolve return 0 just means no xdst found, return NULL instead of passing zero to ERR_PTR. Fixes: `d809ec8955` ("xfrm: do not assume that template resolving always returns xfrms") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-26 07:13:44 +02:00
Stephen Hemminger	2e13b58069	xfrm: remove blank lines at EOF Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-24 14:10:42 -07:00
Nathan Harold	5baf4f9c00	xfrm: Allow xfrmi if_id to be updated by UPDSA Allow attaching an SA to an xfrm interface id after the creation of the SA, so that tasks such as keying which must be done as the SA is created, can remain separate from the decision on how to route traffic from an SA. This permits SA creation to be decomposed in to three separate steps: 1) allocation of a SPI 2) algorithm and key negotiation 3) insertion into the data path Signed-off-by: Nathan Harold <nharold@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-20 10:19:19 +02:00
Benedict Wong	bc56b33404	xfrm: Remove xfrmi interface ID from flowi In order to remove performance impact of having the extra u32 in every single flowi, this change removes the flowi_xfrm struct, prefering to take the if_id as a method parameter where needed. In the inbound direction, if_id is only needed during the __xfrm_check_policy() function, and the if_id can be determined at that point based on the skb. As such, xfrmi_decode_session() is only called with the skb in __xfrm_check_policy(). In the outbound direction, the only place where if_id is needed is the xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is directly passed into the xfrm_lookup_with_ifid() call. All existing callers can still call xfrm_lookup(), which uses a default if_id of 0. This change does not change any behavior of XFRMIs except for improving overall system performance via flowi size reduction. This change has been tested against the Android Kernel Networking Tests: https://android.googlesource.com/kernel/tests/+/master/net/test Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-20 10:14:41 +02:00
Shannon Nelson	fcb662deeb	xfrm: don't check offload_handle for nonzero The offload_handle should be an opaque data cookie for the driver to use, much like the data cookie for a timer or alarm callback. Thus, the XFRM stack should not be checking for non-zero, because the driver might use that to store an array reference, which could be zero, or some other zero but meaningful value. We can remove the checks for non-zero because there are plenty other attributes also being checked to see if there is an offload in place for the SA in question. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-19 10:18:04 +02:00
Arnd Bergmann	386c5680e2	xfrm: use time64_t for in-kernel timestamps The lifetime managment uses '__u64' timestamps on the user space interface, but 'unsigned long' for reading the current time in the kernel with get_seconds(). While this is probably safe beyond y2038, it will still overflow in 2106, and the get_seconds() call is deprecated because fo that. This changes the xfrm time handling to use time64_t consistently, along with reading the time using the safer ktime_get_real_seconds(). It still suffers from problems that can happen from a concurrent settimeofday() call or (to a lesser degree) a leap second update, but since the time stamps are part of the user API, there is nothing we can do to prevent that. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-11 15:25:30 +02:00
Nathan Harold	6d8e85ffe1	xfrm: Allow Set Mark to be Updated Using UPDSA Allow UPDSA to change "set mark" to permit policy separation of packet routing decisions from SA keying in systems that use mark-based routing. The set mark, used as a routing and firewall mark for outbound packets, is made update-able which allows routing decisions to be handled independently of keying/SA creation. To maintain consistency with other optional attributes, the set mark is only updated if sent with a non-zero value. The per-SA lock and the xfrm_state_lock are taken in that order to avoid a deadlock with xfrm_timer_handler(), which also takes the locks in that order. Signed-off-by: Nathan Harold <nharold@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-01 17:47:10 +02:00
Florian Westphal	e4db5b61c5	xfrm: policy: remove pcpu policy cache Kristian Evensen says: In a project I am involved in, we are running ipsec (Strongswan) on different mt7621-based routers. Each router is configured as an initiator and has around ~30 tunnels to different responders (running on misc. devices). Before the flow cache was removed (kernel 4.9), we got a combined throughput of around 70Mbit/s for all tunnels on one router. However, we recently switched to kernel 4.14 (4.14.48), and the total throughput is somewhere around 57Mbit/s (best-case). I.e., a drop of around 20%. Reverting the flow cache removal restores, as expected, performance levels to that of kernel 4.9. When pcpu xdst exists, it has to be validated first before it can be used. A negative hit thus increases cost vs. no-cache. As number of tunnels increases, hit rate decreases so this pcpu caching isn't a viable strategy. Furthermore, the xdst cache also needs to run with BH off, so when removing this the bh disable/enable pairs can be removed too. Kristian tested a 4.14.y backport of this change and reported increased performance: In our tests, the throughput reduction has been reduced from around -20% to -5%. We also see that the overall throughput is independent of the number of tunnels, while before the throughput was reduced as the number of tunnels increased. Reported-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-06-25 17:46:06 +02:00
Florian Westphal	86126b77dc	xfrm: free skb if nlsk pointer is NULL nlmsg_multicast() always frees the skb, so in case we cannot call it we must do that ourselves. Fixes: `21ee543edc` ("xfrm: fix race between netns cleanup and state expire notification") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-06-25 16:43:36 +02:00
Steffen Klassert	f203b76d78	xfrm: Add virtual xfrm interfaces This patch adds support for virtual xfrm interfaces. Packets that are routed through such an interface are guaranteed to be IPsec transformed or dropped. It is a generic virtual interface that ensures IPsec transformation, no need to know what happens behind the interface. This means that we can tunnel IPv4 and IPv6 through the same interface and support all xfrm modes (tunnel, transport and beet) on it. Co-developed-by: Lorenzo Colitti <lorenzo@google.com> Co-developed-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Benedict Wong <benedictwong@google.com> Tested-by: Antony Antony <antony@phenome.org> Reviewed-by: Eyal Birger <eyal.birger@gmail.com>	2018-06-23 16:07:25 +02:00
Steffen Klassert	7e6526404a	xfrm: Add a new lookup key to match xfrm interfaces. This patch adds the xfrm interface id as a lookup key for xfrm states and policies. With this we can assign states and policies to virtual xfrm interfaces. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Benedict Wong <benedictwong@google.com> Tested-by: Benedict Wong <benedictwong@google.com> Tested-by: Antony Antony <antony@phenome.org> Reviewed-by: Eyal Birger <eyal.birger@gmail.com>	2018-06-23 16:07:15 +02:00

1 2 3 4 5 ...

1252 Commits