linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	fb5e2154b7	While testing my development branch, without the fix for the pid use after free bug, the selftest that Namhyung added triggers it. I figured it would be good to add the test for the bug after the fix, such that it does not exist without the fix. I added another patch that lets the test only test part of the pid filtering, and ignores the function-fork (filtering on children as well) if the function-fork feature does not exist. This feature is added by Namhyung just before he added this test. But since the test tests both with and without the feature, it would be good to let it not fail if the feature does not exist. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJY9kVEFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L nZQIAMJN51sNAnJHodKieAx6NUdnFbih7XknFZePGsGX2CHaRpPJuYRTEMIJrtds FSGCKOWjmmZ57xB/WYsCdH2H4cqd2TCFIeCT+6Pglk4+L2Y97idg5tzJ0+QGnDqT zBMd1kcmLathH5OoNsUEO5FR0QplBTb+3kVRu9XaAUgJhIlLwbF58BdtOv0l0avb saV/cVLosUjb4TXxwPgRZnmH9YElQ7RElf0S60JKbFTHCzyvoG0U17seFAklZOQl Ux0nn+LFWM+M7e7LYR3nSXnOzofDMz9r1bGGo9bgkng0Csl2Op1MFttofcsi3PvT FUxUGPZSEjxj3XrxXrkzzK8pRuI= =NEh1 -----END PGP SIGNATURE----- Merge tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace testcase update from Steven Rostedt: "While testing my development branch, without the fix for the pid use after free bug, the selftest that Namhyung added triggers it. I figured it would be good to add the test for the bug after the fix, such that it does not exist without the fix. I added another patch that lets the test only test part of the pid filtering, and ignores the function-fork (filtering on children as well) if the function-fork feature does not exist. This feature is added by Namhyung just before he added this test. But since the test tests both with and without the feature, it would be good to let it not fail if the feature does not exist" * tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add check for function-fork before running pid filter test selftests: ftrace: Add a testcase for function PID filter	2017-04-18 10:19:47 -07:00
Steven Rostedt (VMware)	9ed19c7695	selftests: ftrace: Add check for function-fork before running pid filter test Have the func-filter-pid test check for the function-fork option before testing it. It can still test the pid filtering, but will stop before testing the function-fork option for children inheriting the pids. This allows the test to be added before the function-fork feature, but after a bug fix that triggers one of the bugs the test can cause. Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-18 12:46:11 -04:00
Linus Torvalds	0bad6d7e93	Namhyung Kim discovered a use after free bug. It has to do with adding a pid filter to function tracing in an instance, and then freeing the instance. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJY9hO7FBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L qBgIAJv+IH1zQTHqFn4gOtIkHJ0kxjTr9mzz4S5SgnHDMaCKOHTpuste02RmCvfo J+6F//bw3eM9CpEcQg/t41aFagXs+g3x1HmD0PN7Y1fKHXQ5xDdpjPpOsgprrx7q dvGLg4bolv6KaNMTJmJ8LhwPXJGMEqnbY6Ypz3qbnsziSeXe1zcrQKNA88ySJoh0 V6QV9XPWNkPO4AknnqD88oZvJhz/H/fQuJYQZNBoTomD6SG3f7mYW1bxyoWc08yW W+Rg/YddGHk6Mmkqy0BaCPBjKjGiq20h9DOvLU6CFR0Gt4ZQ7sVZczYN4NkjEn7H qdFcqaHNSkjxs0JFvbWToIu4D8w= =Gv/C -----END PGP SIGNATURE----- Merge tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Namhyung Kim discovered a use after free bug. It has to do with adding a pid filter to function tracing in an instance, and then freeing the instance" * tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix function pid filter on instances	2017-04-18 09:31:51 -07:00
Linus Torvalds	5ee4c5a929	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following problems: - regression in new XTS/LRW code when used with async crypto - long-standing bug in ahash API when used with certain algos - bogus memory dereference in async algif_aead with certain algos" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_aead - Fix bogus request dereference in completion function crypto: ahash - Fix EINPROGRESS notification callback crypto: lrw - Fix use-after-free on EINPROGRESS crypto: xts - Fix use-after-free on EINPROGRESS	2017-04-18 09:03:50 -07:00
Namhyung Kim	093be89a12	selftests: ftrace: Add a testcase for function PID filter Like event pid filtering test, add function pid filtering test with the new "function-fork" option. It also tests it on an instance directory so that it can verify the bug related pid filtering on instances. Link: http://lkml.kernel.org/r/20170417024430.21194-5-namhyung@kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-18 12:02:36 -04:00
Matthias Kaehlcke	bbf67e450a	nl80211: Fix enum type of variable in nl80211_put_sta_rate() rate_flg is of type 'enum nl80211_attrs', however it is assigned with 'enum nl80211_rate_info' values. Change the type of rate_flg accordingly. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-04-18 11:03:03 +02:00
Matthias Kaehlcke	a4ac6f2e53	mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss() cfg80211_chandef_create() expects an 'enum nl80211_channel_type' as channel type however in ieee80211_sta_join_ibss() NL80211_CHAN_WIDTH_20_NOHT is passed in two occasions, which is of the enum type 'nl80211_chan_width'. Change the value to NL80211_CHAN_NO_HT (20 MHz, non-HT channel) of the channel type enum. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-04-18 11:02:35 +02:00
Matthias Kaehlcke	aa1702dd16	cfg80211: Fix array-bounds warning in fragment copy __ieee80211_amsdu_copy_frag intentionally initializes a pointer to array[-1] to increment it later to valid values. clang rightfully generates an array-bounds warning on the initialization statement. Initialize the pointer to array[0] and change the algorithm from increment before to increment after consume. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-04-18 11:02:18 +02:00
Johannes Berg	f64331d580	mac80211: keep a separate list of monitor interfaces that are up In addition to keeping monitor interfaces on the regular list of interfaces, keep those that are up and not in cooked mode on a separate list. This saves having to iterate all interfaces when delivering to monitor interfaces. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-04-18 11:00:13 +02:00
Arend Van Spriel	96b08fd608	nl80211: add request id in scheduled scan event messages For multi-scheduled scan support in subsequent patch a request id will be added. This patch add this request id to the scheduled scan event messages. For now the request id will always be zero. With multi-scheduled scan its value will inform user-space to which scan the event relates. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-04-18 10:23:50 +02:00
Linus Torvalds	20bb78f6b3	Merge branch 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "One patch which fixes get_user() for 64-bit values on 32-bit kernels. Up to now we lost the upper 32-bits of the returned 64-bit value" * 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix get_user() for 64-bit value on 32-bit kernel	2017-04-17 15:06:34 -07:00
Namhyung Kim	d879d0b8c1	ftrace: Fix function pid filter on instances When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: `0c8916c342` ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-17 16:44:23 -04:00
David S. Miller	acf167f3f2	Merge branch 'bpf-fixes' Daniel Borkmann says: ==================== Two BPF fixes The set fixes cb_access and xdp_adjust_head bits in struct bpf_prog, that are used for requirement checks on the program rather than f.e. heuristics. Thus, for tail calls, we cannot make any assumptions and are forced to set them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:51:58 -04:00
Daniel Borkmann	c2002f9837	bpf: fix checking xdp_adjust_head on tail calls Commit `17bedab272` ("bpf: xdp: Allow head adjustment in XDP prog") added the xdp_adjust_head bit to the BPF prog in order to tell drivers that the program that is to be attached requires support for the XDP bpf_xdp_adjust_head() helper such that drivers not supporting this helper can reject the program. There are also drivers that do support the helper, but need to check for xdp_adjust_head bit in order to move packet metadata prepended by the firmware away for making headroom. For these cases, the current check for xdp_adjust_head bit is insufficient since there can be cases where the program itself does not use the bpf_xdp_adjust_head() helper, but tail calls into another program that uses bpf_xdp_adjust_head(). As such, the xdp_adjust_head bit is still set to 0. Since the first program has no control over which program it calls into, we need to assume that bpf_xdp_adjust_head() helper is used upon tail calls. Thus, for the very same reasons in cb_access, set the xdp_adjust_head bit to 1 when the main program uses tail calls. Fixes: `17bedab272` ("bpf: xdp: Allow head adjustment in XDP prog") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:51:57 -04:00
Daniel Borkmann	6b1bb01bcc	bpf: fix cb access in socket filter programs on tail calls Commit `ff936a04e5` ("bpf: fix cb access in socket filter programs") added a fix for socket filter programs such that in i) AF_PACKET the 20 bytes of skb->cb[] area gets zeroed before use in order to not leak data, and ii) socket filter programs attached to TCP/UDP sockets need to save/restore these 20 bytes since they are also used by protocol layers at that time. The problem is that bpf_prog_run_save_cb() and bpf_prog_run_clear_cb() only look at the actual attached program to determine whether to zero or save/restore the skb->cb[] parts. There can be cases where the actual attached program does not access the skb->cb[], but the program tail calls into another program which does access this area. In such a case, the zero or save/restore is currently not performed. Since the programs we tail call into are unknown at verification time and can dynamically change, we need to assume that whenever the attached program performs a tail call, that later programs could access the skb->cb[], and therefore we need to always set cb_access to 1. Fixes: `ff936a04e5` ("bpf: fix cb access in socket filter programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:51:57 -04:00
Chonggang Li	b89f04c61e	bonding: deliver link-local packets with skb->dev set to link that packets arrived on Bonding driver changes the skb->dev to the bonding-master before passing the packet to stack for further processing. This, however does not make sense for the link-local packets and it loses "the link info" once its skb->dev is changed to bonding-master. This patch changes this behavior for link-local packets by not changing the skb->dev to the bonding-master and maintaining it as it is, i.e. the link on which the packet arrived. Signed-off-by: Chonggang Li <chonggangli@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:45:47 -04:00
David Ahern	c21ef3e343	net: rtnetlink: plumb extended ack to doit function Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. >From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:35:38 -04:00
David Lebrun	af3b5158b8	ipv6: sr: fix BUG due to headroom too small after SRH push When a locally generated packet receives an SRH with two or more segments, the remaining headroom is too small to push an ethernet header. This patch ensures that the headroom is large enough after SRH push. The BUG generated the following trace. [ 192.950285] skbuff: skb_under_panic: text:ffffffff81809675 len:198 put:14 head:ffff88006f306400 data:ffff88006f3063fa tail:0xc0 end:0x2c0 dev:A-1 [ 192.952456] ------------[ cut here ]------------ [ 192.953218] kernel BUG at net/core/skbuff.c:105! [ 192.953411] invalid opcode: 0000 [#1] PREEMPT SMP [ 192.953411] Modules linked in: [ 192.953411] CPU: 5 PID: 3433 Comm: ping6 Not tainted 4.11.0-rc3+ #237 [ 192.953411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 [ 192.953411] task: ffff88007c2d42c0 task.stack: ffffc90000ef4000 [ 192.953411] RIP: 0010:skb_panic+0x61/0x70 [ 192.953411] RSP: 0018:ffffc90000ef7900 EFLAGS: 00010286 [ 192.953411] RAX: 0000000000000085 RBX: 00000000000086dd RCX: 0000000000000201 [ 192.953411] RDX: 0000000080000201 RSI: ffffffff81d104c5 RDI: 00000000ffffffff [ 192.953411] RBP: ffffc90000ef7920 R08: 0000000000000001 R09: 0000000000000000 [ 192.953411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 192.953411] R13: ffff88007c5a4000 R14: ffff88007b363d80 R15: 00000000000000b8 [ 192.953411] FS: 00007f94b558b700(0000) GS:ffff88007fd40000(0000) knlGS:0000000000000000 [ 192.953411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 192.953411] CR2: 00007fff5ecd5080 CR3: 0000000074141000 CR4: 00000000001406e0 [ 192.953411] Call Trace: [ 192.953411] skb_push+0x3b/0x40 [ 192.953411] eth_header+0x25/0xc0 [ 192.953411] neigh_resolve_output+0x168/0x230 [ 192.953411] ? ip6_finish_output2+0x242/0x8f0 [ 192.953411] ip6_finish_output2+0x242/0x8f0 [ 192.953411] ? ip6_finish_output2+0x76/0x8f0 [ 192.953411] ip6_finish_output+0xa8/0x1d0 [ 192.953411] ip6_output+0x64/0x2d0 [ 192.953411] ? ip6_output+0x73/0x2d0 [ 192.953411] ? ip6_dst_check+0xb5/0xc0 [ 192.953411] ? dst_cache_per_cpu_get.isra.2+0x40/0x80 [ 192.953411] seg6_output+0xb0/0x220 [ 192.953411] lwtunnel_output+0xcf/0x210 [ 192.953411] ? lwtunnel_output+0x59/0x210 [ 192.953411] ip6_local_out+0x38/0x70 [ 192.953411] ip6_send_skb+0x2a/0xb0 [ 192.953411] ip6_push_pending_frames+0x48/0x50 [ 192.953411] rawv6_sendmsg+0xa39/0xf10 [ 192.953411] ? __lock_acquire+0x489/0x890 [ 192.953411] ? __mutex_lock+0x1fc/0x970 [ 192.953411] ? __lock_acquire+0x489/0x890 [ 192.953411] ? __mutex_lock+0x1fc/0x970 [ 192.953411] ? tty_ioctl+0x283/0xec0 [ 192.953411] inet_sendmsg+0x45/0x1d0 [ 192.953411] ? _copy_from_user+0x54/0x80 [ 192.953411] sock_sendmsg+0x33/0x40 [ 192.953411] SYSC_sendto+0xef/0x170 [ 192.953411] ? entry_SYSCALL_64_fastpath+0x5/0xc2 [ 192.953411] ? trace_hardirqs_on_caller+0x12b/0x1b0 [ 192.953411] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 192.953411] SyS_sendto+0x9/0x10 [ 192.953411] entry_SYSCALL_64_fastpath+0x1f/0xc2 [ 192.953411] RIP: 0033:0x7f94b453db33 [ 192.953411] RSP: 002b:00007fff5ecd0578 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 192.953411] RAX: ffffffffffffffda RBX: 00007fff5ecd16e0 RCX: 00007f94b453db33 [ 192.953411] RDX: 0000000000000040 RSI: 000055a78352e9c0 RDI: 0000000000000003 [ 192.953411] RBP: 00007fff5ecd1690 R08: 000055a78352c940 R09: 000000000000001c [ 192.953411] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a783321e10 [ 192.953411] R13: 000055a7839890c0 R14: 0000000000000004 R15: 0000000000000000 [ 192.953411] Code: 00 00 48 89 44 24 10 8b 87 c4 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 90 58 d2 81 48 89 04 24 31 c0 e8 4f 70 9a ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 97 d8 00 00 [ 192.953411] RIP: skb_panic+0x61/0x70 RSP: ffffc90000ef7900 [ 193.000186] ---[ end trace bd0b89fabdf2f92c ]--- [ 193.000951] Kernel panic - not syncing: Fatal exception in interrupt [ 193.001137] Kernel Offset: disabled [ 193.001169] ---[ end Kernel panic - not syncing: Fatal exception in interrupt Fixes: `19d5a26f5e` ("ipv6: sr: expand skb head only if necessary") Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:33:53 -04:00
Ilan Tayari	7a7a9bd7ac	gso: Validate assumption of frag_list segementation Commit `07b26c9454` ("gso: Support partial splitting at the frag_list pointer") assumes that all SKBs in a frag_list (except maybe the last one) contain the same amount of GSO payload. This assumption is not always correct, resulting in the following warning message in the log: skb_segment: too many frags For example, mlx5 driver in Striding RQ mode creates some RX SKBs with one frag, and some with 2 frags. After GRO, the frag_list SKBs end up having different amounts of payload. If this frag_list SKB is then forwarded, the aforementioned assumption is violated. Validate the assumption, and fall back to software GSO if it not true. Fixes: `07b26c9454` ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:31:29 -04:00
Xin Long	edb12f2d72	sctp: get list_of_streams of strreset outreq earlier Now when processing strreset out responses, it gets outreq->list_of_streams only when result is performed. But if result is not performed, str_p will be NULL. It will cause panic in sctp_ulpevent_make_stream_reset_event if nums is not 0. This patch is to fix it by getting outreq->list_of_streams earlier, and also to improve some codes for the strreset inreq process. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:25:35 -04:00
Chenbo Feng	9fd0f31563	Add uid and cookie bpf helper to cg_skb_func_proto BPF helper functions get_socket_cookie and get_socket_uid can be used for network traffic classifications, among others. Expose them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of commit `8f917bba00` ("bpf: pass sk to helper functions") the required skb->sk function is available at both cgroup bpf ingress and egress hooks. With these two new helper, cg_skb_func_proto is effectively the same as sk_filter_func_proto. Change since V1: Instead of add the helper to cg_skb_func_proto, redirect the cg_skb_func_proto to sk_filter_func_proto since all helper function in sk_filter_func_proto are applicable to cg_skb_func_proto now. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:22:00 -04:00
Simon Xiao	f3c9d40ee1	hv_netvsc: change netvsc device default duplex to FULL The netvsc device supports full duplex by default. This warnings in log from bonding device which did not like seeing UNKNOWN duplex. Signed-off-by: Simon Xiao <sixiao@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:12:33 -04:00
stephen hemminger	776e726bfb	netvsc: fix RCU warning in get_stats The statistics functionis called with RTNL held during probe but with RCU held during access from /proc and elsewhere. This is safe so update the lockdep annotation. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:12:33 -04:00
Dan Carpenter	1dbba4cb8d	net: phy: test the right variable in phy_write_mmd() This is a copy and paste buglet. We meant to test for ->write_mmd but we test for ->read_mmd. Fixes: `1ee6b9bc62` ("net: phy: make phy_(read\|write)_mmd() generic MMD accessors") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:10:42 -04:00
Florian Westphal	0aa8c13eb5	ipv6: drop non loopback packets claiming to originate from ::1 We lack a saddr check for ::1. This causes security issues e.g. with acls permitting connections from ::1 because of assumption that these originate from local machine. Assuming a source address of ::1 is local seems reasonable. RFC4291 doesn't allow such a source address either, so drop such packets. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:09:23 -04:00
David S. Miller	450cc8cce2	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-04-14 Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12 kernel. - Many fixes to 6LoWPAN, in particular for BLE - New CA8210 IEEE 802.15.4 device driver (accounting for most of the lines of code added in this pull request) - Added Nokia Bluetooth (UART) HCI driver - Some serdev & TTY changes that are dependencies for the Nokia driver (with acks from relevant maintainers and an agreement that these come through the bluetooth tree) - Support for new Intel Bluetooth device - Various other minor cleanups/fixes here and there Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 15:00:57 -04:00
David S. Miller	d584fec634	Merge branch 'bpf-lru-perf' Martin KaFai Lau says: ==================== bpf: LRU performance and test-program improvements The first 4 patches make a few improvements to the LRU tests. Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map. Patch 6/6 adds an example in using LRU map with map-in-map. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:54 -04:00
Martin KaFai Lau	3a5795b83d	bpf: lru: Add map-in-map LRU example This patch adds a map-in-map LRU example. If we know only a subset of cores will use the LRU, we can allocate a common LRU list per targeting core and store it into an array-of-hashs. It allows using the common LRU map with map-update performance comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory on the unused cores that we know they will never access the LRU map. BPF_F_NO_COMMON_LRU: > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 9234314 (9.23M/s) map-in-map LRU: > map_perf_test 512 8 1260000 80000000 \| awk '{sum += $3}END{print sum}' 9962743 (9.96M/s) Notes that the max_entries for the map-in-map LRU test is 1260000 which is the max_entries for each inner LRU map. 8 processes have been started, so 8 * 1260000 = 10080000 (~10M) which is close to what is used in the BPF_F_NO_COMMON_LRU test. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	695ba2651a	bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 After doing map_perf_test with a much bigger BPF_F_NO_COMMON_LRU map, the perf report shows a lot of time spent in rotating the inactive list (i.e. __bpf_lru_list_rotate_inactive): > map_perf_test 32 8 10000 1000000 \| awk '{sum += $3}END{print sum}' 19644783 (19M/s) > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 6283930 (6.28M/s) By inactive, it usually means the element is not in cache. Hence, there is a need to tune the PERCPU_NR_SCANS value. This patch finds a better number of elements to scan during each list rotation. The PERCPU_NR_SCANS (which is defined the same as PERCPU_FREE_TARGET) decreases from 16 elements to 4 elements. This change only affects the BPF_F_NO_COMMON_LRU map. The test_lru_dist does not show meaningful difference between 16 and 4. Our production L4 load balancer which uses the LRU map for conntrack-ing also shows little change in cache hit rate. Since both benchmark and production data show no cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4. We can consider making it configurable if we find a usecase later that shows another value works better and/or use a different rotation strategy. After this change: > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 9240324 (9.2M/s) i.e. 6.28M/s -> 9.2M/s The test_lru_dist has not shown meaningful difference: > test_lru_dist zipf.100k.a1_01.out 4000 1: nr_misses: 31575 (Before) vs 31566 (After) > test_lru_dist zipf.100k.a0_01.out 40000 1 nr_misses: 67036 (Before) vs 67031 (After) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	9fd63d05f3	bpf: Allow bpf sample programs (_user.c) to change bpf_map_def The current bpf_map_def is statically defined during compile time. This patch allows the _user.c program to change it during runtime. It is done by adding load_bpf_file_fixup_map() which takes a callback. The callback will be called before creating each map so that it has a chance to modify the bpf_map_def. The current usecase is to change max_entries in map_perf_test. It is interesting to test with a much bigger map size in some cases (e.g. the following patch on bpf_lru_map.c). However, it is hard to find one size to fit all testing environment. Hence, it is handy to take the max_entries as a cmdline arg and then configure the bpf_map_def during runtime. This patch adds two cmdline args. One is to configure the map's max_entries. Another is to configure the max_cnt which controls how many times a syscall is called. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	bf8db5d243	bpf: lru: Refactor LRU map tests in map_perf_test One more LRU test will be added later in this patch series. In this patch, we first move all existing LRU map tests into a single syscall (connect) first so that the future new LRU test can be added without hunting another syscall. One of the map name is also changed from percpu_lru_hash_map to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	6467acbc70	bpf: lru: Cleanup test_lru_map.c This patch does the following cleanup on test_lru_map.c 1) Fix indentation (Replace spaces by tabs) 2) Remove redundant BPF_F_NO_COMMON_LRU test 3) Simplify some comments Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	9746f85686	bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU. It just happens to work when PERCPU_FREE_TARGET == 16. This patch: 1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU 2) Add test_lru_sanity6 to test list rotation for the BPF_F_NO_COMMON_LRU map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Jisheng Zhang	82960fff09	net: mvneta: fix failed to suspend if WOL is enabled Recently, suspend/resume and WOL support are added into mvneta driver. If we enable WOL, then we get some error as below on Marvell BG4CT platforms during suspend: [ 184.149723] dpm_run_callback(): mdio_bus_suspend+0x0/0x50 returns -16 [ 184.149727] PM: Device f7b62004.mdio-mi:00 failed to suspend: error -16 -16 means -EBUSY, phy_suspend() will return -EBUSY if it finds the device has WOL enabled. We fix this issue by properly setting the netdev's power.can_wakeup and power.wakeup, i.e 1. in mvneta_mdio_probe(), call device_set_wakeup_capable() to set power.can_wakeup if the phy support WOL. 2. in mvneta_ethtool_set_wol(), call device_set_wakeup_enable() to set power.wakeup if WOL has been successfully enabled in phy. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:46:35 -04:00
Nikolay Aleksandrov	cab93af0ed	net: bridge: notify on hw fdb takeover Recently we added support for SW fdbs to take over HW ones, but that results in changing a user-visible fdb flag thus we need to send a notification, also it's consistent with how HW takes over SW entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:45:34 -04:00
David S. Miller	71947f0f91	Merge branch 'mediatek-tx-bugs' Sean Wang says: ==================== mediatek: Fix crash caused by reporting inconsistent skb->len to BQL Changes since v1: - fix inconsistent enumeration which easily causes the potential bug The series fixes kernel BUG caused by inconsistent SKB length reported into BQL. The reason for inconsistent length comes from hardware BUG which results in different port number carried on the TXD within the lifecycle of SKB. So patch 2) is proposed for use a software way to track which port the SKB involving instead of hardware way. And patch 1) is given for another issue I found which causes TXD and SKB inconsistency that is not expected in the initial logic, so it is also being corrected it in the series. The log for the kernel BUG caused by the issue is posted as below. [ 120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26! [ 120.837684] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 120.842778] Modules linked in: [ 120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 #35 [ 120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree) [ 120.859012] task: c1007480 task.stack: c1000000 [ 120.863510] PC is at dql_completed+0x108/0x17c [ 120.867915] LR is at 0x46 [ 120.870512] pc : [<c03c19c8>] lr : [<00000046>] psr: 80000113 [ 120.870512] sp : c1001d58 ip : c1001d80 fp : c1001d7c [ 120.881895] r10: 0000003e r9 : df6b3400 r8 : 0ed86506 [ 120.887075] r7 : 00000001 r6 : 00000001 r5 : 0ed8654c r4 : df0135d8 [ 120.893546] r3 : 00000001 r2 : df016800 r1 : 0000fece r0 : df6b3480 [ 120.900018] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 120.907093] Control: 10c5387d Table: 9e27806a DAC: 00000051 [ 120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218) [ 120.918744] Stack: (0xc1001d58 to 0xc1002000) .... 121.085331] 1fc0: 00000000 c0a52a28 00000000 c10855d4 c1003c58 c0a52a24 c100885c 8000406a [ 121.093444] 1fe0: 410fc073 00000000 00000000 c1001ff8 8000807c c0a009cc 00000000 00000000 [ 121.101575] [<c03c19c8>] (dql_completed) from [<c04cb010>] (mtk_napi_tx+0x1d0/0x37c) [ 121.109263] [<c04cb010>] (mtk_napi_tx) from [<c05e28cc>] (net_rx_action+0x24c/0x3b8) [ 121.116951] [<c05e28cc>] (net_rx_action) from [<c010152c>] (__do_softirq+0xe4/0x35c) [ 121.124638] [<c010152c>] (__do_softirq) from [<c012a624>] (irq_exit+0xe8/0x150) [ 121.131895] [<c012a624>] (irq_exit) from [<c017750c>] (__handle_domain_irq+0x70/0xc4) [ 121.139666] [<c017750c>] (__handle_domain_irq) from [<c0101404>] (gic_handle_irq+0x58/0x9c) [ 121.147953] [<c0101404>] (gic_handle_irq) from [<c010e18c>] (__irq_svc+0x6c/0x90) [ 121.155373] Exception stack(0xc1001ef8 to 0xc1001f40) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:33:59 -04:00
Sean Wang	134d21525f	net: ethernet: mediatek: fix inconsistency of port number carried in TXD Fix port inconsistency on TXD due to hardware BUG that would cause different port number is carried on the same TXD between tx_map() and tx_unmap() with the iperf test. It would cause confusing BQL logic which leads to kernel panic when dual GMAC runs concurrently. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:33:58 -04:00
Sean Wang	81d2dd09ca	net: ethernet: mediatek: fix inconsistency between TXD and the used buffer Fix inconsistency between the TXD descriptor and the used buffer that would cause unexpected logic at mtk_tx_unmap() during skb housekeeping. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:33:58 -04:00
Grygorii Strashko	bfe7244257	net: phy: micrel: fix crash when statistic requested for KSZ9031 phy Now the command: ethtool --phy-statistics eth0 will cause system crash with meassage "Unable to handle kernel NULL pointer dereference at virtual address 00000010" from: (kszphy_get_stats) from [<c069f1d8>] (ethtool_get_phy_stats+0xd8/0x210) (ethtool_get_phy_stats) from [<c06a0738>] (dev_ethtool+0x5b8/0x228c) (dev_ethtool) from [<c06b5484>] (dev_ioctl+0x3fc/0x964) (dev_ioctl) from [<c0679f7c>] (sock_ioctl+0x170/0x2c0) (sock_ioctl) from [<c02419d4>] (do_vfs_ioctl+0xa8/0x95c) (do_vfs_ioctl) from [<c02422c4>] (SyS_ioctl+0x3c/0x64) (SyS_ioctl) from [<c0107d60>] (ret_fast_syscall+0x0/0x44) The reason: phy_driver structure for KSZ9031 phy has no .probe() callback defined. As result, struct phy_device *phydev->priv pointer will not be initializes (null). This issue will affect also following phys: KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 Fix it by: - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 phys. The kszphy_probe() can be re-used as it doesn't do any phy specific settings. - removing statistic callbacks from other phys (KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding statistic counters. Fixes: `2b2427d064` ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:29:49 -04:00
WANG Cong	f5001ceab8	kcm: remove a useless copy_from_user() struct kcm_clone only contains fd, and kcm_clone() only writes this struct, so there is no need to copy it from user. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:28:48 -04:00
David Ahern	426c87caa2	net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr. Fixes: `1aa6c4f6b8` ("net: vrf: Add l3mdev rules on first device create") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:27:54 -04:00
Jiri Pirko	6b2af241f0	MAINTAINERS: rename TC entry and add couple of header files The section is not specific only to "TC classifiers", but applies to the whole TC subsystem. Also, add couple of forgotten headers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:26:21 -04:00
Russell King	786df9c2a4	net: phy: simplify phy_supported_speeds() Simplify the loop in phy_supported_speeds(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:25:07 -04:00
Russell King	d06130377c	net: phy: improve phylib correctness for non-autoneg settings phylib has some undesirable behaviour when forcing a link mode through ethtool. phylib uses this code: idx = phy_find_valid(phy_find_setting(phydev->speed, phydev->duplex), features); to find an index in the settings table. phy_find_setting() starts at index 0, and scans upwards looking for an exact speed and duplex match. When it doesn't find it, it returns MAX_NUM_SETTINGS - 1, which is 10baseT-Half duplex. phy_find_valid() then scans from the point (and effectively only checks one entry) before bailing out, returning MAX_NUM_SETTINGS - 1. phy_sanitize_settings() then sets ->speed to SPEED_10 and ->duplex to DUPLEX_HALF whether or not 10baseT-Half is supported or not. This goes against all the comments against these functions, and 10baseT-Half may not even be supported by the hardware. Rework these functions, introducing a new method of scanning the table. There are two modes of lookup that phylib wants: exact, and inexact. - in exact mode, we return either an exact match or failure - in inexact mode, we return an exact match if it exists, a match at the highest speed that is not greater than the requested speed (ignoring duplex), or failing that, the lowest supported speed, or failure. The biggest difference is that we always check whether the entry is supported before further consideration, so all unsupported entries are not considered as candidates. This results in arguably saner behaviour, better matches the comments, and is probably what users would expect. This becomes important as ethernet speeds increase, PHYs exist which do not support the 10Mbit speeds, and half-duplex is likely to become obsolete - it's already not even an option on 10Gbit and faster links. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:25:07 -04:00
stephen hemminger	8ea3e43911	Subject: net: allow configuring default qdisc Since 3.12 it has been possible to configure the default queuing discipline via sysctl. This patch adds ability to configure the default queue discipline in kernel configuration. This is useful for environments where configuring the value from userspace is difficult to manage. The default is still the same as before (pfifo_fast) and it is possible to change after kernel init with sysctl. This is similar to how TCP congestion control works. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:23:06 -04:00
David S. Miller	7ca9511813	Merge branch 'qed-arfs' Manish Chopra says: ==================== qed/qede: aRFS support This series adds support for Accelerated Flow Steering in qede driver for TCP/UDP over IPv4/IPv6 protocols. Please consider applying this series to "net-next" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:06:19 -04:00
Chopra, Manish	e4917d46a6	qede: Add aRFS support This patch adds support for aRFS for TCP and UDP protocols with IPv4/IPv6. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:06:18 -04:00
Chopra, Manish	d51e4af5c2	qed: aRFS infrastructure support This patch adds necessary APIs to interface with qede aRFS support in successive patch. It also reserves separate PTT entry for aRFS, [as being in fastpath flow] for hardware access instead of trying to acquire it at run time from the ptt pool. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:06:18 -04:00
Martin Wetterwald	53a759c89b	smsc95xx: Add comments to the registers definition This chip is used by a lot of embedded devices and also by the Raspberry Pi 1, 2 & 3 which were created to promote the study of computer sciences. Students wanting to learn kernel / network device driver programming through those devices can only rely on the Linux kernel driver source to make their own. This commit adds a lot of comments to the registers definition to expand the register names. Cc: Steve Glendinning <steve.glendinning@shawell.net> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Martin Wetterwald <martin@wetterwald.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:04:52 -04:00
George Cherian	b47a57a273	net: thunderx: Fix set_max_bgx_per_node for 81xx rgx Add the PCI_SUBSYS_DEVID_81XX_RGX and use the same to set the max bgx per node count. This fixes the issue intoduced by following commit `78aacb6f6` net: thunderx: Fix invalid mac addresses for node1 interfaces With this commit the max_bgx_per_node for 81xx is set as 2 instead of 3 because of which num_vfs is always calculated as zero. Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:03:48 -04:00

1 2 3 4 5 ...

664978 Commits All Branches Search

664978 Commits

All Branches