Commit Graph

533960 Commits

Author SHA1 Message Date
Claudiu Manoil 614b42426c gianfar: Fix suspend/resume for wol magic packet
If we disable NAPI in the first place we can mask the device's
interrupts (and halt it) without fearing that imask may be
concurrently accessed from interrupt context, so there's
no need to do local_irq_save() around gfar_halt_nodisable().
lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially
buggy routines.  The txlock is currently used in the driver only
to manage TX congestion, it has nothing to do with halting the
device.  With these changes, the TX processing is stopped before
gfar_halt().

Compact gfar_halt() is used instead of gfar_halt_nodisable(),
as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues.
gfar_start() re-enables all these blocks on resume.  Enabling
the magic-packet mode remains the same, note that the RX block
is re-enabled just before entering sleep mode.

Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal
that the interrupt line must remain active during sleep in order
to wake the system by magic packet (MAG) reception interrupt.
(On some systems the MAG interrupt did trigger w/o this flag
as well, but on others it didn't.)

Without these fixes, when suspended during fair Tx traffic the
interface occasionally failed to be woken up by magic packet.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 15:41:49 -07:00
Claudiu Manoil 8486830549 gianfar: Fix warning when CONFIG_PM off
CC      drivers/net/ethernet/freescale/gianfar.o
drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs'
defined but not used [-Wunused-function]
 static void lock_tx_qs(struct gfar_private *priv)
             ^
drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs'
defined but not used [-Wunused-function]
 static void unlock_tx_qs(struct gfar_private *priv)
             ^

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 15:41:49 -07:00
WANG Cong 5175f7106c act_pedit: check binding before calling tcf_hash_release()
When we share an action within a filter, the bind refcnt
should increase, therefore we should not call tcf_hash_release().

Fixes: 1a29321ed0 ("net_sched: act: Dont increment refcnt on replace")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 15:22:34 -07:00
Sowmini Varadhan 8a68173691 net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.

Fixes: 26abe14379 ("net: Modify sk_alloc to not reference count the
      netns of kernel sockets.")
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 15:59:12 -07:00
Daniel Borkmann 28e6b67f0b net: sched: fix refcount imbalance in actions
Since commit 55334a5db5 ("net_sched: act: refuse to remove bound action
outside"), we end up with a wrong reference count for a tc action.

Test case 1:

  FOO="1,6 0 0 4294967295,"
  BAR="1,6 0 0 4294967294,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
     action bpf bytecode "$FOO"
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 1 bind 1
  tc actions replace action bpf bytecode "$BAR" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 1 ref 2 bind 1
  tc actions replace action bpf bytecode "$FOO" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 3 bind 1

Test case 2:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
     index 1 ref 1 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 2 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 3 bind 1

What happens is that in tcf_hash_check(), we check tcf_common for a given
index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
found an existing action. Now there are the following cases:

  1) We do a late binding of an action. In that case, we leave the
     tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
     handler. This is correctly handeled.

  2) We replace the given action, or we try to add one without replacing
     and find out that the action at a specific index already exists
     (thus, we go out with error in that case).

In case of 2), we have to undo the reference count increase from
tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
do so because of the 'tcfc_bindcnt > 0' check which bails out early with
an -EPERM error.

Now, while commit 55334a5db5 prevents 'tc actions del action ...' on an
already classifier-bound action to drop the reference count (which could
then become negative, wrap around etc), this restriction only accounts for
invocations outside a specific action's ->init() handler.

One possible solution would be to add a flag thus we possibly trigger
the -EPERM ony in situations where it is indeed relevant.

After the patch, above test cases have correct reference count again.

Fixes: 55334a5db5 ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 14:20:39 -07:00
David S. Miller 990c9b3472 Merge branch 'r8152-fixes'
Hayes Wang says:

====================
r8152: device reset

v3:
For patch #2, remove cancel_delayed_work().

v2:
For patch #1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.

For patch #2, replace the original method with usb_queue_reset_device() to reset
the device.

v1:
Although the driver works normally, we find the device may get all 0xff data when
transmitting packets on certain platforms. It would break the device and no packet
could be transmitted. The reset is necessary to recover the hw for this situation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 14:03:46 -07:00
hayeswang 37608f3e57 r8152: reset device when tx timeout
The device reset is necessary if the hw becomes abnormal and stops
transmitting packets.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 14:03:46 -07:00
hayeswang e501139a51 r8152: add pre_reset and post_reset
Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
calling usb_reset_device(). The two functions could reduce the time
of reset when calling usb_reset_device() after probe().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 14:03:46 -07:00
Shahed Shaikh 15f1bb1f1e qlcnic: Fix corruption while copying
Use proper typecasting while performing byte-by-byte copy

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:57:26 -07:00
Daniel Borkmann f4eaed28c7 act_bpf: fix memory leaks when replacing bpf programs
We currently trigger multiple memory leaks when replacing bpf
actions, besides others:

  comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s)
  hex dump (first 32 bytes):
    01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
    18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00  ...m............
  backtrace:
    [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0
    [<ffffffff8120a37a>] __vmalloc+0x4a/0x50
    [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0
    [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0
    [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
    [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0
    [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0
    [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0
    [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
    [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf]
    [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910
    [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240
    [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0
    [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40
    [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0

Issue is that the old content from tcf_bpf is allocated and needs
to be released when we replace it. We seem to do that since the
beginning of act_bpf on the filter and insns, later on the name as
well.

Example test case, after patch:

  # FOO="1,6 0 0 4294967295,"
  # BAR="1,6 0 0 4294967294,"
  # tc actions add action bpf bytecode "$FOO" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
   index 2 ref 1 bind 0
  # tc actions replace action bpf bytecode "$BAR" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
   index 2 ref 1 bind 0
  # tc actions replace action bpf bytecode "$FOO" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
   index 2 ref 1 bind 0
  # tc actions del action bpf index 2
  [...]
  # echo "scan" > /sys/kernel/debug/kmemleak
  # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
  0

Fixes: d23b8ad8ab ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:56:22 -07:00
David S. Miller f68b1231c4 Merge branch 'thunderx-fixes'
Aleksey Makarov says:

====================
net: thunderx: Misc fixes

Miscellaneous fixes for the ThunderX VNIC driver

All the patches can be applied individually.
It's ok to drop some if the maintainer feels uncomfortable
with applying for 4.2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:32 -07:00
Thanneeru Srinivasulu 60f83c8987 net: thunderx: Fix for crash while BGX teardown
Cortina phy does not have kernel driver and we don't attach
device with phy layer for intefaces like XFI, XLAUI etc,
Hence check for interface type before calling disconnect.

Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:32 -07:00
Sunil Goutham 4adf435114 net: thunderx: Add PCI driver shutdown routine
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:32 -07:00
Sunil Goutham b49087dd0f net: thunderx: Fix crash when changing rss with mutliple traffic flows
This fixes a crash when changing rss with multiple traffic flows.

While interface teardown, disable tx queues after all NAPI threads
are done. If done otherwise tx queues might be woken up inside NAPI
if any CQE_TX are processed.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:32 -07:00
Sunil Goutham 3d7a8aaad8 net: thunderx: Set watchdog timeout value
If a txq (SQ) remains in stopped state after this timeout its
considered as stuck and interface is reinited.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:32 -07:00
Sunil Goutham 74840b83bd net: thunderx: Wakeup TXQ only if CQE_TX are processed
Previously TXQ is wakedup whenever napi is executed
and irrespective of if any CQE_TX are processed or not.
Added 'txq_stop' and 'txq_wake' counters to aid in debugging
if there are any future issues.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:31 -07:00
Sunil Goutham f8ce9666fa net: thunderx: Suppress alloc_pages() failure warnings
Suppressing standard alloc_pages() warnings. Some kernel configs limit
alloc size and the network driver may fail. Do not drop a kernel
warning in this case, instead just drop a oneliner that the network
driver could not be loaded since the buffer could not be allocated.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:31 -07:00
Sunil Goutham 2cb468e01e net: thunderx: Fix TSO packet statistic
Fixing TSO packages not being counted.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:31 -07:00
Sunil Goutham c62cd3c451 net: thunderx: Fix memory leak when changing queue count
Fix for memory leak when changing queue/channel count via ethtool

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:31 -07:00
Sunil Goutham 32c1b965f4 net: thunderx: Fix RQ_DROP miscalculation
With earlier configured value sufficient number of CQEs are not
being reserved for transmitted packets. Hence under heavy incoming
traffic load, receive notifications will take away most of the CQ
thus transmit notifications will be lost resulting in tx skbs not
being freed.

Finally SQ will be full and it will be stopped, watchdog timer
will kick in. After this fix receive notifications will not take
morethan half of CQ reserving the rest for transmit notifications.

Also changed CQ & SQ sizes from 16k to 4k.
This is also due to the receive notifications taking first half of
CQ under heavy load and time taken by NAPI to clear transmit notifications
will increase with higher queue sizes. Again results in SQ being stopped.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:31 -07:00
Sunil Goutham 143ceb0b8a net: thunderx: Fix memory leak while tearing down interface
Fixed 'tso_hdrs' memory not being freed properly.
Also fixed SQ skbuff maintenance issues.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:30 -07:00
Sunil Goutham 4b561c17d9 net: thunderx: Fix data integrity issues with LDWB
Switching back to LDD transactions from LDWB.

While transmitting packets out with LDWB transactions
data integrity issues are seen very frequently.
hence switching back to LDD.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:52:30 -07:00
Eric Dumazet c8507fb235 ipv6: flush nd cache on IFF_NOARP change
This patch is the IPv6 equivalent of commit
6c8b4e3ff8 ("arp: flush arp cache on IFF_NOARP change")

Without it, we keep buggy neighbours in the cache, with destination
MAC address equal to our own MAC address.

Tested:
 tcpdump -i eth0 -s 0 ip6 -n -e &
 ip link set dev eth0 arp off
 ping6 remote   // sends buggy frames
 ip link set dev eth0 arp on
 ping6 remote   // should work once kernel is patched

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mario Fanelli <mariofanelli@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 23:01:39 -07:00
David S. Miller b2428f94f4 Merge branch 'netcp-fixes'
Murali Karicheri says:

====================
net: netcp: bug fixes for dynamic module support

This series fixes few bugs to allow keystone netcp modules to be
dynamically loaded and removed. Currently it allows following
sequence multiple times

 insmod cpsw_ale.ko
 insmod davinci_mdio.ko
 insmod keystone_netcp.ko
 insmod keystone_netcp_ethss.ko
 ifup eth0
 ifup eth1
 ping <hosts on eth0>
 ping <hosts on eth1>
 ifdown eth1
 ifdown eth0
 rmmod keystone_netcp_ethss.ko
 rmmod keystone_netcp.ko
 rmmod davinci_mdio.ko
 rmmod cpsw_ale.ko
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 18:37:41 -07:00
Karicheri, Muralidharan 31a184b7ac net: netcp: ethss: cleanup gbe_probe() and gbe_remove() functions
This patch clean up error handle code to use goto label properly. In some
cases, the code unnecessarily use goto instead of just returning the error
code.  Code also make explicit calls to devm_* APIs on error which is
not necessary. In the gbe_remove() also it makes similar calls which is
also unnecessary.

Also fix few checkpatch warnings

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 18:37:41 -07:00
Karicheri, Muralidharan c20afae75c net: netcp: ethss: fix up incorrect use of list api
The code seems to assume a null is returned when the list is empty
from first_sec_slave() to break the loop which is incorrect. Fix the
code by using list_empty().

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 18:37:40 -07:00
Karicheri, Muralidharan 01a030996e net: netcp: fix cleanup interface list in netcp_remove()
Currently if user do rmmod keystone_netcp.ko following warning is
seen :-

[   59.035891] ------------[ cut here ]------------
[   59.040535] WARNING: CPU: 2 PID: 1619 at drivers/net/ethernet/ti/
netcp_core.c:2127 netcp_remove)

This is because the interface list is not cleaned up in netcp_remove.
This patch fixes this. Also fix some checkpatch related warnings.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 18:37:40 -07:00
Daniel Borkmann 2482abb93e ebpf, x86: fix general protection fault when tail call is invoked
With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
the following general protection fault out of an eBPF program with a simple
tail call, f.e. tracex5 (or a stripped down version of it):

  [  927.097918] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
  [...]
  [  927.100870] task: ffff8801f228b780 ti: ffff880016a64000 task.ti: ffff880016a64000
  [  927.102096] RIP: 0010:[<ffffffffa002440d>]  [<ffffffffa002440d>] 0xffffffffa002440d
  [  927.103390] RSP: 0018:ffff880016a67a68  EFLAGS: 00010006
  [  927.104683] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000001
  [  927.105921] RDX: 0000000000000000 RSI: ffff88014e438000 RDI: ffff880016a67e00
  [  927.107137] RBP: ffff880016a67c90 R08: 0000000000000000 R09: 0000000000000001
  [  927.108351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880016a67e00
  [  927.109567] R13: 0000000000000000 R14: ffff88026500e460 R15: ffff880220a81520
  [  927.110787] FS:  00007fe7d5c1f740(0000) GS:ffff880265000000(0000) knlGS:0000000000000000
  [  927.112021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  927.113255] CR2: 0000003e7bbb91a0 CR3: 000000006e04b000 CR4: 00000000001407e0
  [  927.114500] Stack:
  [  927.115737]  ffffc90008cdb000 ffff880016a67e00 ffff88026500e460 ffff880220a81520
  [  927.117005]  0000000100000000 000000000000001b ffff880016a67aa8 ffffffff8106c548
  [  927.118276]  00007ffcdaf22e58 0000000000000000 0000000000000000 ffff880016a67ff0
  [  927.119543] Call Trace:
  [  927.120797]  [<ffffffff8106c548>] ? lookup_address+0x28/0x30
  [  927.122058]  [<ffffffff8113d176>] ? __module_text_address+0x16/0x70
  [  927.123314]  [<ffffffff8117bf0e>] ? is_ftrace_trampoline+0x3e/0x70
  [  927.124562]  [<ffffffff810c1a0f>] ? __kernel_text_address+0x5f/0x80
  [  927.125806]  [<ffffffff8102086f>] ? print_context_stack+0x7f/0xf0
  [  927.127033]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.128254]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.129461]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.130654]  [<ffffffff8119ee4a>] trace_call_bpf+0x8a/0x140
  [  927.131837]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.133015]  [<ffffffff8119f008>] kprobe_perf_func+0x28/0x220
  [  927.134195]  [<ffffffff811a1668>] kprobe_dispatcher+0x38/0x60
  [  927.135367]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.136523]  [<ffffffff81061400>] kprobe_ftrace_handler+0xf0/0x150
  [  927.137666]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.138802]  [<ffffffff8117950c>] ftrace_ops_recurs_func+0x5c/0xb0
  [  927.139934]  [<ffffffffa022b0d5>] 0xffffffffa022b0d5
  [  927.141066]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.142199]  [<ffffffff81174b95>] seccomp_phase1+0x5/0x230
  [  927.143323]  [<ffffffff8102c0a4>] syscall_trace_enter_phase1+0xc4/0x150
  [  927.144450]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.145572]  [<ffffffff8102c0a4>] ? syscall_trace_enter_phase1+0xc4/0x150
  [  927.146666]  [<ffffffff817f9a9f>] tracesys+0xd/0x44
  [  927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 21 83
                       c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 00 74
                       0a <48> 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 fd ff
                       ff 4c
  [  927.150046] RIP  [<ffffffffa002440d>] 0xffffffffa002440d

The code section with the instructions that traps points into the eBPF JIT
image of the root program (the one invoking the tail call instruction).

Using bpf_jit_disasm -o on the eBPF root program image:

  [...]
  4e:   mov    -0x204(%rbp),%eax
        8b 85 fc fd ff ff
  54:   cmp    $0x20,%eax               <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
        83 f8 20
  57:   ja     0x000000000000007a
        77 21
  59:   add    $0x1,%eax                <--- tail_call_cnt++
        83 c0 01
  5c:   mov    %eax,-0x204(%rbp)
        89 85 fc fd ff ff
  62:   lea    -0x80(%rsi,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 44 d6 80
  67:   mov    (%rax),%rax
        48 8b 00
  6a:   cmp    $0x0,%rax                <--- check for NULL
        48 83 f8 00
  6e:   je     0x000000000000007a
        74 0a
  70:   mov    0x20(%rax),%rax          <--- GPF triggered here! fetch of bpf_func
        48 8b 40 20                              [ matches <48> 8b 40 20 ... from above ]
  74:   add    $0x33,%rax               <--- prologue skip of new prog
        48 83 c0 33
  78:   jmpq   *%rax                    <--- jump to new prog insns
        ff e0
  [...]

The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
jump to map slot 0 is pointing to a poisoned page. The issue is the following:

lea instruction has a wrong offset, i.e. it should be ...

  lea    0x80(%rsi,%rdx,8),%rax

... but it actually seems to be ...

  lea   -0x80(%rsi,%rdx,8),%rax

... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
to be positive instead of negative. Disassembling the interpreter, we btw
similarly do:

  [...]
  c88:  lea     0x80(%rax,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 84 d0 80 00 00 00
  c90:  add     $0x1,%r13d
        41 83 c5 01
  c94:  mov     (%rax),%rax
        48 8b 00
  [...]

Now the other interesting fact is that this panic triggers only when things
like CONFIG_LOCKDEP are being used. In that case offsetof(struct bpf_array,
prog) starts at offset 0x80 and in non-CONFIG_LOCKDEP case at offset 0x50.
Reason is that the work_struct inside struct bpf_map grows by 48 bytes in my
case due to the lockdep_map member (which also has CONFIG_LOCK_STAT enabled
members).

Changing the emitter to always use the 4 byte displacement in the lea
instruction fixes the panic on my side. It increases the tail call instruction
emission by 3 more byte, but it should cover us from various combinations
(and perhaps other future increases on related structures).

After patch, disassembly:

  [...]
  9e:   lea    0x80(%rsi,%rdx,8),%rax   <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
        48 8d 84 d6 80 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

  [...]
  9e:   lea    0x50(%rsi,%rdx,8),%rax   <--- No CONFIG_LOCKDEP
        48 8d 84 d6 50 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 17:02:19 -07:00
Nikolay Aleksandrov 7ae90a4f96 bridge: mdb: fix delmdb state in the notification
Since mdb states were introduced when deleting an entry the state was
left as it was set in the delete request from the user which leads to
the following output when doing a monitor (for example):
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
^^^
Note the "temp" state in the delete notification which is wrong since
the entry was permanent, the state in a delete is always reported as
"temp" regardless of the real state of the entry.

After this patch:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

There's one important note to make here that the state is actually not
matched when doing a delete, so one can delete a permanent entry by
stating "temp" in the end of the command, I've chosen this fix in order
not to break user-space tools which rely on this (incorrect) behaviour.

So to give an example after this patch and using the wrong state:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

Note the state of the entry that got deleted is correct in the
notification.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 15:02:30 -07:00
Satish Ashok 544586f742 bridge: mcast: give fast leave precedence over multicast router and querier
When fast leave is configured on a bridge port and an IGMP leave is
received for a group, the group is not deleted immediately if there is
a router detected or if multicast querier is configured.
Ideally the group should be deleted immediately when fast leave is
configured.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 14:57:05 -07:00
Toshiaki Makita df356d5e81 bridge: Fix network header pointer for vlan tagged packets
There are several devices that can receive vlan tagged packets with
CHECKSUM_PARTIAL like tap, possibly veth and xennet.
When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
by bridge to a device with the IP_CSUM feature, they end up with checksum
error because before entering bridge, the network header is set to
ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.

Since the network header is exepected to be ETH_HLEN in flow-dissection
and hash-calculation in RPS in rx path, and since the header pointer fix
is needed only in tx path, set the appropriate network header on forwarding
packets.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 12:20:16 -07:00
Alexander Drozdov dbd46ab412 packet: tpacket_snd(): fix signed/unsigned comparison
tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:

tp_len > dev->mtu + dev->hard_header_len

as dev->mtu and dev->hard_header_len are both unsigned.

That may lead to just returning an incorrect EMSGSIZE errno
to the user.

Fixes: 52f1454f62 ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 00:09:58 -07:00
Eric Dumazet 11c91ef98f arp: filter NOARP neighbours for SIOCGARP
When arp is off on a device, and ioctl(SIOCGARP) is queried,
a buggy answer is given with MAC address of the device, instead
of the mac address of the destination/gateway.

We filter out NUD_NOARP neighbours for /proc/net/arp,
we must do the same for SIOCGARP ioctl.

Tested:

lpaa23:~# ./arp 10.246.7.190
MAC=00:01:e8:22:cb:1d      // correct answer

lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# cat /proc/net/arp   # check arp table is now 'empty'
IP address       HW type     Flags       HW address    Mask     Device
lpaa23:~# ./arp 10.246.7.190
MAC=00:1a:11:c3:0d:7f   // buggy answer before patch (this is eth0 mac)

After patch :

lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# ./arp 10.246.7.190
ioctl(SIOCGARP) failed: No such device or address

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vytautas Valancius <valas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28 23:41:24 -07:00
David Ward 865b804244 net/ipv4: suppress NETDEV_UP notification on address lifetime update
This notification causes the FIB to be updated, which is not needed
because the address already exists, and more importantly it may undo
intentional changes that were made to the FIB after the address was
originally added. (As a point of comparison, when an address becomes
deprecated because its preferred lifetime expired, a notification on
this chain is not generated.)

The motivation for this commit is fixing an incompatibility between
DHCP clients which set and update the address lifetime according to
the lease, and a commercial VPN client which replaces kernel routes
in a way that outbound traffic is sent only through the tunnel (and
disconnects if any further route changes are detected via netlink).

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28 23:38:13 -07:00
Nikolay Aleksandrov 76b91c32dd bridge: stp: when using userspace stp stop kernel hello and hold timers
These should be handled only by the respective STP which is in control.
They become problematic for devices with limited resources with many
ports because the hold_timer is per port and fires each second and the
hello timer fires each 2 seconds even though it's global. While in
user-space STP mode these timers are completely unnecessary so it's better
to keep them off.
Also ensure that when the bridge is up these timers are started only when
running with kernel STP.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28 23:33:20 -07:00
Lars Westerhoff 158cd4af8d packet: missing dev_put() in packet_do_bind()
When binding a PF_PACKET socket, the use count of the bound interface is
always increased with dev_hold in dev_get_by_{index,name}.  However,
when rebound with the same protocol and device as in the previous bind
the use count of the interface was not decreased.  Ultimately, this
caused the deletion of the interface to fail with the following message:

unregister_netdevice: waiting for dummy0 to become free. Usage count = 1

This patch moves the dev_put out of the conditional part that was only
executed when either the protocol or device changed on a bind.

Fixes: 902fefb82e ('packet: improve socket create/bind latency in some cases')
Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 15:38:58 -07:00
Ivan Vecera c5c62f1bb0 macvtap: fix network header pointer for VLAN tagged pkts
Network header is set with offset ETH_HLEN but it is not true for VLAN
(multiple-)tagged and results in checksum issues in lower devices.

v2: leave skb->protocol untouched (thx Vlad), comment added
v3: moved after skb_probe_transport_header() call (thx Toshiaki)

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 14:49:54 -07:00
Alexander Duyck 1513069edc fib_trie: Drop unnecessary calls to leaf_pull_suffix
It was reported that update_suffix was taking a long time on systems where
a large number of leaves were attached to a single node.  As it turns out
fib_table_flush was calling update_suffix for each leaf that didn't have all
of the aliases stripped from it.  As a result, on this large node removing
one leaf would result in us calling update_suffix for every other leaf on
the node.

The fix is to just remove the calls to leaf_pull_suffix since they are
redundant as we already have a call in resize that will go through and
update the suffix length for the node before we exit out of
fib_table_flush or fib_table_flush_external.

Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 14:29:11 -07:00
David S. Miller 7a6e0706c0 macb: Fix build with macro'ized readl/writel.
If an architecture defines readl/writel using CPP macros, we
get the following kinds of build failure:

> > > drivers/net/ethernet/cadence/macb.c:164:1: error: macro "writel"
> > > passed 3 arguments, but takes just 2
>      macb_or_gem_writel(bp, SA1B, bottom);
>     ^

Rename the methods so that this doesn't happen.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 14:24:48 -07:00
Andrew Lunn 8fff755e9f net: fec: Ensure clocks are enabled while using mdio bus
When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.

Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Cc: tyler.baker@linaro.org
Cc: fabio.estevam@freescale.com
Cc: shawn.guo@linaro.org
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Tyler Baker <tyler.baker@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:21:47 -07:00
WingMan Kwok 7025e88a79 net: netcp: Fixes SGMII reset on network interface shutdown
This patch asserts SGMII RTRESET, i.e. resetting the SGMII Tx/Rx
logic,  during network interface shutdown to avoid having the
hardware wedge when shutting down with high incoming traffic rates.
This is cleared (brought out of RTRESET) when the interface is
brought back up.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:14:26 -07:00
David S. Miller 54109da31f Merge branch 'macb-fixes'
Andy Shevchenko says:

====================
net/macb: fix for AVR32 and clean up

It seems no one had tested recently the driver on AVR32 platforms such as
ATNGW100. This series bring it back to work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:30 -07:00
Andy Shevchenko e018a0cce3 net/macb: convert to kernel doc
This patch coverts struct description to the kernel doc format. There is no
functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:29 -07:00
Andy Shevchenko 94b295edc2 net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP()
macb_count_tx_descriptors() repeats the generic macro DIV_ROUND_UP(). The patch
does a replacement.

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:29 -07:00
Andy Shevchenko 8bcbf82f31 net/macb: suppress compiler warnings
This patch fixes the following warnings:
drivers/net/ethernet/cadence/macb.c: In function ‘macb_handle_link_change’:
drivers/net/ethernet/cadence/macb.c:266: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:267: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:291: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_update_stats’:
drivers/net/ethernet/cadence/macb.c:1908: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_get_ethtool_strings’:
drivers/net/ethernet/cadence/macb.c:1988: warning: comparison between signed and unsigned

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:29 -07:00
Andy Shevchenko a35919e174 net/macb: use dev_*() when netdev is not yet registered
To avoid messages like

macb macb.0 (unnamed net_device) (uninitialized): Cadence caps 0x00000000
macb macb.0 (unnamed net_device) (uninitialized): invalid hw address, using random

let's use dev_*() macros.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:29 -07:00
Andy Shevchenko f36dbe6a28 net/macb: check if macb_config present
The commit 98b5a0f4a2 introduces jumbo frame support, but also it assumes
that macb_config present which is not always true.

The configuration without macb_config fails to boot.

 Unable to handle kernel NULL pointer dereference at virtual address 00000010
 ptbr = 90350000 pgd = 00000000
 Oops: Kernel access of bad area, sig: 11 [#1]
 FRAME_POINTER chip: 0x01f:0x1e82 rev 2
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-next-20150723+ #13
 task: 91c26000 ti: 91c28000 task.ti: 91c28000
 PC is at macb_probe+0x140/0x61c

Fixes: 98b5a0f4a2 (net: macb: Add support for jumbo frames)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:29 -07:00
Andy Shevchenko f2ce8a9e48 net/macb: improve big endian CPU support
The commit a50dad355a (net: macb: Add big endian CPU support) converted I/O
accessors to readl_relaxed() and writel_relaxed() and consequentially broke
MACB driver on AVR32 platforms such as ATNGW100.

This patch improves I/O access by checking endiannes first and use the
corresponding methods.

Fixes: a50dad355a (net: macb: Add big endian CPU support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:10:29 -07:00
Sabrina Dubroca dfbafc9953 tcp: fix recv with flags MSG_WAITALL | MSG_PEEK
Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
with flags = MSG_WAITALL | MSG_PEEK.

sk_wait_data waits for sk_receive_queue not empty, but in this case,
the receive queue is not empty, but does not contain any skb that we
can use.

Add a "last skb seen on receive queue" argument to sk_wait_data, so
that it sleeps until the receive queue has new skbs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
Reported-by: Enrico Scholz <rh-bugzilla@ensc.de>
Reported-by: Dan Searle <dan@censornet.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:06:53 -07:00
David S. Miller 3d3af88592 Merge branch 'r8152-fixes'
Hayes Wang says:

====================
r8152: issues fix

v2:
Replace patch #2 with "r8152: fix wakeup settings".

v1:
These patches are used to fix issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:56:39 -07:00