Commit Graph

548540 Commits

Author SHA1 Message Date
Hante Meuleman 124d517211 brcmfmac: Fix station info rate information.
Txrate and rxrate in get_station got assigned first with value
in kbps and then divided by 100 to get it in 100kbps unit. The
problem with that is that type of rate is u16 which resulted
in incorrect values for high data rate values.

Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:56:46 +03:00
Hante Meuleman 2b76acdbc0 brcmfmac: Rework p2p attach, use single method for p2p dev creation.
When module param p2pon is used a p2p device is created at init.
This patch reworks how this is done by using the same method as
for a dynamically (by user space) created p2p device.

Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Signed-off-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:56:45 +03:00
Arend van Spriel 43569bfaf6 brcmfmac: remove conversational comment
Removing a comment that was only useful during the review of
the change that introduced it and which should never have been
submitted.

Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:56:42 +03:00
Franky Lin 4a3462843f brcmfmac: rename firmware_path to alternative_fw_path
In brcmfmac the module parameter "firmware_path" is used as an
alternative relative path under the search path used by firmware_class
or ueventhelper. Rename the parameter to alternative_fw_path to avoid
confusion.

Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Franky Lin <frankyl@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:56:40 +03:00
Hante Meuleman 185f0eb0b5 brcmfmac: Fix race condition between USB probe/load and disconnect.
When a USB device gets disconnected due to for example removal
then it is possible that it is still in the loading phase due to
the asynchronous load routines. These routines can then possible
access memory which has been freed. Fix this by mutex locking the
device init phase.

Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:56:39 +03:00
Arend van Spriel ff4445a850 brcmfmac: expose device memory to devcoredump subsystem
Upon PSM watchdog event received from firmware the driver will obtain
a memory snapshot of the device and expose it to user-space through
the devcoredump framework. This will trigger a uevent.

Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:56:23 +03:00
Jes Sorensen 26f1fad29a New driver: rtl8xxxu (mac80211)
This is an alternate driver for a number of Realtek WiFi USB devices,
including RTL8723AU, RTL8188CU, RTL8188RU, RTL8191CU, and RTL8192CU.
It was written from scratch utilizing the Linux mac80211 stack.

After spending months cleaning up the vendor provided rtl8723au
driver, which comes with it's own 802.11 stack included, I decided to
rewrite this driver from the bottom up.

Many thanks to Johannes Berg for 802.11 insights and help and Larry
Finger for help with the vendor driver.

The full git log for the development of this driver can be found here:
git git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git
    branch rtl8723au-mac80211

This driver is still under development, but has proven to be very
stable for me. It currently supports station mode only. It has support
for OFDM and CCK rates. It does lack certain features found in the
staging driver, such as power management, AMPDU, and 40MHz channel
support. In addition it does not support AD-HOC, AP, and monitor mode
support at this point.

The driver is known to work with the following devices:
Lenovo Yoga (rtl8723au)
TP-Link TL-WN823N (rtl8192cu)
Etekcity 6R (rtl8188cu)
Daffodil LAN03 (rtl8188cu)
Alfa AWUS036NHR (rtl8188ru)

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-21 10:53:29 +03:00
Xinming Hu 8785955bbc mwifiex: remove unnecessary NULL check
ra_list cannot be NULL here, so remove the unnecessary NULL check.

Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:22:08 +03:00
Amitkumar Karwar fe24372d1b mwifiex: add ndo_validate_addr netdev ops
ndo_validate_addr is set to generic eth_validate_addr() function
used for MAC address validation.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:22:07 +03:00
Ganapathi Bhat 505c5cb82d mwifiex: fix AP VHT behaviour
Even if hostapd configuration file contains VHT parameters,
they were not getting reflected in beacons. The reason is
we are resetting them before starting AP. This patch removes
redundant BSS_STOP and SYS_RESET firmware commands before
starting AP to fix the problem.

Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:22:06 +03:00
Amitkumar Karwar 46dbe2476c mwifiex: control WLAN and bluetooth coexistence modes
By default our chip will be in spatial coexistence mode.
This patch adds a provision to change it to timeshare mode
via debugfs command.

Enable timeshare coexistence mode
   echo 1 > /sys/kernel/debug/mwifiex/mlan0/timeshare_coex

Go back to spacial coexistence mode
   echo 0 > /sys/kernel/debug/mwifiex/mlan0/timeshare_coex

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:21:29 +03:00
Geliang Tang a484804b3e mwifiex: fix a comment typo
Just fix a typo in the code comment.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:16:56 +03:00
Miaoqing Pan cfa2b42b4d ath9k: fix QCA9561 XLNA rxgain initial
A small bugfix for commit ede6a5e7b8 ("ath9k: Add QCA956x HW support").
I guess I would have skipped renaming (that initial QCA956x commit has
been there already for almost a year with the "5g" in the name) and move
the call outside AR_SREV_9462_20_OR_LATER() to make it reachable.

Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:15:57 +03:00
Miaoqing Pan 871d0051f0 ath9k: rename ini_modes_rxgain_5g_xlna to ini_modes_rxgain_xlna
rename the variable as preparation for using the array with 2.4 GHz
band, etc.

Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:15:55 +03:00
Priit Laes 16a4ea5065 rtlwifi: rtl8192cu: Add missing case in rtl92cu_get_hw_reg
Driver was reporting 'switch case not processed' after association,
so HW_VAR_KEEP_ALIVE was added and filled similarily to other drivers.

Positive side effect to this seems to be a bit more stable connection.

Signed-off-by: Priit Laes <plaes@plaes.org>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:15:03 +03:00
Amitkumar Karwar ce03966a16 mwifiex: correction in USB8997 chipset's product ID
For 8897 chipset, mwifiex_usb unnecessarily used to
come into picture when wlan is supposed to be used
via PCIe interface and USB interface is for
bluetooth.

This problem has been resolved for newer chipset by
having separate USB product ids for USB-USB8997 and
PCIe-USB8997 chipset variants. This patch ensures to
use wlan specific product id.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Frank Huang <frankh@marvell.com>
Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:14:18 +03:00
Amitkumar Karwar 60a188a271 mwifiex: remove USB8897 chipset support
We don't have any customer using this chipset via USB
interface. if both mwifiex_pcie and mwifiex_usb modules
are enabled by user, sometimes mwifiex_usb wins the race
even if user wants wlan interface to be on PCIe and
USB for bluetooth. This patch solves the problem.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Frank Huang <frankh@marvell.com>
Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:14:16 +03:00
Martin Blumenstingl 406df18f1f ath9k: Fix NF CCA limits for AR9287 and AR9227
The FreeBSD driver [0] uses the same 2G values as for the AR9280 chips.
Using the same values in ath9k results in much better throughput for me.

Before this patch I had a huge amount of packet loss (sometimes up to
40%) and the max transfer speed was somewhere around 5Mbit/s. With this
patch applied I have zero packet loss and ten times the throughput.
My device uses a AR9227 which is the PCI variant of the AR9287.

[0] http://bxr.su/FreeBSD/sys/dev/ath/ath_hal/ar9002/ar9287.h

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 14:05:00 +03:00
Larry Finger f1d2b4d338 rtlwifi: rtl818x: Move drivers into new realtek directory
Now that a new mac80211-based driver for Realtek devices has been submitted,
it is time to reorganize the directories. Rather than having directories
rtlwifi and rtl818x be in drivers/net/wireless/, they will now be in
drivers/net/wireless/realtek/. This change simplifies the directory
structure, but does not result in any configuration changes that are
visable to the user.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-10-14 13:33:10 +03:00
Nikolay Aleksandrov 6623c60dc2 bridge: vlan: enforce no pvid flag in vlan ranges
Currently it's possible for someone to send a vlan range to the kernel
with the pvid flag set which will result in the pvid bouncing from a
vlan to vlan and isn't correct, it also introduces problems for hardware
where it doesn't make sense having more than 1 pvid. iproute2 already
enforces this, so let's enforce it on kernel-side as well.

Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:59:15 -07:00
Tillmann Heidsieck cbb41b91e6 atm: iphase: fix misleading indention
Fix a smatch warning:
drivers/atm/iphase.c:1178 rx_pkt() warn: curly braces intended?

The code is correct, the indention is misleading. In case the allocation
of skb fails, we want to skip to the end.

Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:56:27 -07:00
Tillmann Heidsieck 21e26ff993 atm: iphase: return -ENOMEM instead of -1 in case of failed kmalloc()
Smatch complains about returning hard coded error codes, silence this
warning.

drivers/atm/iphase.c:115 ia_enque_rtn_q() warn: returning -1 instead of -ENOMEM is sloppy

Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:56:26 -07:00
Roopa Prabhu 8c5b83f0f2 ipv6 route: use err pointers instead of returning pointer by reference
This patch makes ip6_route_info_create return err pointer instead of
returning the rt pointer by reference as suggested  by Dave

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:47:34 -07:00
huangdaode 99dcc7dfb1 net: hns: fix the unknown phy_nterface_t type error
This patch fix the building error reported by Jiri Pirko <jiri@resnulli.us>

drivers/net/ethernet/hisilicon/hns/hnae.h:465:2: error: unknown type
name 'phy_interface_t'
        phy_interface_t phy_if;
	^
the full build log is on https://lists.01.org/pipermail/kbuild-all.

Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:46:46 -07:00
Eric Dumazet 5fcd2d8be4 tun: use sk_fullsock() before reading sk->sk_tsflags
timewait or request sockets are small and do not contain sk->sk_tsflags

Without this fix, we might read garbage, and crash later in

__skb_complete_tx_timestamp()
 -> sock_queue_err_skb()

(These pseudo sockets do not have an error queue either)

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:45:48 -07:00
David S. Miller b7a4609591 Merge branch 'netns-defrag'
Eric W. Biederman says:

====================
net: Pass net into defragmentation

This is the next installment of my work to pass struct net through the
output path so the code does not need to guess how to figure out which
network namespace it is in, and ultimately routes can have output
devices in another network namespace.

In netfilter and af_packet we defragment packets in the output path,
and there is the usual amount of confusion about how to compute which
net we are processing the packets in.  This patchset clears that
confusion up by explicitly passing in struct net in ip_defrag,
ip_check_defrag, and nf_ct_frag6_gather.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:22 -07:00
Eric W. Biederman b72775977c ipv6: Pass struct net into nf_ct_frag6_gather
The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack.  In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.

The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:17 -07:00
Eric W. Biederman 19bcf9f203 ipv4: Pass struct net into ip_defrag and ip_check_defrag
The function ip_defrag is called on both the input and the output
paths of the networking stack.  In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.

So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:16 -07:00
Eric W. Biederman 37fcbab61b ipv4: Only compute net once in ip_call_ra_chain
ip_call_ra_chain is called early in the forwarding chain from
ip_forward and ip_mr_input, which makes skb->dev the correct
expression to get the input network device and dev_net(skb->dev) a
correct expression for the network namespace the packet is being
processed in.

Compute the network namespace and store it in a variable to make the
code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:14 -07:00
Eric Dumazet 161642e24f packet: fix match_fanout_group()
Recent TCP listener patches exposed a prior af_packet bug :
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv

But SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".

We can read non existent memory and crash.

Fixes: c0de08d042 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:42:38 -07:00
David S. Miller 9916596742 Major changes:
iwlwifi
 
 * some debugfs improvements
 * fix signedness in beacon statistics
 * deinline some functions to reduce size when device tracing is enabled
 * filter beacons out in AP mode when no stations are associated
 * deprecate firmwares version -12
 * fix a runtime PM vs. legacy suspend race
 * one-liner fix for a ToF bug
 * clean-ups in the rx code
 * small debugging improvement
 * fix WoWLAN with new firmware versions
 * more clean-ups towards multiple RX queues;
 * some rate scaling fixes and improvements;
 * some time-of-flight fixes;
 * other generic improvements and clean-ups;
 
 brcmfmac
 
 * rework code dealing with multiple interfaces
 * allow logging firmware console using debug level
 * support for BCM4350, BCM4365, and BCM4366 PCIE devices
 * fixed for legacy P2P and P2P device handling
 * correct set and get tx-power
 
 ath9k
 
 * add support for Outside Context of a BSS (OCB) mode
 
 mwifiex
 
 * add USB multichannel feature
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJWF9ciAAoJEG4XJFUm622bVaAH/3Fi4CaKrDF6L8lxSRWUZzft
 Ie2X0FC+d5knpS7dOd7iI02MuEuKCg3f6dmtDrCDFBqFohvfO5NkG4XU81jdIiWM
 Xkyxlgcy/1TuILNjQfNh/2nhjpvvHDCyptl+jimeT2VR2ITD/Vj3IOAMA5l4khyx
 OeWmgW7dT9xLwYYy20ql5QLGkbxwJlHawUw/d+3yiS+AHO+6dVGJL2OtpyrlPP/F
 0KpSj0lZY9UNRL+i6FbONDCBYeG+q/lA5G5nGXBF6zEeZ6BcuWNRcBBGr2n/6uMy
 gQMAunqBIunfYkfpEKYEPF5zoyO/wCmvPLxx56iS8okGSVw4KzQ2DtQ0leFbjBw=
 =1po3
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2015-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
Major changes:

iwlwifi

* some debugfs improvements
* fix signedness in beacon statistics
* deinline some functions to reduce size when device tracing is enabled
* filter beacons out in AP mode when no stations are associated
* deprecate firmwares version -12
* fix a runtime PM vs. legacy suspend race
* one-liner fix for a ToF bug
* clean-ups in the rx code
* small debugging improvement
* fix WoWLAN with new firmware versions
* more clean-ups towards multiple RX queues;
* some rate scaling fixes and improvements;
* some time-of-flight fixes;
* other generic improvements and clean-ups;

brcmfmac

* rework code dealing with multiple interfaces
* allow logging firmware console using debug level
* support for BCM4350, BCM4365, and BCM4366 PCIE devices
* fixed for legacy P2P and P2P device handling
* correct set and get tx-power

ath9k

* add support for Outside Context of a BSS (OCB) mode

mwifiex

* add USB multichannel feature
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:39:18 -07:00
Paolo Abeni e2ca690b65 ipv4/icmp: redirect messages can use the ingress daddr as source
This patch allows configuring how the source address of ICMP
redirect messages is selected; by default the old behaviour is
retained, while setting icmp_redirects_use_orig_daddr force the
usage of the destination address of the packet that caused the
redirect.

The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
following scenario:

Two machines are set up with VRRP to act as routers out of a subnet,
they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
x.x.x.254/24.

If a host in said subnet needs to get an ICMP redirect from the VRRP
router, i.e. to reach a destination behind a different gateway, the
source IP in the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.

The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
and will continue to use the wrong next-op.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:38:02 -07:00
Jiri Pirko 0944d6b5a2 bridge: try switchdev op first in __vlan_vid_add/del
Some drivers need to implement both switchdev vlan ops and
vid_add/kill ndos. For that to work in bridge code, we need to try
switchdev op first when adding/deleting vlan id.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:35:20 -07:00
wangweidong 3703ebe403 BNX2: free temp_stats_blk on error path
In bnx2_init_board, missing free temp_stats_blk on error path when
some operations do failed. Just add the 'kfree' operation.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:33:46 -07:00
David S. Miller 76973dd79f Merge branch 'setsockopt_incoming_cpu'
Eric Dumazet says:

====================
tcp: better smp listener behavior

As promised in last patch series, we implement a better SO_REUSEPORT
strategy, based on cpu hints if given by the application.

We also moved sk_refcnt out of the cache line containing the lookup
keys, as it was considerably slowing down smp operations because
of false sharing. This was simpler than converting listen sockets
to conventional RCU (to avoid sk_refcnt dirtying)

Could process 6.0 Mpps SYN instead of 4.2 Mpps on my test server.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:32 -07:00
Eric Dumazet d475f090bf tcp: shrink tcp_timewait_sock by 8 bytes
Reducing tcp_timewait_sock from 280 bytes to 272 bytes
allows SLAB to pack 15 objects per page instead of 14 (on x86)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:24 -07:00
Eric Dumazet ed53d0ab76 net: shrink struct sock and request_sock by 8 bytes
One 32bit hole is following skc_refcnt, use it.
skc_incoming_cpu can also be an union for request_sock rcv_wnd.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:22 -07:00
Eric Dumazet 8e5eb54d30 net: align sk_refcnt on 128 bytes boundary
sk->sk_refcnt is dirtied for every TCP/UDP incoming packet.
This is a performance issue if multiple cpus hit a common socket,
or multiple sockets are chained due to SO_REUSEPORT.

By moving sk_refcnt 8 bytes further, first 128 bytes of sockets
are mostly read. As they contain the lookup keys, this has
a considerable performance impact, as cpus can cache them.

These 8 bytes are not wasted, we use them as a place holder
for various fields, depending on the socket type.

Tested:
 SYN flood hitting a 16 RX queues NIC.
 TCP listener using 16 sockets and SO_REUSEPORT
 and SO_INCOMING_CPU for proper siloing.

 Could process 6.0 Mpps SYN instead of 4.2 Mpps

 Kernel profile looked like :
    11.68%  [kernel]  [k] sha_transform
     6.51%  [kernel]  [k] __inet_lookup_listener
     5.07%  [kernel]  [k] __inet_lookup_established
     4.15%  [kernel]  [k] memcpy_erms
     3.46%  [kernel]  [k] ipt_do_table
     2.74%  [kernel]  [k] fib_table_lookup
     2.54%  [kernel]  [k] tcp_make_synack
     2.34%  [kernel]  [k] tcp_conn_request
     2.05%  [kernel]  [k] __netif_receive_skb_core
     2.03%  [kernel]  [k] kmem_cache_alloc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:22 -07:00
Eric Dumazet 70da268b56 net: SO_INCOMING_CPU setsockopt() support
SO_INCOMING_CPU as added in commit 2c8c56e15d was a getsockopt() command
to fetch incoming cpu handling a particular TCP flow after accept()

This commits adds setsockopt() support and extends SO_REUSEPORT selection
logic : If a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if CPU handling the packet matches the specified
one.

This allows to build very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same cpu.
This provides optimal NUMA behavior and keep cpu caches hot.

Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. Following patch puts sk_refcnt in a different cache line
to let this iteration hit only shared and read mostly cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:20 -07:00
Edward Jee c7d39e3263 packet: support per-packet fwmark for af_packet sendmsg
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:25:22 -07:00
Edward Jee f28ea365cd sock: support per-packet fwmark
It's useful to allow users to set fwmark for an individual packet,
without changing the socket state. The function this patch adds in
sock layer can be used by the protocols that need such a feature.

Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:25:21 -07:00
David S. Miller c1bf5fe031 Merge branch 'bpf-unprivileged'
Alexei Starovoitov says:

====================
bpf: unprivileged

v1-v2:
- this set logically depends on cb patch
  "bpf: fix cb access in socket filter programs":
  http://patchwork.ozlabs.org/patch/527391/
  which is must have to allow unprivileged programs.
  Thanks Daniel for finding that issue.
- refactored sysctl to be similar to 'modules_disabled'
- dropped bpf_trace_printk
- split tests into separate patch and added more tests
  based on discussion

v1 cover letter:
I think it is time to liberate eBPF from CAP_SYS_ADMIN.
As was discussed when eBPF was first introduced two years ago
the only piece missing in eBPF verifier is 'pointer leak detection'
to make it available to non-root users.
Patch 1 adds this pointer analysis.
The eBPF programs, obviously, need to see and operate on kernel addresses,
but with these extra checks they won't be able to pass these addresses
to user space.
Patch 2 adds accounting of kernel memory used by programs and maps.
It changes behavoir for existing root users, but I think it needs
to be done consistently for both root and non-root, since today
programs and maps are only limited by number of open FDs (RLIMIT_NOFILE).
Patch 2 accounts program's and map's kernel memory as RLIMIT_MEMLOCK.

Unprivileged eBPF is only meaningful for 'socket filter'-like programs.
eBPF programs for tracing and TC classifiers/actions will stay root only.

In parallel the bpf fuzzing effort is ongoing and so far
we've found only one verifier bug and that was already fixed.
The 'constant blinding' pass also being worked on.
It will obfuscate constant-like values that are part of eBPF ISA
to make jit spraying attacks even harder.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:41 -07:00
Alexei Starovoitov bf5088773f bpf: add unprivileged bpf tests
Add new tests samples/bpf/test_verifier:

unpriv: return pointer
  checks that pointer cannot be returned from the eBPF program

unpriv: add const to pointer
unpriv: add pointer to pointer
unpriv: neg pointer
  checks that pointer arithmetic is disallowed

unpriv: cmp pointer with const
unpriv: cmp pointer with pointer
  checks that comparison of pointers is disallowed
  Only one case allowed 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...'

unpriv: check that printk is disallowed
  since bpf_trace_printk is not available to unprivileged

unpriv: pass pointer to helper function
  checks that pointers cannot be passed to functions that expect integers
  If function expects a pointer the verifier allows only that type of pointer.
  Like 1st argument of bpf_map_lookup_elem() must be pointer to map.
  (applies to non-root as well)

unpriv: indirectly pass pointer on stack to helper function
  checks that pointer stored into stack cannot be used as part of key
  passed into bpf_map_lookup_elem()

unpriv: mangle pointer on stack 1
unpriv: mangle pointer on stack 2
  checks that writing into stack slot that already contains a pointer
  is disallowed

unpriv: read pointer from stack in small chunks
  checks that < 8 byte read from stack slot that contains a pointer is
  disallowed

unpriv: write pointer into ctx
  checks that storing pointers into skb->fields is disallowed

unpriv: write pointer into map elem value
  checks that storing pointers into element values is disallowed
  For example:
  int bpf_prog(struct __sk_buff *skb)
  {
    u32 key = 0;
    u64 *value = bpf_map_lookup_elem(&map, &key);
    if (value)
       *value = (u64) skb;
  }
  will be rejected.

unpriv: partial copy of pointer
  checks that doing 32-bit register mov from register containing
  a pointer is disallowed

unpriv: pass pointer to tail_call
  checks that passing pointer as an index into bpf_tail_call
  is disallowed

unpriv: cmp map pointer with zero
  checks that comparing map pointer with constant is disallowed

unpriv: write into frame pointer
  checks that frame pointer is read-only (applies to root too)

unpriv: cmp of frame pointer
  checks that R10 cannot be using in comparison

unpriv: cmp of stack pointer
  checks that Rx = R10 - imm is ok, but comparing Rx is not

unpriv: obfuscate stack pointer
  checks that Rx = R10 - imm is ok, but Rx -= imm is not

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:37 -07:00
Alexei Starovoitov aaac3ba95e bpf: charge user for creation of BPF maps and programs
since eBPF programs and maps use kernel memory consider it 'locked' memory
from user accounting point of view and charge it against RLIMIT_MEMLOCK limit.
This limit is typically set to 64Kbytes by distros, so almost all
bpf+tracing programs would need to increase it, since they use maps,
but kernel charges maximum map size upfront.
For example the hash map of 1024 elements will be charged as 64Kbyte.
It's inconvenient for current users and changes current behavior for root,
but probably worth doing to be consistent root vs non-root.

Similar accounting logic is done by mmap of perf_event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:36 -07:00
Alexei Starovoitov 1be7f75d16 bpf: enable non-root eBPF programs
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
  (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
  (except if (map_value_ptr == 0) ... )
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.

For example, the following unprivileged socket filter program is allowed:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&my_map, &index);

  if (value)
	*value += skb->len;
  return 0;
}

but the following program is not:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&my_map, &index);

  if (value)
	*value += (u64) skb;
  return 0;
}
since it would leak the kernel address into the map.

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns

The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
This toggle defaults to off (0), but can be set true (1).  Once true,
bpf programs and maps cannot be accessed from unprivileged process,
and the toggle cannot be set back to false.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:35 -07:00
Arnd Bergmann 0fa28877b2 net: HNS: fix MDIO dependencies
The newly introduced HNS_MDIO Kconfig symbol selects 'MDIO', but
that is the wrong symbol as the code used by this driver is
provided by PHYLIB rather than the MDIO driver. Also, there is
no need to make this driver user selectable, because it is already
selected by all drivers that need it.

This changes the Kconfig file to select the correct library, and
to make the option silent.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5b904d3940 ("net: add Hisilicon Network Subsystem MDIO support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:41:15 -07:00
Daniel Pieczko c577e59ed7 sfc: fully reset if MC_REBOOT event received without warm_boot_count increment
On EF10, MC_CMD_VPORT_RECONFIGURE can cause a CODE_MC_REBOOT event
to be sent to a function without incrementing the (adapter-wide)
warm_boot_count.  In this case, the reboot is not detected by the
loop on efx_mcdi_poll_reboot(), so prepare for recovery from an MC
reboot anyway.  When this codepath is run, the MC has always just
rebooted, so this recovery is valid.

The loop on efx_mcdi_poll_reboot() is still required for other MC
reboot cases, so that actions in response to an MC reboot are
performed, such as clearing locally calculated statistics.
Siena NICs are unaffected by this change as the above scenario
does not apply.

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:35:25 -07:00
David S. Miller d5404915a9 Merge branch 'switchdev_ageing_time'
Scott Feldman says:

====================
switchdev: push bridge ageing_time attribute down

Push bridge-level attributes down to switchdev drivers.  This patchset
adds the infrastructure and then pushes, as an example, ageing_time attribute
down from bridge to switchdev (rocker) driver.  Add some range-checking
for ageing_time.

RTNETLINK answers: Numerical result out of range

Up until now, switchdev attrs where port-level attrs, so the netdev used in
switchdev_attr_set() would be a switch port or bond of switch ports.  With
bridge-level attrs, the netdev passed to switchdev_attr_set() is the bridge
netdev.  The same recusive algo is used to visit the leaves of the stacked
drivers to set the attr, it's just in this case we start one layer higher in
the stack.  One note is not all ports in the bridge may support setting a
bridge-level attribute, so rather than failing the entire set, we'll skip over
those ports returning -EOPNOTSUPP.

v2->v3: Per Jiri review: push only ageing_time attr down at this time, and
don't pass raw bridge IFLA_BR_* values; rather use new switchdev attr ID for
ageing_time.

v1->v2: rebase w/ net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:20:28 -07:00
Scott Feldman d0cf57f9dd rocker: handle setting bridge ageing_time
The FDB cleanup timer will get rescheduled to re-evaluate FDB entries
based on new ageing_time.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:20:22 -07:00
Scott Feldman c62987bbd8 bridge: push bridge setting ageing_time down to switchdev
Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't
support setting ageing_time (or setting bridge attrs in general).

If push fails, don't update ageing_time in bridge and return err to user.

If push succeeds, update ageing_time in bridge and run gc_timer now to
recalabrate when to run gc_timer next, based on new ageing_time.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:20:20 -07:00