Commit Graph

634650 Commits

Author SHA1 Message Date
Hadar Hen Zion c9f1b073d0 net/mlx5: Add creation flags when adding new flow table
When creating flow tables, allow the caller to specify creation flags.
Currently no flags are used and as such this patch doesn't add any new
functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:56 -05:00
Hadar Hen Zion 43f93839e3 net/mlx5: Check max encap header size capability
Instead of comparing to a const value, check the value of max encap
header size capability as reported by the Firmware.

Fixes: 575ddf5888 ('net/mlx5: Introduce alloc_encap and dealloc_encap commands')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:55 -05:00
Hadar Hen Zion ae9f83ac24 net/mlx5: Move alloc/dealloc encap commands declarations to common header file
The alloc and dealloc encap commands will be used in the mlx5e driver,
as such, declare them in a common header file.

Also, rename the functions: mlx5_cmd_{de}alloc_encap is replaced with
mlx5_encap_{de}alloc.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:55 -05:00
Hadar Hen Zion 75bfbca01e net/sched: act_tunnel_key: Add UDP dst port option
The current tunnel set action supports only IP addresses and key
options. Add UDP dst port option.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:55 -05:00
Hadar Hen Zion 24ba898d43 net/dst: Add dst port to dst_metadata utility functions
Add dst port parameter to __ip_tun_set_dst and __ipv6_tun_set_dst
utility functions.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:54 -05:00
Hadar Hen Zion f4d997fd61 net/sched: cls_flower: Add UDP port to tunnel parameters
The current IP tunneling classification supports only IP addresses and key.
Enhance UDP based IP tunneling classification parameters by adding UDP
src and dst port.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:54 -05:00
Hadar Hen Zion 519d10521c net/sched: cls_flower: Allow setting encapsulation fields as used key
When encapsulation field is set, mark it as used key for the flow
dissector. This will be used by offloading drivers.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:54 -05:00
Hadar Hen Zion 9ba6a9a9f7 flow_dissector: Add enums for encapsulation keys
New encapsulation keys were added to the flower classifier, which allow
classification according to outer (encapsulation) headers attributes
such as key and IP addresses.
In order to expose those attributes outside flower, add
corresponding enums in the flow dissector.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:54 -05:00
Hadar Hen Zion 9ce183b4c4 net/sched: act_tunnel_key: add helper inlines to access tcf_tunnel_key
Needed for drivers to pick the relevant action when offloading tunnel
key act.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:53 -05:00
Lorenzo Colitti 35b80733b3 net: core: add missing check for uid_range in rule_exists.
Without this check, it is not possible to create two rules that
are identical except for their UID ranges. For example:

root@net-test:/# ip rule add prio 1000 lookup 300
root@net-test:/# ip rule add prio 1000 uidrange 100-200 lookup 300
RTNETLINK answers: File exists
root@net-test:/# ip rule add prio 1000 uidrange 100-199 lookup 100
root@net-test:/# ip rule add prio 1000 uidrange 200-299 lookup 200
root@net-test:/# ip rule add prio 1000 uidrange 300-399 lookup 100
RTNETLINK answers: File exists

Tested: https://android-review.googlesource.com/#/c/299980/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:28:10 -05:00
Mintz, Yuval bb48024284 qed: Prevent stack corruption on MFW interaction
Driver uses a union for copying data to & from management firmware
when interacting with it.
Problem is that the function always copies sizeof(union) while commit
2edbff8dcb ("qed: Learn resources from management firmware") is casting
a union elements which is of smaller size [24-byte instead of 88-bytes].

Also, the union contains some inappropriate elements which increase its
size [should have been 32-bytes]. While this shouldn't corrupt other
PF messages to the MFW [as management firmware enforces permissions so
that each PF is allowed to write only to its own mailbox] we fix this
here as well.

Fixes: 2edbff8dcb ("qed: Learn resources from management firmware")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:27:25 -05:00
Philippe Reynes b12ab9b119 net: 3com: typhoon: fix typhoon_get_link_ksettings
When moving from typhoon_get_settings to typhoon_getlink_ksettings
in the commit f7a5537cd2 ("net: 3com: typhoon: use new api
ethtool_{get|set}_link_ksettings"), we use a local variable supported
but we forgot to update the struct ethtool_link_ksettings with
this value.

We also initialize advertising to zero, because otherwise it may
be uninitialized if no case of the switch (tp->xcvr_select) is used.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:25:14 -05:00
Philippe Reynes 90fdd04e2c net: xgbe: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:25:13 -05:00
Philippe Reynes ea74df816f net: amd: pcnet32: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:25:13 -05:00
Philippe Reynes 1435003c2c net: amd8111e: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:25:13 -05:00
Philippe Reynes d17970d746 net: alteon: acenic: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:25:12 -05:00
Philippe Reynes f1cd5aa078 net: adaptec: starfire: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:25:12 -05:00
David S. Miller 35887d3217 Merge branch 'stmmac-dwmac-rk-PM'
Joachim Eastwood says:

====================
stmmac: dwmac-rk: convert to standard PM/remove functions

This patch set aims to remove the init/exit callbacks from the
dwmac-rk driver and instead use standard PM callbacks. Eventually
the init/exit callbacks will be deprecated and removed from all
drivers dwmac-* except for dwmac-generic. Drivers will be refactored
to use standard PM and remove callbacks.

This conversion was pretty straight forward, but it would really nice
if some chromium people could test suspend/resume with this patch set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:21:25 -05:00
Joachim Eastwood 5a3c7805c4 Revert "net: stmmac: allow to split suspend/resume from init/exit callbacks"
Instead of adding hooks inside stmmac_platform it is better to just use
the standard PM callbacks within the specific dwmac-driver. This only
used by the dwmac-rk driver.

This reverts commit cecbc5563a ("stmmac: allow to split suspend/resume
from init/exit callbacks").

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:21:25 -05:00
Joachim Eastwood 07a5e76924 stmmac: dwmac-rk: absorb rk_gmac_init into probe
Since the rk_gmac_init() only calls another function move this
function call into probe so rk_gmac_init() can be removed.

Since commit cecbc5563a ("stmmac: allow to split suspend/resume
from init/exit callbacks") the init hook is no longer used in
dwmac-rk so this can be removed.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:21:24 -05:00
Joachim Eastwood 0de8c4c9a9 stmmac: dwmac-rk: turn exit into standard driver remove callback
Convert the exit hook into a standard driver remove function as
the hook doesn't really buy us anything extra.

Eventually the exit hook will be deprecated in favor of the driver
remove function.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:21:24 -05:00
Joachim Eastwood 5619468a41 stmmac: dwmac-rk: turn resume/suspend into standard PM callbacks
Use standard PM resume/suspend callbacks instead of the hooks in
stmmac_platform. This gives the driver more control and flexibility
when implementing PM functionality. The hooks in stmmac_platform
also doesn't buy us anything extra.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:21:24 -05:00
David S. Miller c68d7f1b63 Merge branch 'tcp_get_info-locking'
Eric Dumazet says:

====================
tcp: tcp_get_info() locking changes

This short series prepares tcp_get_info() for more detailed infos.

In order to not slow down fast path, our goal is to use the normal
socket spinlock instead of custom synchronization.

All we need to ensure is that tcp_get_info() is not called with
ehash lock, which might dead lock, since packet processing would acquire
the spinlocks in reverse way.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:02:28 -05:00
Eric Dumazet 67db3e4bfb tcp: no longer hold ehash lock while calling tcp_get_info()
We had various problems in the past in tcp_get_info() and used
specific synchronization to avoid deadlocks.

We would like to add more instrumentation points for TCP, and
avoiding grabing socket lock in tcp_getinfo() was too costly.

Being able to lock the socket allows to provide consistent set
of fields.

inet_diag_dump_icsk() can make sure ehash locks are not
held any more when tcp_get_info() is called.

We can remove syncp added in commit d654976cbf
("tcp: fix a potential deadlock in tcp_get_info()"), but we need
to use lock_sock_fast() instead of spin_lock_bh() since TCP input
path can now be run from process context.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:02:27 -05:00
Eric Dumazet ccbf3bfaee tcp: shortcut listeners in tcp_get_info()
Being lockless in tcp_get_info() is hard, because we need to add
specific synchronization in TCP fast path, like seqcount.

Following patch will change inet_diag_dump_icsk() to no longer
hold any lock for non listeners, so that we can properly acquire
socket lock in get_tcp_info() and let it return more consistent counters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:02:27 -05:00
David S. Miller 721ad32144 Merge branch 'Meson-GXL-internal-phy'
Neil Armstrong says:

====================
ARM64: Add Internal PHY support for Meson GXL

The Amlogic Meson GXL SoCs have an internal RMII PHY that is muxed with the
external RGMII pins.

In order to support switching between the two PHYs links, extended registers
size for mdio-mux-mmioreg must be added.

The DT related patches submitted as RFC in [3] will be sent in a separate
patchset due to multiple patchsets and DTSI migrations.

Changes since v2 RFC patchset at : [3]
 - Change phy Kconfig/Makefile alphabetic order
 - GXL dtsi cleanup

Changes since original RFC patchset at : [2]
 - Remove meson8b experimental phy switching
 - Switch to mdio-mux-mmioreg with extennded size support
 - Add internal phy support for S905x and p231
 - Add external PHY support for p230

[1] http://lkml.kernel.org/r/1477932286-27482-1-git-send-email-narmstrong@baylibre.com
[2] http://lkml.kernel.org/r/1477060838-14164-1-git-send-email-narmstrong@baylibre.com
[3] http://lkml.kernel.org/r/1477932987-27871-1-git-send-email-narmstrong@baylibre.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 12:50:56 -05:00
Neil Armstrong 7334b3e47a net: phy: Add Meson GXL Internal PHY driver
Add driver for the Internal RMII PHY found in the Amlogic Meson GXL SoCs.

This PHY seems to only implement some standard registers and need some
workarounds to provide autoneg values from vendor registers.

Some magic values are currently used to configure the PHY, and this a
temporary setup until clarification about these registers names and
registers fields are provided by Amlogic.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 12:50:55 -05:00
Neil Armstrong 9a4c803748 net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes
In order to support PHY switching on Amlogic GXL SoCs, add support for
16bit and 32bit registers sizes.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 12:50:55 -05:00
David S. Miller ddc5e15729 Merge branch 'rds-tcp-fixes'
Sowmini Varadhan says:

====================
RDS: TCP: bug fixes

A couple of bug fixes identified during testing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 12:47:50 -05:00
Sowmini Varadhan 117d15bbfd RDS: TCP: start multipath acceptor loop at 0
The for() loop in rds_tcp_accept_one() assumes that the 0'th
rds_tcp_conn_path is UP and starts multipath accepts at index 1.
But this assumption may not always be true: if the 0'th path
has failed (ERROR or DOWN state) an incoming connection request
should be used to resurrect this path.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 12:47:49 -05:00
Sowmini Varadhan 1ac507d4ff RDS: TCP: report addr/port info based on TCP socket in rds-info
The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket,
so it is incorrect to report the address port info based on
rds_getname() as part of TCP state report.

Invoke inet_getname() for the t_sock associated with the
rds_tcp_connection instead.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 12:47:49 -05:00
Soheil Hassas Yeganeh f5f99309fa sock: do not set sk_err in sock_dequeue_err_skb
Do not set sk_err when dequeuing errors from the error queue.
Doing so results in:
a) Bugs: By overwriting existing sk_err values, it possibly
   hides legitimate errors. It is also incorrect when local
   errors are queued with ip_local_error. That happens in the
   context of a system call, which already returns the error
   code.
b) Inconsistent behavior: When there are pending errors on
   the error queue, sk_err is sometimes 0 (e.g., for
   the first timestamp on the error queue) and sometimes
   set to an error code (after dequeuing the first
   timestamp).
c) Suboptimality: Setting sk_err to ENOMSG on simple
   TX timestamps can abort parallel reads and writes.

Removing this line doesn't break userspace. This is because
userspace code cannot rely on sk_err for detecting whether
there is something on the error queue. Except for ICMP messages
received for UDP and RAW, sk_err is not set at enqueue time,
and as a result sk_err can be 0 while there are plenty of
errors on the error queue.

For ICMP packets in UDP and RAW, sk_err is set when they are
enqueued on the error queue, but that does not result in aborting
reads and writes. For such cases, sk_err is only readable via
getsockopt(SO_ERROR) which will reset the value of sk_err on
its own. More importantly, prior to this patch,
recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e.,
sk_err is set by sock_dequeue_err_skb without atomic ops or
locks) which can store 0 in sk_err even when we have ICMP
messages pending. Removing this line from sock_dequeue_err_skb
eliminates that race.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:29:10 -05:00
David S. Miller 5f7f75027f Merge branch 'IFF_NO_QUEUE-semantics'
Jesper Dangaard Brouer says:

====================
qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices

This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
hopefully help userspace get a more consistent behavior when attaching
qdisc to such virtual devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:15:56 -05:00
Jesper Dangaard Brouer 84c46dd865 qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] https://github.com/docker/libcontainer/pull/193
[2] https://github.com/openshift/origin/pull/11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:15:55 -05:00
Jesper Dangaard Brouer 1159708432 net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
The flag IFF_NO_QUEUE marks virtual device drivers that doesn't need a
default qdisc attached, given they will be backed by physical device,
that already have a qdisc attached for pushback.

It is still supported to attach a qdisc to a IFF_NO_QUEUE device, as
this can be useful for difference policy reasons (e.g. bandwidth
limiting containers).  For this to work, the tx_queue_len need to have
a sane value, because some qdiscs inherit/copy the tx_queue_len
(namely, pfifo, bfifo, gred, htb, plug and sfb).

Commit a813104d92 ("IFF_NO_QUEUE: Fix for drivers not calling
ether_setup()") caught situations where some drivers didn't initialize
tx_queue_len.  The problem with the commit was choosing 1 as the
fallback value.

A qdisc queue length of 1 causes more harm than good, because it
creates hard to debug situations for userspace. It gives userspace a
false sense of a working config after attaching a qdisc.  As low
volume traffic (that doesn't activate the qdisc policy) works,
like ping, while traffic that e.g. needs shaping cannot reach the
configured policy levels, given the queue length is too small.

This patch change the value to DEFAULT_TX_QUEUE_LEN, given other
IFF_NO_QUEUE devices (that call ether_setup()) also use this value.

Fixes: a813104d92 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:15:55 -05:00
Jesper Dangaard Brouer d0a81f67cd net: make default TX queue length a defined constant
The default TX queue length of Ethernet devices have been a magic
constant of 1000, ever since the initial git import.

Looking back in historical trees[1][2] the value used to be 100,
with the same comment "Ethernet wants good queues". The commit[3]
that changed this from 100 to 1000 didn't describe why, but from
conversations with Robert Olsson it seems that it was changed
when Ethernet devices went from 100Mbit/s to 1Gbit/s, because the
link speed increased x10 the queue size were also adjusted.  This
value later caused much heartache for the bufferbloat community.

This patch merely moves the value into a defined constant.

[1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
[2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
[3] https://git.kernel.org/tglx/history/c/98921832c232

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:15:55 -05:00
David S. Miller fc13fd3986 Merge branch 'udp-fwd-mem-sched-on-dequeue'
Paolo Abeni says:

====================
udp: do fwd memory scheduling on dequeue

After commit 850cbaddb5 ("udp: use it's own memory accounting schema"),
the udp code needs to acquire twice the receive queue spinlock on dequeue.

This patch series remove the need for the second lock at skb free time,
moving the udp memory scheduling inside the dequeue operation; the skb
destructor field is not used anymore and an additional sk argument is added
to ip_cmsg_recv_offset() to cope with null skb->sk after dequeue.

Many thanks to Eric Dumazed for suggesting pretty all much the above.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:24:42 -05:00
Paolo Abeni 7c13f97ffd udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb->desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.

Overall, this allows acquiring only once the receive queue
lock on dequeue.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr sinks	vanilla		patched
1		440		560
3		2150		2300
6		3650		3800
9		4450		4600
12		6250		6450

v1 -> v2:
 - do rmem and allocated memory scheduling under the receive lock
 - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
 - avoid the typdef for the dequeue callback

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:24:41 -05:00
Paolo Abeni ad959036a7 net/sock: add an explicit sk argument for ip_cmsg_recv_offset()
So that we can use it even after orphaining the skbuff.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:24:41 -05:00
David Ahern cd2c0f4540 net: Update raw socket bind to consider l3 domain
Binding a raw socket to a local address fails if the socket is bound
to an L3 domain:

    $ vrf-test  -s -l 10.100.1.2 -R -I red
    error binding socket: 99: Cannot assign requested address

Update raw_bind to look consider if sk_bound_dev_if is bound to an L3
domain and use inet_addr_type_table to lookup the address.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:14:34 -05:00
David S. Miller 0ca6e000f5 Merge branch 'ns2-amac'
Jon Mason says:

====================
add NS2 support to bgmac

Changes in v6:
* Use a common bgmac_phy_connect_direct (per Rafal Milecki)
* Rebased on latest net-next
* Added Reviewed-by to the relevant patches

Changes in v5:
* Change a pr_err to netdev_err (per Scott Branden)
* Reword the lane swap binding documentation (per Andrew Lunn)

Changes in v4:
* Actually send out the lane swap binding doc patch (Per Scott Branden)
* Remove unused #define (Per Andrew Lunn)

Changes in v3:
* Clean-up the bgmac DT binding doc (per Rob Herring)
* Document the lane swap binding and make it generic (Per Andrew Lunn)

Changes in v2:
* Remove the PHY power-on (per Andrew Lunn)
* Misc PHY clean-ups regarding comments and #defines (per Andrew Lunn)
  This results on none of the original PHY code from Vikas being
  present.  So, I'm removing him as an author and giving him
  "Inspired-by" credit.
* Move PHY lane swapping to PHY driver (per Andrew Lunn and Florian
  Fainelli)
* Remove bgmac sleep (per Florian Fainelli)
* Re-add bgmac chip reset (per Florian Fainelli and Ray Jui)
* Rebased on latest net-next
* Added patch for bcm54xx_auxctl_read, which is used in the BCM54810
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:23 -05:00
Jon Mason dddc3c9d7d arm64: dts: NS2: add AMAC ethernet support
Add support for the AMAC ethernet to the Broadcom Northstar2 SoC device
tree

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:22 -05:00
Jon Mason dd5c5d037f net: ethernet: bgmac: add NS2 support
Add support for the variant of amac hardware present in the Broadcom
Northstar2 based SoCs.  Northstar2 requires an additional register to be
configured with the port speed/duplexity (NICPM).  This can be added to
the link callback to hide it from the instances that do not use this.
Also, clearing of the pending interrupts on init is required due to
observed issues on some platforms.

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:22 -05:00
Jon Mason 1676aba5ef net: ethernet: bgmac: device tree phy enablement
Change the bgmac driver to allow for phy's defined by the device tree

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:22 -05:00
Jon Mason 0086f0944f Documentation: devicetree: net: add NS2 bindings to amac
Clean-up the documentation to the bgmac-amac driver, per suggestion by
Rob Herring, and add details for NS2 support.

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:22 -05:00
Jon Mason b14995ac25 net: phy: broadcom: Add BCM54810 PHY entry
The BCM54810 PHY requires some semi-unique configuration, which results
in some additional configuration in addition to the standard config.
Also, some users of the BCM54810 require the PHY lanes to be swapped.
Since there is no way to detect this, add a device tree query to see if
it is applicable.

Inspired-by: Vikas Soni <vsoni@broadcom.com>
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:22 -05:00
Jon Mason 3b9feb60e1 Documentation: devicetree: add PHY lane swap binding
Add the documentation for PHY lane swapping.  This is a boolean entry to
notify the phy device drivers that the TX/RX lanes need to be swapped.

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:21 -05:00
Jon Mason 5b4e290051 net: phy: broadcom: add bcm54xx_auxctl_read
Add a helper function to read the AUXCTL register for the BCM54xx.  This
mirrors the bcm54xx_auxctl_write function already present in the code.

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:11:21 -05:00
David S. Miller 94edc86bf1 Merge branch 'dwmac-sti-refactor-cleanup'
Joachim Eastwood says:

====================
stmmac: dwmac-sti refactor+cleanup

This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to cleanup the driver.

Eventually the init/exit callbacks will be deprecated and removed
from all drivers dwmac-* except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-06 22:00:17 -05:00
Joachim Eastwood cb42d4b64b stmmac: dwmac-sti: remove unused priv dev member
The dev member of struct sti_dwmac is not used anywhere in the driver
so lets just remove it.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-06 22:00:15 -05:00