Currently we use the non UAPI values and we miss erring on
the modify action which is not supported, fix that.
Fixes: 8b32580df1 ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the eswitch inline mode can potentially cause already configured
flows not to match the policy. E.g. set policy L4, add some L4 rules,
set policy to L2 --> bad! Hence we disallow it.
Keep track of how many offloaded rules are now set and refuse
inline mode changes if this isn't zero.
Fixes: bffaa91658 ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch
offloading use case, and push the latter into the e-switch code. This provides
better separation and is to be used in down-stream patch for applying a fix.
Fixes: bffaa91658 ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch cases for the rate limit set and query commands were
missing, which could get us wrong under fw error or driver reset
flow, fix that.
Fixes: 1466cc5b23 ('net/mlx5: Rate limit tables support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iwlwifi
* fix a user reported warning in DQA
mwifiex
* fix a potential double free
* fix lost early debug logs
* fix init wakeup warning message from device framework
* add Ganapathi and Xinming as maintainers
ath10k
* fix regression with QCA6174 during resume and firmware crash
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJY0S5DAAoJEG4XJFUm622bYoYH/AxiLF0UleRgeHeQAQd4d1d0
QsISkT+Xm1p43P3GQ9qoKocHd0S0GvWCqzHC2HAeDmLFAUf5gfo3hhGqEvTVEvCo
eUiNjIJyezNcvo9JJwmB/P8O2nVJH9tUtSISwKfnvG/XT4olSJfE+vPi0edOXG4s
IvDdO8AiuoZHoP11utNtMsg4vnA1uvy75ti20fe69PX5g9G/+gcDBkFHh/IpcfML
GSRZaZjKxsaWKnSUimPJlJsvsJE+mcsXD2rPImEiinQbR2yTfkAcmdxaQR9fkVSP
1FdT4MvWoxrK4sUaPSn3ZO/eVzNeDXi9bQPGQwBpq6rUZp7r9OPkdRqGWbi/hW8=
=zzBr
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-for-davem-2017-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.11
iwlwifi
* fix a user reported warning in DQA
mwifiex
* fix a potential double free
* fix lost early debug logs
* fix init wakeup warning message from device framework
* add Ganapathi and Xinming as maintainers
ath10k
* fix regression with QCA6174 during resume and firmware crash
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, tipc_nametbl_unsubscribe() is called at subscriptions
reference count cleanup. Usually the subscriptions cleanup is
called at subscription timeout or at subscription cancel or at
subscriber delete.
We have ignored the possibility of this being called from other
locations, which causes deadlock as we try to grab the
tn->nametbl_lock while holding it already.
CPU1: CPU2:
---------- ----------------
tipc_nametbl_publish
spin_lock_bh(&tn->nametbl_lock)
tipc_nametbl_insert_publ
tipc_nameseq_insert_publ
tipc_subscrp_report_overlap
tipc_subscrp_get
tipc_subscrp_send_event
tipc_close_conn
tipc_subscrb_release_cb
tipc_subscrb_delete
tipc_subscrp_put
tipc_subscrp_put
tipc_subscrp_kref_release
tipc_nametbl_unsubscribe
spin_lock_bh(&tn->nametbl_lock)
<<grab nametbl_lock again>>
CPU1: CPU2:
---------- ----------------
tipc_nametbl_stop
spin_lock_bh(&tn->nametbl_lock)
tipc_purge_publications
tipc_nameseq_remove_publ
tipc_subscrp_report_overlap
tipc_subscrp_get
tipc_subscrp_send_event
tipc_close_conn
tipc_subscrb_release_cb
tipc_subscrb_delete
tipc_subscrp_put
tipc_subscrp_put
tipc_subscrp_kref_release
tipc_nametbl_unsubscribe
spin_lock_bh(&tn->nametbl_lock)
<<grab nametbl_lock again>>
In this commit, we advance the calling of tipc_nametbl_unsubscribe()
from the refcount cleanup to the intended callers.
Fixes: d094c4d5f5 ("tipc: add subscription refcount to avoid invalid delete")
Reported-by: John Thompson <thompa.atl@gmail.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix Coverity scan errors by not dereferencing lio->glists_dma_base pointer
if it's NULL.
See http://marc.info/?l=linux-netdev&m=149002294305614&w=2
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When user_mss is zero, it means use the default value. But the current
codes don't permit user set TCP_MAXSEG to the default value.
It would return the -EINVAL when val is zero.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou says:
====================
net-next sample action optimization v4
The sample action can be used for translating Openflow 'clone' action.
However its implementation has not been sufficiently optimized for this
use case. This series attempts to close the gap.
Patch 3 commit message has more details on the specific optimizations
implemented.
---
v3->v4: Enhance patch 4.
Fix two bugs pointed out by Pravin,
Remove 'is_sample' variable.
v2->v3: Enhance patch 4, Rafctor to move more common logic to clone_execute().
v1->v2: Address Pravin's comment, Refactor recirc and sample
to share more common code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Added clone_execute() that both the sample and the recirc
action implementation can use.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of open flow 'clone' action, the OVS user space
can now translate the 'clone' action into kernel datapath 'sample'
action, with 100% probability, to ensure that the clone semantics,
which is that the packet seen by the clone action is the same as the
packet seen by the action after clone, is faithfully carried out
in the datapath.
While the sample action in the datpath has the matching semantics,
its implementation is only optimized for its original use.
Specifically, there are two limitation: First, there is a 3 level of
nesting restriction, enforced at the flow downloading time. This
limit turns out to be too restrictive for the 'clone' use case.
Second, the implementation avoid recursive call only if the sample
action list has a single userspace action.
The main optimization implemented in this series removes the static
nesting limit check, instead, implement the run time recursion limit
check, and recursion avoidance similar to that of the 'recirc' action.
This optimization solve both #1 and #2 issues above.
One related optimization attempts to avoid copying flow key as
long as the actions enclosed does not change the flow key. The
detection is performed only once at the flow downloading time.
Another related optimization is to rewrite the action list
at flow downloading time in order to save the fast path from parsing
the sample action list in its original form repeatedly.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic of allocating and copy key for each 'exec_actions_level'
was specific to execute_recirc(). However, future patches will reuse
as well. Refactor the logic into its own function clone_key().
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
add_deferred_actions() API currently requires actions to be passed in
as a fully encoded netlink message. So far both 'sample' and 'recirc'
actions happens to carry actions as fully encoded netlink messages.
However, this requirement is more restrictive than necessary, future
patch will need to pass in action lists that are not fully encoded
by themselves.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
net: vrf: performance improvements
Device based features for VRF such as qdisc, netfilter and packet
captures are implemented by switching the dst on skbuffs to its per-VRF
dst. This has the effect of controlling the output function which points
a function in the VRF driver. [1] The skb proceeds down the stack with
dst->dev pointing to the VRF device. Netfilter, qdisc and tc rules and
network taps are evaluated based on this device. Finally, the skb makes
it to the vrf_xmit function which resets the dst based on a FIB lookup.
The feature comes at cost - between 5 and 10% depending on test (TCP vs
UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a FIB
lookup in the VRF driver for each packet sent through it. The FIB lookup
is required because the real dst gets dropped so that the skb can
traverse the stack with dst->dev set to the VRF device.
All of that is really driven by the qdisc and not replicating the
processing of __dev_queue_xmit if a qdisc is set up on the device. But,
VRF devices by default do not have a qdisc and really have no need for
multiple Tx queues. This means the performance overhead is inflicted upon
all users for the potential use case of a qdisc being configured.
The overhead can be avoided by checking if the default configuration
applies to a specific VRF device before switching the dst. If a device
does not have a qdisc, the pass through netfilter hooks and packet taps
can be done inline without dropping the dst and thus avoiding the
performance penalty. With this change performance overhead of VRF drops
to neglible (difference with run-over-run variance) to 3% depending on
test type.
netperf performance comparison for 3 cases:
1. L3_MASTER_DEVICE compiled out
2. VRF with this patch set
3. current VRF code
IPv4
----
no-l3mdev new-vrf old-vrf
TCP_RR 28778 28938* 27169
TCP_CRR 10706 10490 9770
UDP_RR 30750 29813 29256
* Although higher in the final run used for submitting this patch set, I
think what this really represents is a neglible performance overhead for
VRF with this change (i.e, within the +-1% variance of runs). Most
notably the FIB lookups in the Tx path are avoided for TCP_RR.
IPv6
----
no-l3mdev new-vrf old-vrf
TCP_RR 29495 29432 27794
TCP_CRR 10520 10338 9870
UDP_RR 26137 27019* 26511
* UDP is consistently better with VRF for two reasons:
1. Source address selection with L3 domains is considering fewer
addresses since only addresses on interfaces in the domain are
considered for the selection. Specifically, perf-top shows
shows ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr
running much lower with vrf than without.
2. The VRF table contains all routes (i.e, there are no separate local
and main tables per VRF). That means ip6_pol_route_output only has 1
lookup for VRF where it does 2 without it (1 in the local table and 1
in the main table).
[1] http://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.
The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.
This patch avoids the redirect for IPv6 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.
Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.
The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) from a +3% improvement for UDP which has a lookup
per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR
which connects a socket for each request-response.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.
The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.
This patch avoids the redirect for IPv4 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.
Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.
The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) to ~3% for UDP which has a lookup per packet and
< 1% overhead for connected sockets that leverage early demux and
avoid FIB lookups.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.
Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().
Suggested by Eric Dumazet.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The arguments packets and bytes to call mlxsw_sp_acl_rule_get_stats are
in the wrong order. Fix this by swapping them.
Detected by CoverityScan, CID#1419705 ("Arguments in wrong order")
Fixes: 7c1b8eb175 ("mlxsw: spectrum: Add support for TC flower offload statistics")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the unnecessary temporary variable 'err' from
sctp_association_init.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_stream_free uses struct sctp_stream as a param, but struct sctp_stream
is defined after it's declaration.
This patch is to declare struct sctp_stream before sctp_stream_free.
Fixes: a83863174a ("sctp: prepare asoc stream for stream reconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With posix timers having become optional, we get a build error with
the cpts time sync option of the CPSW driver:
drivers/net/ethernet/ti/cpts.c: In function 'cpts_find_ts':
drivers/net/ethernet/ti/cpts.c:291:23: error: implicit declaration of function 'ptp_classify_raw';did you mean 'ptp_classifier_init'? [-Werror=implicit-function-declaration]
This adds a hard dependency on PTP_CLOCK to avoid the problem, as
building it without PTP support makes no sense anyway.
Fixes: baa73d9e47 ("posix-timers: Make them configurable")
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dependency is reversed: cpsw and netcp call into cpts,
but cpts depends on the other two in Kconfig. This can lead
to cpts being a loadable module and its callers built-in:
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_remove':
cpsw.c:(.text.cpsw_remove+0xd0): undefined reference to `cpts_release'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_rx_handler':
cpsw.c:(.text.cpsw_rx_handler+0x2dc): undefined reference to `cpts_rx_timestamp'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_tx_handler':
cpsw.c:(.text.cpsw_tx_handler+0x7c): undefined reference to `cpts_tx_timestamp'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_ndo_stop':
As a workaround, I'm introducing another Kconfig symbol to
control the compilation of cpts, while making the actual
module controlled by a silent symbol that is =y when necessary.
Fixes: 6246168b4a ("net: ethernet: ti: netcp: add support of cpts")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are using the smallest padding boundary (8 bytes), which isn't
smaller than the Memory Controller Read/Write Size
We get best performance in 100G when the Packing Boundary is a multiple
of the Maximum Payload Size. Its related to inefficient chopping of DMA
packets by PCIe, that causes more overhead on bus. So driver is helping
by making the starting address alignment to be MPS size.
We will try to determine PCIE MaxPayloadSize capabiltiy and set
IngPackBoundary based on this value. If cache line size is greater than
MPS or determinig MPS fails, we will use cache line size to determine
IngPackBoundary(as before).
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building the driver as a module, we get a warning about the
lack of a license:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o
see include/linux/module.h for more information
Curiously the text in the .c files only mentions GPLv2+, while the license
tag in the PCI driver contains both GPL and BSD. I picked the license text
as the more definite reference here and put a GPL tag in there.
Fixes: 65e0ace2c5 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this header, we can run into a build error:
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: In function 'xlgmac_config_queue_mapping':
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:1548:36: error: 'IEEE_8021QAZ_MAX_TCS' undeclared (first use in this function)
prio_queues = min_t(unsigned int, IEEE_8021QAZ_MAX_TCS,
Fixes: 65e0ace2c5 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8152: fix the rx settings of RTL8153
The RMS and the rx early size should base on the same rx size. However,
the RMS is set to 9K bytes now and the rx early depends on mtu. For using
the rx buffer effectively, sync the two settings according to the mtu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
revert commit a59e6d8152 ("r8152: correct the rx early size") and
fix the rx early size as
(rx buffer size - rx packet size - rx desc size - alignment) / 4
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the received maximum size (RMS) according to the mtu size. It is
unnecessary to receive a packet which is more than the size we could
transmit. Besides, this could let the rx buffer be used effectively.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh notifications today carry pid 0 for nlmsg_pid
in all cases. This patch fixes it to carry calling process
pid when available. Applications (eg. quagga) rely on
nlmsg_pid to ignore notifications generated by their own
netlink operations. This patch follows the routing subsystem
which already sets this correctly.
Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The net_cls controller controls the classid field of each socket which
is associated with the cgroup. Because the classid is per-socket
attribute, when a task migrates to another cgroup or the configured
classid of the cgroup changes, the controller needs to walk all
sockets and update the classid value, which was implemented by
3b13758f51 ("cgroups: Allow dynamically changing net_classid").
While the approach is not scalable, migrating tasks which have a lot
of fds attached to them is rare and the cost is born by the ones
initiating the operations. However, for simplicity, both the
migration and classid config change paths call update_classid() which
scans all fds of all tasks in the target css. This is an overkill for
the migration path which only needs to cover a much smaller subset of
tasks which are actually getting migrated in.
On cgroup v1, this can lead to unexpected scalability issues when one
tries to migrate a task or process into a net_cls cgroup which already
contains a lot of fds. Even if the migration traget doesn't have many
to get scanned, update_classid() ends up scanning all fds in the
target cgroup which can be extremely numerous.
Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
even worse. Before bfc2cf6f61 ("cgroup: call subsys->*attach() only
for subsystems which are actually affected by migration"), cgroup core
would call the ->css_attach callback even for controllers which don't
see actual migration to a different css.
As net_cls is always disabled but still mounted on cgroup v2, whenever
a process is migrated on the cgroup v2 hierarchy, net_cls sees
identity migration from root to root and cgroup core used to call
->css_attach callback for those. The net_cls ->css_attach ends up
calling update_classid() on the root net_cls css to which all
processes on the system belong to as the controller isn't used. This
makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
which is horrible and easily leads to noticeable stalls triggering RCU
stall warnings and so on.
The worst symptom is already fixed in upstream by bfc2cf6f61
("cgroup: call subsys->*attach() only for subsystems which are
actually affected by migration"); however, backporting that commit is
too invasive and we want to avoid other cases too.
This patch updates net_cls's cgrp_attach() to iterate fds of only the
processes which are actually getting migrated. This removes the
surprising migration cost which is dependent on the total number of
fds in the target cgroup. As this leaves write_classid() the only
user of update_classid(), open-code the helper into write_classid().
Reported-by: David Goode <dgoode@fb.com>
Fixes: 3b13758f51 ("cgroups: Allow dynamically changing net_classid")
Cc: stable@vger.kernel.org # v4.4+
Cc: Nina Schiff <ninasc@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2017-03-21
This series contains updates to e1000, e1000e, igb, igbvf and ixgb.
This finishes up the work Philippe Reynes did to update the Intel drivers
to the new API for ethtool (get|set)_link_ksettings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets qmicli working with the MDM6600 modem.
Cc: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Sebastian Reichel <sre@kernel.org>
Tested-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz says:
====================
qed: IOV related clenaups
This patch series targets IOV functionality [on both PF and VF].
Patches #2, #3 and #5 fix flows relating to malicious VFs, either by
upgrading and aligning current safe-guards or by correcing racy flows.
Patches #1 and #8 make some malicious/dysnfunctional VFs logging appear
by default in logs.
The rest of the patches either cleanup the existing code or else correct
some possible [yet fairly insignicant] issues in VF behavior.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The link information exists only on the leading hwfn,
but some of its derivatives [e.g., min/max rate] need to
be configured for each hwfn.
When re-basing the VF link view, use the leading hwfn
information as basis for all existing hwfns to allow
said configurations to stick.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Malicious VF existance should be interesting enough for the
hyperuser. Change the PF indication that one of its child VF
became malicious to appear by default.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF<->VF interface allows for the VF to request
multiple queues closure via a single message, but this has
never been used by any official driver.
We now deprecate this option, forcing each queue close
to arrive via a different command; This would be required
for future TLVs that are going to extend the queue TLVs with
additional information on a per-queue basis.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PF needs to validate the status of VF queues before asking firmware
to configure anything for them, but that validation is done in various
different forms - sometimes inadequate.
Add auxillary functions that can be used for testing of the queue
state and convert the various flows to use those instead of current
existing flows; Also, add missing validations where needed.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When starting the VF's vport, the PF would first configure
the status blocks of the VF and then reset them.
That would cause some of the configured information to be lost -
specifically it would mean that all the VFs queues would use
the Rx coalescing state-machine of the status block.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When PF responds to the VF requests it also cleans the HW-channel
indication in firmware to allow further VF messages to arrive,
but the order currently applied is wrong -
The PF is copying by DMAE the response the VF is polling on for
completion, and only afterwards sets the HW-channel to ready state.
This creates a race condition where the VF would be able to send
an additional message to the PF before the channel would get ready
again, causing the firmware to consider the VF as malicious.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VF is considered malicious, driver handling of the VF
FLR flow would clean said indication - but not if the FLR is
part of an sriov-disable flow.
That leads to further issues, as PF wouldn't re-enable the
previously malicious VF when sriov is re-enabled.
No reason for that - simply clean malicious indications in
the sriov-disable flow as well.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VFs are currently logging errors when communicating
with their PFs in a too-low verbosity that wouldn't
be shown by default. As timeouts and failed commands
are crucial for VF operability, make them appear by
default.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge of 'linux-kselftest-4.11-rc1':
1. Partially removed use of 'test_objs' target, breaking force rebuild of
BPFOBJ, introduced in commit d498f8719a ("bpf: Rebuild bpf.o for any
dependency update").
Update target so dependency on BPFOBJ is restored.
2. Introduced commit 2047f1d8ba ("selftests: Fix the .c linking rule")
which fixes order of LDLIBS.
Commit d02d8986a7 ("bpf: Always test unprivileged programs") added
libcap dependency into CFLAGS. Use LDLIBS instead to fix linking of
test_verifier.
3. Introduced commit d83c3ba0b9 ("selftests: Fix selftests build to
just build, not run tests").
Reordering the Makefile allows us to remove the 'all' target.
Tested both:
selftests/bpf$ make
and
selftests$ make TARGETS=bpf
on Ubuntu 16.04.2.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
while packets are collected on the error queue.
So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
is not enough to safely assume that the skb contains
OPT_STATS data.
Add a bit in sock_exterr_skb to indicate whether the
skb contains opt_stats data.
Fixes: 1c885808e4 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__sock_recv_timestamp can be called for both normal skbs (for
receive timestamps) and for skbs on the error queue (for transmit
timestamps).
Commit 1c885808e4
(tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
assumes any skb passed to __sock_recv_timestamp are from
the error queue, containing OPT_STATS in the content of the skb.
This results in accessing invalid memory or generating junk
data.
To fix this, set skb->pkt_type to PACKET_OUTGOING for packets
on the error queue. This is safe because on the receive path
on local sockets skb->pkt_type is never set to PACKET_OUTGOING.
With that, copy OPT_STATS from a packet, only if its pkt_type
is PACKET_OUTGOING.
Fixes: 1c885808e4 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to fix the issue that sctp_prsctp_prune_sent forgot
to update q->out_qlen when removing a chunk from unsent queue.
Fixes: 8dbdf1f5b0 ("sctp: implement prsctp PRIO policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As tp->dst_pending_confirm's value can only be set 0 or 1, this
patch is to change to define it as a bit instead of __u32.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c86a773c78 ("sctp: add dst_pending_confirm flag") introduced
a temporary variable "confirm" in sctp_packet_transmit.
But it broke the rule that longer lines should be above shorter ones.
Besides, this variable is not necessary, so this patch is to just
remove it and use tp->dst_pending_confirm directly.
Fixes: c86a773c78 ("sctp: add dst_pending_confirm flag")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>