Total vports are already stored during eswitch initialization. Instead
of calculating everytime, read directly from eswitch.
Additionally, host PF's SF vport information is available using
QUERY_HCA_CAP command. It is not available through HCA_CAP of the
eswitch manager PF.
Hence, this patch prepares the return total eswitch vport count from the
existing eswitch struct.
This further helps to keep eswitch port counting macros and logic within
eswitch.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag.
When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch
specific ACL namespaces.
Instead, start honoring MLX5_ESWITCH flag and perform vport specific
initialization only when vport count is non zero.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Added .config_intr and .handle_interrupt callbacks.
Link event interrupt will trigger an interrupt every time when the link
goes up or down.
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a printk message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We found that with the latest mainline kernel (5.12.0-051200rc8) on
some KVM instances / bare-metal systems, the following tests will take
longer than the kselftest framework default timeout (45 seconds) to
run and thus got terminated with TIMEOUT error:
* xfrm_policy.sh - took about 2m20s
* pmtu.sh - took about 3m5s
* udpgso_bench.sh - took about 60s
Bump the timeout setting to 5 minutes to allow them have a chance to
finish.
https://bugs.launchpad.net/bugs/1856010
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau says:
====================
mptcp: Compatibility with common msg flags
These patches from the MPTCP tree handle some of the msg flags that are
typically used with TCP, to make it easier to adapt userspace programs
for use with MPTCP.
Patches 1, 2, and 4 add support for MSG_ERRQUEUE (no-op for now),
MSG_TRUNC, and MSG_PEEK on the receive side.
Patch 3 ignores unsupported msg flags for send and receive.
Patch 5 adds a selftest for MSG_PEEK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend mptcp_connect tool with MSG_PEEK support and add a test case in
mptcp_connect.sh that checks the data received from/after recv() with
MSG_PEEK.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for MSG_PEEK flag. Packets are not removed
from the receive_queue if MSG_PEEK set in recv() system call.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mptcp_sendmsg() fails with EOPNOTSUPP if the
user-space provides some unsupported flag. That is unexpected
and may foul existing applications migrated to MPTCP, which
expect a different behavior.
Change the mentioned function to silently ignore the unsupported
flags except MSG_FASTOPEN. This is the only flags currently not
supported by MPTCP with user-space visible side-effects.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/162
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mentioned flag is currently silenlty ignored. This
change implements the TCP-like behaviour, dropping the
pending data up to the specified length.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sigend-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mptcp_recvmsg() currently silently ignores MSG_ERRQUEUE, returning
input data instead of error cmsg.
This change provides a dummy implementation for MSG_ERRQUEUE - always
returns no data. That is consistent with the current lack of a suitable
IP_RECVERR setsockopt() support.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2021-04-23
This series contains updates to i40e and iavf drivers.
Aleksandr adds support for VIRTCHNL_VF_CAP_ADV_LINK_SPEED in i40e which
allows for reporting link speed to VF as a value instead of using an
enum; helper functions are created to remove repeated code.
Coiby Xu reduces memory use of i40e when using kdump by reducing Tx, Rx,
and admin queue to minimum values. Current use causes failure of kdump.
Stefan Assmann removes duplicated free calls in iavf.
Haiyue cleans up a loop to return directly when if the value is found
and changes some magic numbers to defines for better maintainability
in iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
selftests: mlxsw: Fixes
This patch set carries fixes to selftest issues that we have hit in our
nightly regression run. Almost all are in mlxsw selftests, though one is in
a generic forwarding selftest.
- In patch #1, in an ERSPAN test, install an FDB entry as static instead of
(implicitly) as local.
- In the mlxsw resource-scale test, an if statement overrides the value of
$?, which is supposed to contain the result of the test. As a result, the
resource scale test can spuriously pass.
In patches #2 and #3, remove the if statements to fix the issue in,
respectively, port_scale test and tc_flower_scale tests.
- Again in the mlxsw resource-scale test, when more then one sub-test is
run, a successful sub-test overrides any previous failures. This causes a
spurious pass of the overall test. This is fixed in patch #4.
- In patch #5, increase a tolerance in a mlxsw-specific RED backlog test.
This test is very noisy, due to rounding errors and the unpredictability
of software traffic generation. By bumping the tolerance from 5 % to 10,
get the failure rate to zero. This shouldn't impact the accuracy,
mistakes in backlog configuration (e.g. due to wrong cell size) are
likely to cause a much larger discrepancy.
- In patch #6, fix mausezahn invocation in the mlxsw ERSPAN scale
test. The test failed because of the wrong invocation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the resource scale test checks a few cases, when the error code
resets between the cases. So for example, if one case fails and the
consecutive case passes, the error code eventually will fit the last test
and will be 0.
Save a new return code that will hold the 'or' return codes of all the
cases, so the final return code will consider all the cases.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the error return code of the failure condition is lost after
using an if statement, so the test doesn't fail when it should.
Remove the if statement that separates the condition and the error code
check, so the test won't always pass.
Fixes: abfce9e062 ("selftests: mlxsw: Reduce running time using offload indication")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the error return code of the failure condition is lost after
using an if statement, so the test doesn't fail when it should.
Remove the if statement that separates the condition and the error code
check, so the test won't always pass.
Fixes: 5154b1b826 ("selftests: mlxsw: Add a scale test for physical ports")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FDB roaming test installs a destination MAC address on the wrong
interface of an FDB database and tests whether the mirroring fails, because
packets are sent to the wrong port. The test by mistake installs the FDB
entry as local. This worked previously, because drivers were notified of
local FDB entries in the same way as of static entries. However that has
been fixed in the commit 6ab4c3117a ("net: bridge: don't notify switchdev
for local FDB addresses"), and local entries are not notified anymore. As a
result, the HW is not reconfigured for the FDB roam, and mirroring keeps
working, failing the test.
To fix the issue, mark the FDB entry as static.
Fixes: 9c7c8a8244 ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Third, and final, set of patches for v5.13. We got one more week
before the merge window and this includes from that extra week.
Smaller features to rtw88 and mt76, but mostly this contains fixes.
rtw88
* 8822c: Add gap-k calibration to improve long range performance
mt76
* parse rate power limits from DT
* debugfs file to test firmware crash
* debugfs to disable NAPI threaded mode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJggrbhAAoJEG4XJFUm622boIoH/RBPQ/qebSDf96s1D2Sqs64o
d/vp2Av8q8RiMo45v+jMQSZzRPEqqyCI6idFseYHojg+keq9wkHzLuNML3OSarD5
0TwdYVJGJ++ErhWmtnS0bQ7yWhO5qGnGSDVapQpaJ+HVREYEwCbLv+BihsY8sDli
1IaOnDglG/BxOpQBZFVB4kuLfoLcrW0LdbuY8r9s3iypb6p8WLK5zlBHnj0ebc6h
bI1dtwGXWrIkGyK+WDyAOmWuTaRV94j6RNLE8W2XKClSWlpvA1NSzttAG6BBKXLj
s2X0x43OcLGvv3kwaHY5gR3OmuJz4TBkPrEz4wXlI9d/htCDLVk84jCC0d/+JMg=
=6Fio
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-2021-04-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.13
Third, and final, set of patches for v5.13. We got one more week
before the merge window and this includes from that extra week.
Smaller features to rtw88 and mt76, but mostly this contains fixes.
rtw88
* 8822c: Add gap-k calibration to improve long range performance
mt76
* parse rate power limits from DT
* debugfs file to test firmware crash
* debugfs to disable NAPI threaded mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8152: adjust REALTEK_USB_DEVICE
Modify REALTEK_USB_DEVICE macro.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefine REALTEK_USB_DEVICE macro with USB_DEVICE_INTERFACE_CLASS and
USB_DEVICE_AND_INTERFACE_INFO to simply the code.
Although checkpatch.pl shows the following error, it is more readable.
ERROR: Macros with complex values should be enclosed in parentheses
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTL8156 support CDC NCM mode. And users could set the configuration
of the USB device between vendor and NCM mode dynamically by themselves.
That is, the driver doesn't need to set vendor mode from NCM mode.
Fixes: 195aae321c ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch to support PTP Sync packet one-step timestamping
described one-step timestamping packet handling logic as below in
commit message:
- Trasmit packet immediately if no other one in transfer, or queue to
skb queue if there is already one in transfer.
The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
lock and to send one skb in skb queue if has.
There was not problem of the description, but there was a mistake in
implementation. The locking/test_and_set_bit_lock() should be put in
enetc_start_xmit() which may be called by worker, rather than in
enetc_xmit(). Otherwise, the worker calling enetc_start_xmit() after
bit lock released is not able to lock again for transfer.
Fixes: 7294380c52 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2021-04-23
1) The SPI flow key in struct flowi has no consumers,
so remove it. From Florian Westphal.
2) Remove stray synchronize_rcu from xfrm_init.
From Florian Westphal.
3) Use the new exit_pre hook to reset the netlink socket
on net namespace destruction. From Florian Westphal.
4) Remove an unnecessary get_cpu() in ipcomp, that
code is always called with BHs off.
From Sabrina Dubroca.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lipnitskiy says:
====================
mtk_eth_soc: fixes and performance improvements
Most of these changes come from OpenWrt where they have been present and
tested for months.
First three patches are bug fixes. The rest are performance
improvements. The last patch is a cleanup to use the iopoll.h macro for
busy-waiting instead of a custom loop.
v2:
- Reverse christmas tree in "use iopoll.h macro for DMA init"
- Use cond_resched() instead of iopoll.h macro in "reduce MDIO bus
access latency"
- Use napi_complete_done and rework NAPI callbacks in a new patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace a tight busy-wait loop without a pause with a standard
readx_poll_timeout_atomic routine with a 5 us poll period.
Tested by booting a MT7621 device to ensure the driver initializes
properly.
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This improves GRO performance
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: Use MTK_RXD4_FOE_ENTRY instead of GENMASK(13, 0)]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use napi_complete_done to communicate total TX and RX work done to NAPI.
Count total RX work up instead of remaining work down for clarity.
Remove unneeded local variables for clarity. Use do {} while instead of
goto for clarity.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid rearming interrupt if napi_complete returns false
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uncached memory access is expensive, and there is no need to access all
descriptor words if we can't process them anyway
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value is only updated by the CPU, so it is cheaper to access from the
ring data structure than from a hardware register.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduces the number of interrupts under load
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: add documentation for new struct fields]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
256 descriptors is not enough for multi-gigabit traffic under load on
MT7622. Bump it to 512 to improve performance.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improves tx performance
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running short on descriptors, only stop the queue for the netdev that
tx was attempted for. By the time something tries to send on the other
netdev, the ring might have some more room already.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
usleep_range often ends up sleeping much longer than the 10-20us provided
as a range here. This causes significant latency in mdio bus acceses,
which easily adds multiple seconds to the boot time on MT7621 when polling
DSA slave ports.
Use cond_resched instead of usleep_range, since the MDIO access does not
take much time
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should improve performance, since it can use bulk free
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case build_skb fails, call skb_free_frag on the correct pointer. Also
update the DMA structures with the new mapping before exiting, because
the mapping was successful
Suggested-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since build_skb accesses the data area (for initializing shinfo), dma unmap
needs to happen before that call
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: split build_skb cleanup fix into a separate commit]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VLAN ID in the rx descriptor is only valid if the RX_DMA_VTAG bit is
set. Fixes frames wrongly marked with VLAN tags.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: fix commit message]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mana_gd_poll_cq() may return -1 if an overflow error is detected (this
should never happen unless there is a bug in the driver or the hardware).
Fix the type of the variable "comp_read" by using int rather than u32.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tw_prot_cleanup will check the twsk_prot.
Fixes: 0f5907af39 ("net: Fix potential memory leak in proto_register()")
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the FDIR entry is found, just return the result directly to break
the loop.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The maximum number (2) of flex-byte support is derived from ethtool
use-def data size (8 byte).
Change the magic number 2 to macro definition, and add the comment to
track the design thinking, so the code is clear and easily maintained.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Both iavf_free_all_tx_resources() and iavf_free_all_rx_resources() have
already been called in the very same function.
Remove the duplicate calls.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The minimum size of admin send/receive queue is 1 and 2 respectively.
The admin send queue can't be set to 1 because in that case, the
firmware would fail to init.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the minimum of the number of descriptors thus we will allocate the
minimal ring buffers for kdump.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>