Commit Graph

953034 Commits

Author SHA1 Message Date
David S. Miller f2e834694b Merge branch 'drop_monitor-Convert-to-use-devlink-tracepoint'
Ido Schimmel says:

====================
drop_monitor: Convert to use devlink tracepoint

Drop monitor is able to monitor both software and hardware originated
drops. Software drops are monitored by having drop monitor register its
probe on the 'kfree_skb' tracepoint. Hardware originated drops are
monitored by having devlink call into drop monitor whenever it receives
a dropped packet from the underlying hardware.

This patch set converts drop monitor to monitor both software and
hardware originated drops in the same way - by registering its probe on
the relevant tracepoint.

In addition to drop monitor being more consistent, it is now also
possible to build drop monitor as module instead of as a builtin and
still monitor hardware originated drops. Initially, CONFIG_NET_DEVLINK
implied CONFIG_NET_DROP_MONITOR, but after commit def2fbffe6
("kconfig: allow symbols implied by y to become m") we can have
CONFIG_NET_DEVLINK=y and CONFIG_NET_DROP_MONITOR=m and hardware
originated drops will not be monitored.

Patch set overview:

Patch #1 adds a tracepoint in devlink for trap reports.

Patch #2 prepares probe functions in drop monitor for the new
tracepoint.

Patch #3 converts drop monitor to use the new tracepoint.

Patches #4-#6 perform cleanups after the conversion.

Patch #7 adds a test case for drop monitor. Both software originated
drops and hardware originated drops (using netdevsim) are tested.

Tested:

| CONFIG_NET_DEVLINK | CONFIG_NET_DROP_MONITOR | Build | SW drops | HW drops |
| -------------------|-------------------------|-------|----------|----------|
|          y         |            y            |   v   |     v    |     v    |
|          y         |            m            |   v   |     v    |     v    |
|          y         |            n            |   v   |     x    |     x    |
|          n         |            y            |   v   |     v    |     x    |
|          n         |            m            |   v   |     v    |     x    |
|          n         |            n            |   v   |     x    |     x    |
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:27 -07:00
Ido Schimmel b7cc6d3c5c selftests: net: Add drop monitor test
Test that drop monitor correctly captures both software and hardware
originated packet drops.

# ./drop_monitor_tests.sh

Software drops test
    TEST: Capturing active software drops                               [ OK ]
    TEST: Capturing inactive software drops                             [ OK ]

Hardware drops test
    TEST: Capturing active hardware drops                               [ OK ]
    TEST: Capturing inactive hardware drops                             [ OK ]

Tests passed:   4
Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Ido Schimmel 93e155967c drop_monitor: Filter control packets in drop monitor
Previously, devlink called into drop monitor in order to report hardware
originated drops / exceptions. devlink intentionally filtered control
packets and did not pass them to drop monitor as they were not dropped
by the underlying hardware.

Now drop monitor registers its probe on a generic 'devlink_trap_report'
tracepoint and should therefore perform this filtering itself instead of
having devlink do that.

Add the trap type as metadata and have drop monitor ignore control
packets.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Ido Schimmel a848c05f4b drop_monitor: Remove duplicate struct
'struct net_dm_hw_metadata' is a duplicate of 'struct
devlink_trap_metadata'.

Remove the former and simplify the code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Ido Schimmel de9cbb81bd drop_monitor: Remove no longer used functions
The old probe functions that were invoked by drop monitor code are no
longer called and can thus be removed. They were replaced by actual
probe functions that are registered on the recently introduced
'devlink_trap_report' tracepoint.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Ido Schimmel 8ee2267ad3 drop_monitor: Convert to using devlink tracepoint
Convert drop monitor to use the recently introduced
'devlink_trap_report' tracepoint instead of having devlink call into
drop monitor.

This is both consistent with software originated drops ('kfree_skb'
tracepoint) and also allows drop monitor to be built as a module and
still report hardware originated drops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Ido Schimmel 5855357cd4 drop_monitor: Prepare probe functions for devlink tracepoint
Drop monitor supports two alerting modes: Summary and packet. Prepare a
probe function for each, so that they could be later registered on the
devlink tracepoint by calling register_trace_devlink_trap_report(),
based on the configured alerting mode.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Ido Schimmel 5b88823bfe devlink: Add a tracepoint for trap reports
Add a tracepoint for trap reports so that drop monitor could register
its probe on it. Use trace_devlink_trap_report_enabled() to avoid
wasting cycles setting the trap metadata if the tracepoint is not
enabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
David S. Miller 8333c1c4ee linux-can-next-for-5.10-20200930
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl90420THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCpyVqK+u3vqdH0B/0bI+/KTQAN7OGq7pyBGUwzD8VEsJop
 si5JAGVBIi3RYimJRqZ3Q3+vJ/cT4rp1NJMaVRzE6xxqvxp4T4hWBB5y8KBp39zC
 8PjbIlI0RDqYnKAGd/9DXxkLHlbN9FwYSBpjIezzfEnRNhtnxJTMvDQ/nEUi+EEw
 x2HAw8P4Vvodc55CjPinpKRBpRQr8KKknkZ36idvKw3yuYzYPYcee3C70YK0GNOE
 SI8JXf9IV1cvDWleS6r93K03A/4cQdH/MQmuJWrz/wQoHnU7E+v1pLyI6/GxWAaQ
 YsOiVKisw+TqZUaGFlw+ZqqYrqsEjmwRYUzpzwLkLS0sCE8ntwInSZo6
 =W8ez
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-5.10-20200930' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2020-09-30

this is a pull request of 13 patches for net-next.

The first 10 target the mcp25xxfd driver (which is renamed to mcp251xfd during
this series).

The first two patches are by Thomas Kopp, which adds reference to the just
related errata and updates the documentation and log messages.

Dan Carpenter's patch fixes a resource leak during ifdown.

A patch by me adds the missing initialization of a variable.

Oleksij Rempel updates the DT binding documentation as requested by Rob
Herring.

The next 5 patches are by Thomas Kopp and me. During review Geert Uytterhoeven
suggested to use "microchip,mcp251xfd" instead of "microchip,mcp25xxfd" as the
DT autodetection compatible to avoid clashes with future but incompatible
devices. We decided not only to rename the compatible but the whole driver from
"mcp25xxfd" to "mcp251xfd". This is done in several patches.

Joakim Zhang contributes three patches for the flexcan driver. The first one
adds support for the ECC feature, which is implemented on some modern IP cores,
by initializing the controller's memory during ifup. The next patch adds
support for the i.MX8MP (which supports ECC) and the last patch properly
disables the runtime PM if device registration fails.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 15:21:43 -07:00
David S. Miller 11789fe776 Merge branch 'ionic-watchdog-training'
Shannon Nelson says:

====================
ionic watchdog training

Our link watchdog displayed a couple of unfriendly behaviors in some recent
stress testing.  These patches change the startup and stop timing in order
to be sure that expected structures are ready to be used by the watchdog.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 15:11:09 -07:00
Shannon Nelson 0816e0c818 ionic: prevent early watchdog check
In one corner case scenario, the driver device lif setup can
get delayed such that the ionic_watchdog_cb() timer goes off
before the ionic->lif is set, thus causing a NULL pointer panic.
We catch the problem by checking for a NULL lif just a little
earlier in the callback.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 15:11:09 -07:00
Shannon Nelson df8aeaa826 ionic: stop watchdog timer earlier on remove
We need to be better at making sure we don't have a link check
watchdog go off while we're shutting things down, so let's stop
the timer as soon as we start the remove.

Meanwhile, since that was the only thing in
ionic_dev_teardown(), simplify and remove that function.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 15:11:09 -07:00
David S. Miller 9b5cbf469d Merge branch 'tcp-exponential-backoff-in-tcp_send_ack'
Eric Dumazet says:

====================
tcp: exponential backoff in tcp_send_ack()

We had outages caused by repeated skb allocation failures in tcp_send_ack()

It is time to add exponential backoff to reduce number of attempts.
Before doing so, first patch removes icsk_ack.blocked to make
room for a new field (icsk_ack.retry)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:21:30 -07:00
Eric Dumazet a37c2134be tcp: add exponential backoff in __tcp_send_ack()
Whenever host is under very high memory pressure,
__tcp_send_ack() skb allocation fails, and we setup
a 200 ms (TCP_DELACK_MAX) timer before retrying.

On hosts with high number of TCP sockets, we can spend
considerable amount of cpu cycles in these attempts,
add high pressure on various spinlocks in mm-layer,
ultimately blocking threads attempting to free space
from making any progress.

This patch adds standard exponential backoff to avoid
adding fuel to the fire.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:21:30 -07:00
Eric Dumazet b6b6d6533a inet: remove icsk_ack.blocked
TCP has been using it to work around the possibility of tcp_delack_timer()
finding the socket owned by user.

After commit 6f458dfb40 ("tcp: improve latencies of timer triggered events")
we added TCP_DELACK_TIMER_DEFERRED atomic bit for more immediate recovery,
so we can get rid of icsk_ack.blocked

This frees space that following patch will reuse.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:21:30 -07:00
Alexandre Belloni 20c168be68 net: macb: move pdata to private header
struct macb_platform_data is only used by macb_pci to register the platform
device, move its definition to cadence/macb.h and remove platform_data/macb.h

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:18:19 -07:00
David S. Miller e13dbc4f41 Merge branch 'mlxsw-PFC-and-headroom-selftests'
Petr Machata says:

====================
mlxsw: PFC and headroom selftests

Recent changes in the headroom management code made it clear that an
automated way of testing this functionality is needed. This patchset brings
two tests: a synthetic headroom behavior test, which verifies mechanics of
headroom management. And a PFC test, which verifies whether this behavior
actually translates into a working lossless configuration.

Both of these tests rely on mlnx_qos[1], a tool that interfaces with Linux
DCB API. The tool was originally written to work with Mellanox NICs, but
does not actually rely on anything Mellanox-specific, and can be used for
mlxsw as well as for any other NIC-like driver. Unlike Open LLDP it does
support buffer commands and permits a fire-and-forget approach to
configuration, which makes it very handy for writing of selftests.

Patches #1-#3 extend the selftest devlink_lib.sh in various ways. Patch #4
then adds a helper wrapper for mlnx_qos to mlxsw's qos_lib.sh.

Patch #5 adds a test for management of port headroom.

Patch #6 adds a PFC test.

[1] https://github.com/Mellanox/mlnx-tools/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Petr Machata bfa804784e selftests: mlxsw: Add a PFC test
Add a test for PFC. Runs 10MB of traffic through a bottleneck and checks
that none of it gets lost.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Petr Machata a65cc53a0e selftests: mlxsw: Add headroom handling test
Add a test for headroom configuration. This covers projection of ETS
configuration to ingress, PFC, adjustments for MTU, the qdisc / TC
mode and the effect of egress SPAN session on buffer configuration.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Petr Machata 4b94a2fad8 selftests: mlxsw: qos_lib: Add a wrapper for running mlnx_qos
mlnx_qos is a script for configuration of DCB. Despite the name it is not
actually Mellanox-specific in any way. It is currently the only ad-hoc tool
available (in contrast to a daemon that manages an interface on an ongoing
basis). However, it is very verbose and parsing out error messages is not
really possible. Add a wrapper that makes it easier to use the tool.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Petr Machata 5b3a53c9c8 selftests: forwarding: devlink_lib: Support port-less topologies
Some selftests may not need any actual ports. Technically those are not
forwarding selftests, but devlink_lib can still be handy. Fall back on
NETIF_NO_CABLE in those cases.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Petr Machata 294f44c19f selftests: forwarding: devlink_lib: Add devlink_cell_size_get()
Add a helper that answers the cell size of the devlink device.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Petr Machata 6e0972e0c5 selftests: forwarding: devlink_lib: Split devlink_..._set() into save & set
Changing pool type from static to dynamic causes reinterpretation of
threshold values. They therefore need to be saved before pool type is
changed, then the pool type can be changed, and then the new values need
to be set up.

For that reason, set cannot subsume save, because it would be saving the
wrong thing, with possibly a nonsensical value, and restore would then fail
to restore the nonsensical value.

Thus extract a _save() from each of the relevant _set()'s. This way it is
possible to save everything up front, then to tweak it, and then restore in
the required order.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 14:06:54 -07:00
Joakim Zhang 5a9323f55d can: flexcan: disable runtime PM if register flexcandev failed
Disable runtime PM if register flexcandev failed, and balance reference
of usage_count.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-4-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 21:56:58 +02:00
Joakim Zhang 3aa2539536 can: flexcan: add flexcan driver for i.MX8MP
Add flexcan driver for i.MX8MP, which supports CAN FD and ECC.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-3-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 21:56:57 +02:00
Joakim Zhang a6597121d6 can: flexcan: initialize all flexcan memory for ECC function
One issue was reported at a baremetal environment, which is used for
FPGA verification. "The first transfer will fail for extended ID
format(for both 2.0B and FD format), following frames can be transmitted
and received successfully for extended format, and standard format don't
have this issue. This issue occurred randomly with high possiblity, when
it occurs, the transmitter will detect a BIT1 error, the receiver a CRC
error. According to the spec, a non-correctable error may cause this
transfer failure."

With FLEXCAN_QUIRK_DISABLE_MECR quirk, it supports correctable errors,
disable non-correctable errors interrupt and freeze mode. Platform has
ECC hardware support, but select this quirk, this issue may not come to
light. Initialize all FlexCAN memory before accessing them, at least it
can avoid non-correctable errors detected due to memory uninitialized.
The internal region can't be initialized when the hardware doesn't support
ECC.

According to IMX8MPRM, Rev.C, 04/2020. There is a NOTE at the section
11.8.3.13 Detection and correction of memory errors:
"All FlexCAN memory must be initialized before starting its operation in
order to have the parity bits in memory properly updated. CTRL2[WRMFRZ]
grants write access to all memory positions that require initialization,
ranging from 0x080 to 0xADF and from 0xF28 to 0xFFF when the CAN FD feature
is enabled. The RXMGMASK, RX14MASK, RX15MASK, and RXFGMASK registers need to
be initialized as well. MCR[RFEN] must not be set during memory initialization."

Memory range from 0x080 to 0xADF, there are reserved memory (unimplemented
by hardware, e.g. only configure 64 MBs), these memory can be initialized or not.
In this patch, initialize all flexcan memory which includes reserved memory.

In this patch, create FLEXCAN_QUIRK_SUPPORT_ECC for platforms which has ECC
feature. If you have a ECC platform in your hand, please select this
qurik to initialize all flexcan memory firstly, then you can select
FLEXCAN_QUIRK_DISABLE_MECR to only enable correctable errors.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-2-qiangqing.zhang@nxp.com
[mkl: wrap long lines]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 21:56:57 +02:00
Marc Kleine-Budde eb79a267c9 can: mcp251xfd: rename all remaining occurrence to mcp251xfd
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes all non user facing occurrence of
"mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD".

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 21:55:28 +02:00
Marc Kleine-Budde f4f77366f2 can: mcp251xfd: rename all user facing strings to mcp251xfd
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes all user facing strings from
"mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD", including:
- kconfig symbols
- name of kernel module
- DT and SPI compatible

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 21:55:04 +02:00
Marc Kleine-Budde 1f0e21a0c0 can: mcp251xfd: rename driver files and subdir to mcp251xfd
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes the name of the driver subdir and the
individual files accordinly.

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 21:54:30 +02:00
Andrii Nakryiko f4d385e4d5 selftests/bpf: Test "incremental" btf_dump in C format
Add test validating that btf_dump works fine with BTFs that are modified and
incrementally generated.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-5-andriin@fb.com
2020-09-30 12:30:46 -07:00
Andrii Nakryiko 9c6c5c48d7 libbpf: Make btf_dump work with modifiable BTF
Ensure that btf_dump can accommodate new BTF types being appended to BTF
instance after struct btf_dump was created. This came up during attemp to
use btf_dump for raw type dumping in selftests, but given changes are not
excessive, it's good to not have any gotchas in API usage, so I decided to
support such use case in general.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
2020-09-30 12:30:22 -07:00
Alexei Starovoitov ea7da1d563 Merge branch 'Various BPF helper improvements'
Daniel Borkmann says:

====================
This series adds two BPF helpers, that is, one for retrieving the classid
of an skb and another one to redirect via the neigh subsystem, and improves
also the cookie helpers by removing the atomic counter. I've also added
the bpf_tail_call_static() helper to the libbpf API that we've been using
in Cilium for a while now, and last but not least the series adds a few
selftests. For details, please check individual patches, thanks!

v3 -> v4:
  - Removed out_rec error path (Martin)
  - Integrate BPF_F_NEIGH flag into rejecting invalid flags (Martin)
    - I think this way it's better to avoid bit overlaps given it's
      right in the place that would need to be extended on new flags
v2 -> v3:
  - Removed double skb->dev = dev assignment (David)
  - Added headroom check for v6 path (David)
  - Set set flowi4_proto for ip_route_output_flow (David)
  - Rebased onto latest bpf-next
v1 -> v2:
  - Rework cookie generator to support nested contexts (Eric)
  - Use ip_neigh_gw6() and container_of() (David)
  - Rename __throw_build_bug() and improve comments (Andrii)
  - Use bpf_tail_call_static() also in BPF samples (Maciej)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-30 11:50:35 -07:00
Daniel Borkmann eef4a011f3 bpf, selftests: Add redirect_neigh selftest
Add a small test that exercises the new redirect_neigh() helper for the
IPv4 and IPv6 case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/0fc7d9c5f9a6cc1c65b0d3be83b44b1ec9889f43.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Daniel Borkmann faef26fa44 bpf, selftests: Use bpf_tail_call_static where appropriate
For those locations where we use an immediate tail call map index use the
newly added bpf_tail_call_static() helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/3cfb2b799a62d22c6e7ae5897c23940bdcc24cbc.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Daniel Borkmann 0e9f6841f6 bpf, libbpf: Add bpf_tail_call_static helper for bpf programs
Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.

We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
in different places as a neat trick to trigger compilation errors when
compiler does not remove code at compilation time. This works for the BPF
back end as it does not implement the __builtin_trap().

  [0] f5537c2602

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Daniel Borkmann b4ab314149 bpf: Add redirect_neigh helper as redirect drop-in
Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.

This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.

Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c325252 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.

With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Daniel Borkmann 92acdc58ab bpf, net: Rework cookie generator as per-cpu one
With its use in BPF, the cookie generator can be called very frequently
in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
and attached to the root cgroup, for example, when used in v1/v2 mixed
environments. In particular, when there's a high churn on sockets in the
system there can be many parallel requests to the bpf_get_socket_cookie()
and bpf_get_netns_cookie() helpers which then cause contention on the
atomic counter.

As similarly done in f991bd2e14 ("fs: introduce a per-cpu last_ino
allocator"), add a small helper library that both can use for the 64 bit
counters. Given this can be called from different contexts, we also need
to deal with potential nested calls even though in practice they are
considered extremely rare. One idea as suggested by Eric Dumazet was
to use a reverse counter for this situation since we don't expect 64 bit
overflows anyways; that way, we can avoid bigger gaps in the 64 bit
counter space compared to just batch-wise increase. Even on machines
with small number of cores (e.g. 4) the cookie generation shrinks from
min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
in parallel from multiple CPUs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Daniel Borkmann b426ce83ba bpf: Add classid helper only based on skb->sk
Similarly to 5a52ae4e32 ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve cgroup v1 classid solely
based on the skb->sk, so it can be used as key as part of BPF map
lookups out of tc from host ns, in particular given the skb->sk is
retained these days when crossing net ns thanks to 9c4c325252
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id() which implements the same for v2.
Kubernetes ecosystem is still operating on v1 however, hence net_cls
needs to be used there until this can be dropped in with the v2
helper of bpf_skb_cgroup_id().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:34 -07:00
Song Liu 963ec27a10 bpf: fix raw_tp test run in preempt kernel
In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers:

[   35.874974] BUG: using smp_processor_id() in preemptible [00000000]
code: new_name/87
[   35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1
[   35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1ubuntu1 04/01/2014
[   35.916941] Call Trace:
[   35.919660]  dump_stack+0x77/0x9b
[   35.923273]  check_preemption_disabled+0xb4/0xc0
[   35.928376]  bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.933872]  ? selinux_bpf+0xd/0x70
[   35.937532]  __do_sys_bpf+0x6bb/0x21e0
[   35.941570]  ? find_held_lock+0x2d/0x90
[   35.945687]  ? vfs_write+0x150/0x220
[   35.949586]  do_syscall_64+0x2d/0x40
[   35.953443]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by calling migrate_disable() before smp_processor_id().

Fixes: 1b4d60ec16 ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-30 08:34:08 -07:00
Thomas Kopp dba1572c23 can: mcp25xxfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"
The wildcard should be narrowed down to prevent existing and future devices
that are not compatible from matching. It is very unlikely that incompatible
devices will be released that do not match the wildcard.

Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200930091423.755-1-thomas.kopp@microchip.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 11:23:53 +02:00
Thomas Kopp 0e051294c0 dt-binding: can: mcp251xfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"
The wildcard should be narrowed down to prevent existing and future devices
that are not compatible from matching. It is very unlikely that incompatible
devices will be released that do not match the wildcard.

This is the documentation part of the commit.

Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200930091423.755-2-thomas.kopp@microchip.com
[mkl: rename file, too]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 11:22:26 +02:00
Oleksij Rempel 9d5c8df1b9 dt-binding: can: mcp25xxfd: documentation fixes
Apply following fixes:
- Use 'interrupts'. (interrupts-extended will automagically be supported
  by the tools)
- *-supply is always a single item. So, drop maxItems=1
- add "additionalProperties: false" flag to detect unneeded properties.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20200923125301.27200-1-o.rempel@pengutronix.de
Reported-by: Rob Herring <robh@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Fixes: 1b5a78e69c ("dt-binding: can: mcp25xxfd: document device tree bindings")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 10:34:31 +02:00
Marc Kleine-Budde 727fba74b5 can: mcp25xxfd: mcp25xxfd_irq(): add missing initialization of variable set_normal mode
This patch fixes the following warning:

    drivers/net/can/spi/mcp25xxfd/mcp25xxfd-core.c:2155 mcp25xxfd_irq()
    error: uninitialized symbol 'set_normal_mode'.

by adding the missing initialization.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Fixes: 55e5b97f00 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Link: https://lore.kernel.org/r/20200923114726.2704426-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 10:34:30 +02:00
Dan Carpenter 8cffc6fe65 can: mcp25xxfd: mcp25xxfd_ring_free(): fix memory leak during cleanup
This loop doesn't free the first element of the array.  The "i > 0" has
to be changed to "i >= 0".

Fixes: 55e5b97f00 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200923112752.GA1473821@mwanda
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 10:34:30 +02:00
Thomas Kopp f5b84dedf7 can: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata information
This patch adds a reference to the recent released MCP2517FD and MCP2518FD
errata sheets and paste the explanation.

The driver already implements the proposed fix.

Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com
[mkl: split into two patches, adjust subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 10:34:30 +02:00
Thomas Kopp 788b83ea2c can: mcp25xxfd: mcp25xxfd_handle_eccif(): add ECC related errata and update log messages
This patch adds a reference to the recent released MCP2517FD and MCP2518FD
errata sheets and paste the explanation.

The single error correction does not always work, so always indicate that a
single error occurred. If the location of the ECC error is outside of the
TX-RAM always use netdev_notice() to log the problem. For ECC errors in the
TX-RAM, there is a recovery procedure.

Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com
[mkl: split into two patches, adjust subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30 10:34:30 +02:00
David S. Miller 611ba7536e Merge branch 'HW-support-for-VCAP-IS1-and-ES0-in-mscc_ocelot'
Vladimir Oltean says:

====================
HW support for VCAP IS1 and ES0 in mscc_ocelot

The patches from RFC series "Offload tc-flower to mscc_ocelot switch
using VCAP chains" have been split into 2:
https://patchwork.ozlabs.org/project/netdev/list/?series=204810&state=*

This is the boring part, that deals with the prerequisites, and not with
tc-flower integration. Apart from the initialization of some hardware
blocks, which at this point still don't do anything, no new
functionality is introduced.

- Key and action field offsets are defined for the supported switches.
- VCAP properties are added to the driver for the new TCAM blocks. But
  instead of adding them manually as was done for IS2, which is error
  prone, the driver is refactored to read these parameters from
  hardware, which is possible.
- Some improvements regarding the processing of struct ocelot_vcap_filter.
- Extending the code to be compatible with full and quarter keys.

This series was tested, along with other patches not yet submitted, on
the Felix and Seville switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 18:26:42 -07:00
Vladimir Oltean 98642d1aa2 net: mscc: ocelot: look up the filters in flower_stats() and flower_destroy()
Currently a new filter is created, containing just enough correct
information to be able to call ocelot_vcap_block_find_filter_by_index()
on it.

This will be limiting us in the future, when we'll have more metadata
associated with a filter, which will matter in the stats() and destroy()
callbacks, and which we can't make up on the spot. For example, we'll
start "offloading" some dummy tc filter entries for the TCAM skeleton,
but we won't actually be adding them to the hardware, or to block->rules.
So, it makes sense to avoid deleting those rules too. That's the kind of
thing which is difficult to determine unless we look up the real filter.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 18:26:24 -07:00
Vladimir Oltean 085f5b9162 net: mscc: ocelot: add a new ocelot_vcap_block_find_filter_by_id function
And rename the existing find to ocelot_vcap_block_find_filter_by_index.
The index is the position in the TCAM, and the id is the flow cookie
given by tc.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 18:26:24 -07:00
Vladimir Oltean 642942637c net: mscc: ocelot: rename variable 'cnt' in vcap_data_offset_get()
The 'cnt' variable is actually used for 2 purposes, to hold the number
of sub-words per VCAP entry, and the number of sub-words per VCAP
action.

In fact, I'm pretty sure these 2 numbers can never be different from one
another. By hardware definition, the entry (key) TCAM rows are divided
into the same number of sub-words as its associated action RAM rows.
But nonetheless, let's at least rename the variables such that
observations like this one are easier to make in the future.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 18:26:24 -07:00