Commit Graph

815932 Commits

Author SHA1 Message Date
Jiri Pirko 1667f7667d mlxsw: spectrum_acl: Push rehash start/end code into separate functions
In preparations for interrupt/continue of rehash work, put the code at
the beginning/end of the rehash function into separate functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 21:44:10 -08:00
Jiri Pirko 559c276810 mlxsw: spectrum_acl: Introduce new rehash context struct and save hint_priv there
Prepare for continued migration. Introduce a new structure to track
rehash context and save hint_priv into it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 21:44:10 -08:00
Jiri Pirko 6ca219e7de mlxsw: spectrum_acl: Don't migrate already migrated entry
Check if the entry is already in a chunk where we want it to be. In that
case, skip migration. This is preparation for "per parts" migration
where this situation may occur.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 21:44:10 -08:00
Jiri Pirko f9b274ce01 mlxsw: spectrum_acl: Push rehash dw struct into rehash sub-struct
More rehash related fields are going to come. Push "dw" into sub-struct
that will accommodate the others as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 21:44:10 -08:00
Heiner Kallweit ed8fe20205 net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
When debugging another issue I faced an interrupt storm in this
driver (88E6390, port 9 in SGMII mode), consisting of alternating
link-up / link-down interrupts. Analysis showed that the driver
wanted to set a cmode that was set already. But so far
mv88e6390x_port_set_cmode() doesn't check this and powers down
SERDES, what causes the link to break, and eventually results in
the described interrupt storm.

Fix this by checking whether the cmode actually changes. We want
that the very first call to mv88e6390x_port_set_cmode() always
configures the registers, therefore initialize port.cmode with
a value that is different from any supported cmode value.
We have to take care that we only init the ports cmode once
chip->info->num_ports is set.

v2:
- add small helper and init the number of actual ports only

Fixes: 364e9d7776 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 21:37:05 -08:00
Florian Fainelli 91cf8eceff switchdev: Remove unused transaction item queue
There are no more in tree users of the
switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
in the kernel since commit 00fc0c51e3 ("rocker: Change world_ops API
and implementation to be switchdev independant").

Remove this unused code and update the documentation accordingly since.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 21:35:19 -08:00
Daniel Borkmann 3612af783c bpf: fix sanitation rewrite in case of non-pointers
Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:

    [...]
    uint64_t a = bpf_ktime_get_ns();
    uint64_t b = bpf_ktime_get_ns();
    uint64_t delta = b - a;
    if ((int64_t)delta > 0) {
    [...]

Turns out there is a bug where a corner case is missing in the fix
d3bd7413e0 ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).

Fixes: d3bd7413e0 ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Reported-by: Arthur Fabre <afabre@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-01 21:24:08 -08:00
David S. Miller 9bfc445e0e Merge branch 'doc-net-ieee802154-move-from-plain-text-to-rst'
Stefan Schmidt says:

====================
doc: net: ieee802154: move from plain text to rst

The ieee802154 subsystem doc was still in plain text. With the networking book
taking shape I thought it was time to do the first step and move it over to rst.
This really is only the minimal conversion. I need to take some time to update
and extend the docs.

The patches are based on net-next, but they only touch the networking book so I
would not expect and trouble. From what I have seen they would go through
Jonathan's tree after being acked by Dave? If you want this patches against a
different tree let me know.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 17:03:00 -08:00
Stefan Schmidt 8a42eda258 doc: net: ieee802154: remove old plain text docs after switching to rst
The plain text docs are converted to rst now, which allows us to remove
the old text file from the tree.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 17:03:00 -08:00
Stefan Schmidt 8e4a07405d doc: net: ieee802154: introduce IEEE 802.15.4 subsystem doc in rst style
Moving the ieee802154 docs from a plain text file into the new rst
style. This commit only does the minimal needed change to bring the
documentation over. Follow up patches will improve and extend on this.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 17:03:00 -08:00
Jakub Kicinski eeaadd8285 devlink: fix kdoc
devlink suffers from a few kdoc warnings:

net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register'
net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register'
net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register'
net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create'
net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create'

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 17:00:22 -08:00
David S. Miller 6ae8762653 Merge branch 'net-aquantia-minor-bug-fixes-after-static-analysis'
Igor Russkikh says:

====================
net: aquantia: minor bug fixes after static analysis

This patchset fixes minor errors and warnings found by smatch and kasan.

Extra patch is to replace AQ_HW_WAIT_FOR with readx_poll_timeout
to improve readability.

V2:
use readx_poll
resubmitted to net-next since the changeset became quite big.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:16 -08:00
Nikita Danilov 0b926d461f net: aquantia: use better wrappers for state registers
Replace some direct registers reads with better
online functions.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:16 -08:00
Nikita Danilov 6a7f227731 net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic
David noticed the original define was hiding 'err' variable
reference. Thats confusing and counterintuitive.

Andrew noted the whole macro could be replaced with standard readx_poll
kernel macro. This makes code more readable.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:16 -08:00
Igor Russkikh 8006e3730b net: aquantia: fixed instack structure overflow
This is a real stack undercorruption found by kasan build.

The issue did no harm normally because it only overflowed
2 bytes after `bitary` array which on most architectures
were mapped into `err` local.

Fixes: bab6de8fd1 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:15 -08:00
Nikita Danilov 13b7997a10 net: aquantia: fixed buffer overflow
The overflow is detected by smatch:

drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c: 175
  aq_pci_func_free_irqs() error: buffer overflow 'self->aq_vec' 8 <= 31

In reality msix_entry_mask always restricts number of iterations.
Adding extra condition to make logic clear and smatch happy.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:15 -08:00
Nikita Danilov ea4854ddbc net: aquantia: added newline at end of file
drivers/net/ethernet/aquantia/atlantic/aq_nic.c: 991:1:
  warning: no newline at end of file

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:15 -08:00
Nikita Danilov ff83dbf21e net: aquantia: fixed memcpy size
Not careful array dereference caused analysis tools
to think there could be memory overflow.

There was actually no corruption because the array is
two dimensional.

drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c: 140
  aq_ethtool_get_strings() error:
    memcpy() '*aq_ethtool_stat_names' too small (32 vs 704)

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:45:15 -08:00
Hangbin Liu 5e1a99eae8 ipv4: Add ICMPv6 support when parse route ipproto
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.

Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.

v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: eacb9384a3 ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 16:41:27 -08:00
Yonghong Song b74e21ab7d samples/bpf: silence compiler warning for xdpsock_user.c
Compiling xdpsock_user.c with 4.8.5, I hit the following
compilation warning:
    HOSTCC  samples/bpf/xdpsock_user.o
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’:
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
    u32 idx_cq, idx_fq;
        ^
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
     u32 idx_rx, idx_tx = 0;
         ^
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
    u32 idx_rx, idx_fq = 0;

As an example, the code pattern looks like:
    u32 idx_cq;
    ...
    ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
    if (ret) {
      ...
    }
    ... idx_fq ...
The compiler warns since it does not know whether &idx_fq is assigned
or not inside the library function xsk_ring_prod__reserve().

Let us assign an initial value 0 to such auto variables to silence
compiler warning.

Fixes: 248c7f9c0e ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 01:07:10 +01:00
Yonghong Song a83de90658 selftests/bpf: set unlimited RLIMIT_MEMLOCK for test_sock_fields
This is to avoid permission denied error. A lot of systems
may have a much lower number, e.g., 64KB, for RLIMIT_MEMLOCK,
which may not be sufficient for the test to run successfully.

Fixes: e0b27b3f97 ("bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 01:06:23 +01:00
Daniel Borkmann 4269f69bc9 Merge branch 'bpf-doc-improvements'
Andrii Nakryiko says:

====================
A bunch of BPF-related docs typo, wording and formatting fixes.

v1->v2:
- split off non-documentation changes into separate patchset
====================

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 00:40:05 +01:00
Andrii Nakryiko 46604676c8 docs/bpf: minor casing/punctuation fixes
Fix few casing and punctuation glitches.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 00:40:04 +01:00
Andrii Nakryiko 9ab5305dbe docs/btf: reflow text to fill up to 78 characters
Reflow paragraphs to more fully and evenly fill 78 character lines.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 00:40:04 +01:00
Andrii Nakryiko 5efc529fb4 docs/btf: fix typos, improve wording
Fix various typos, some of the formatting and wording for
Documentation/btf.rst.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 00:40:04 +01:00
Eric Dumazet 4b9113045b bpf: fix u64_stats_init() usage in bpf_prog_alloc()
We need to iterate through all possible cpus.

Fixes: 492ecee892 ("bpf: enable program stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 00:31:36 +01:00
Paul Burton d1a2930d8a MIPS: eBPF: Fix icache flush end address
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
icache observes the code that we just wrote. Unfortunately it gets the
end address calculation wrong due to some bad pointer arithmetic.

The struct jit_ctx target field is of type pointer to u32, and as such
adding one to it will increment the address being pointed to by 4 bytes.
Therefore in order to find the address of the end of the code we simply
need to add the number of 4 byte instructions emitted, but we mistakenly
add the number of instructions multiplied by 4. This results in the call
to flush_icache_range() operating on a memory region 4x larger than
intended, which is always wasteful and can cause crashes if we overrun
into an unmapped page.

Fix this by correcting the pointer arithmetic to remove the bogus
multiplication, and use braces to remove the need for a set of brackets
whilst also making it obvious that the target field is a pointer.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6bd53f9c4 ("MIPS: Add missing file for eBPF JIT.")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 00:04:15 +01:00
Eran Ben Elisha 85327a9c41 net/mlx5: Update the list of the PCI supported devices
Add the upcoming ConnectX-6 Dx.

In addition, add "ConnectX Family mlx5Gen Virtual Function" device ID.
Every new HCA VF will be identified with this device ID. Different VF
models will be distinguished by their revision id.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:18 -08:00
Roi Dayan 10fbb1cdd0 net/mlx5e: Set peer flow needed also for multipath
Update the predicate that determines if to duplicate rules installed on
vport reps to account also for the multipath case.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:18 -08:00
Roi Dayan 68931c7dd7 net/mlx5e: Update check for merged eswitch device
The current check only validates if both netdevs use the same ops
which means both are vf reps or both uplink reps.

Unlike the case where the two uplinks are bonded (VF LAG), under
multipath scheme the switchdev parent id is not unified between the
uplink reps (and all the associated vf reps). However, we still want
to duplicate in the driver encap flows, adjust the merged eswitch
check for that matter.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:18 -08:00
Roi Dayan 5fb091e813 net/mlx5e: Use hint to resolve route when in HW multipath mode
As part of creating the tunnel headers while offloading TC encap rules,
we resolve the route and neighbour in order to get the source /
destination mac.

Since the way we offload multipath route is by having two HW rules,
one per uplink port, doing naive route lookup might get us a "wrong"
routing path which goes through the peer uplink and this will get us
eventually to create a wrong L2 header for the tunnel.

To avoid that, we use a device hint to get the correct route.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:17 -08:00
Roi Dayan 316d5f72b4 net/mlx5e: Always query offloaded tc peer rule counter
Under multipath when encap rules are duplicated to HW in the driver,
it's possible for one flow to be currently un-offloaded (e.g. lack of
next-hop route or neigh entry) while the other flow is offloaded. As
such, we move to query the counters of both flows at all times.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:17 -08:00
Roi Dayan b4a23329e2 net/mlx5e: Re-attempt to offload flows on multipath port affinity events
Under multipath it's possible for us to offload the flow only through
the e-switch for which proper route through the uplink exists.
When the port is up and the next-hop route is set again we want to
offload through it as well.

We generate SW event from the FIB event handler when multipath port
affinity changes. The tc offloads code gets this event, goes over the
flows which were marked as of having missing route and attempts to
offload them.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:17 -08:00
Roi Dayan 6997b1c9ca net/mlx5: Emit port affinity event for multipath offloads
Under multipath offload scheme, as part of handling fib events, emit
mlx5 port affinity event on the enabled ports which will be handled by
the tc offloads code.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:17 -08:00
Roi Dayan ef06c9ee89 net/mlx5e: Allow one failure when offloading tc encap rules under multipath
In a similar manner to uplink/VF LAG, under multipath we add encap peer
rule on the second port as well.

However, unlike the LAG case, we do want to allow failure for adding
one of the rules. This happens due to using a routing hint while doing
the route lookup when one path (next hop device) is down.

Introduce a new flag to indicate that route lookup failed for encap
flow. Note that a flow may still not be offloaded to hw due to missing
neighbour, in that case, the neigh update event will take care of it.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:16 -08:00
Roi Dayan 95dc1902c3 net/mlx5e: Don't inherit flow flags on peer flow creation
Currently the peer flow inherits the flags from the original flow
after we've set it. At this time the flags are set according to
the flow state, e.g marked as going to slow path and such.

Even if not getting us to real bugs now, this opens the door to
get us to troubles later. Future proof the code and avoid the
inheritance, use the peer flags as were set on input when we
started adding the original flow.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:16 -08:00
Roi Dayan 544fe7c2e6 net/mlx5e: Activate HW multipath and handle port affinity based on FIB events
To support multipath offload we are going to track SW multipath route
and related nexthops. To do that we register to FIB notifier and handle
the route and next-hops events and reflect that as port affinity to HW.

When there is a new multipath route entry that all next-hops are the
ports of an HCA we will activate LAG in HW.

Egress wise, we use HW LAG as the means to emulate multipath on current
HW which doesn't support port selection based on xmit hash. In the
presence of multiple VFs which use multiple SQs (send queues) this
yields fairly good distribution.

HA wise, HW LAG buys us the ability for a given RQ (receive queue) to
receive traffic from both ports and for SQs to migrate xmitting over
the active port if their base port fails.

When the route entry is being updated to single path we will update
the HW port affinity to use that port only.

If a next-hop becomes dead we update the HW port affinity to the living
port.

When all next-hops are alive again we reset the affinity to default.

Due to FW/HW limitations, when a route is deleted we are not disabling
the HW LAG since doing so will not allow us to enable it again while
VFs are bounded. Typically this is just a temporary state when a
routing daemon removes dead routes and later adds them back as needed.

This patch only handles events for AF_INET.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:16 -08:00
Roi Dayan 724b509ca0 net/mlx5: Add multipath mode
In order to offload ecmp-on-host scheme where next-hop routes are used,
we will make use of HW LAG. Add accessor function to let upper layers
in the driver to realize if the lag acts in multi-path mode.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:16 -08:00
Roi Dayan e6ee5e7166 net/mlx5: Use own workqueue for lag netdev events processing
Instead of using the system workqueue, allocate our own workqueue.
This workqueue will be used to handle more work in the next patch.

This patch doesn't change functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:16 -08:00
Roi Dayan 10a193ed78 net/mlx5: Expose lag operations in header file
The change is a refactoring step towards a multipath use case.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:15 -08:00
Roi Dayan bb19ad0d8d net/mlx5: Use unsigned int bit instead of bool as a struct member
This fix checkpatch check
CHECK: Avoid using bool structure members because of possible alignment
issues

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:15 -08:00
Roi Dayan 0ad060ee9c net/mlx5e: Don't make internal use of errno to denote missing neigh
EAGAIN is treated as a specific case when we consider the attachment
successful but wait for neigh event before offloading the flow.
This can result in unwanted behavior when sub calls on the offloading
path will return EAGAIN and we pass this error up.

Instead of attaching to a specific error code return a  boolean value
from the attach encap operation saying if the encap is valid or not.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:15 -08:00
Roi Dayan 733d4f367c net/mlx5e: Cleanup attach encap function
Remove the tunnel info argument which we can get from the other args.
Also reorder the args to have input args first and output args later.

This patch doesn't change functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:15 -08:00
Eran Ben Elisha 6bdbc1cb6c net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as static
Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx
reporter, move it to be statically declared in en/reporter_tx.c.

Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01 12:04:15 -08:00
David S. Miller 699be71534 Merge branch 'nfp-control-processor-DMA-support-and-RJ45'
Jakub Kicinski says:

====================
nfp: control processor DMA support and RJ45

This series starts with adding support for reporting twisted pair
media type in ethtool.

Remaining patches add support for using DMA with the control/service
processor.  Currently we always copy the command data into card's
memory.  DMA support allows us to have the NSP read the data from
host memory by itself.  Unfortunately, the FW loading and flashing
cannot directly map the buffers for DMA because (a) the firmware
ABI returns const buffers, and (b) the buffers may be vmalloc()ed
in many mysterious/unmappable way.  So just bite the bullet -
allocate new host buffer for the command and copy.

As Dirk explains, the NSP now supports updating all FWs at once
which means the max flashing time grew significantly.  He bumps
the max wait to avoid timeouts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 11:36:01 -08:00
Dirk van der Merwe 35697764d7 nfp: nsp: set higher timeout for flash bundle
The management firmware now supports being passed a bundle with
multiple components to be stored in flash at once. This makes it
easier to update all components to a known state with a single
user command, however, this also has the potential to increase
the time required to perform the update significantly.

The management firmware only updates the components out of a bundle
which are outdated, however, we need to make sure we can handle
the absolute worst case where a CPLD update can take a long time
to perform.

We set a very conservative total timeout of 900s which already
adds a contingency.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 11:36:01 -08:00
Jakub Kicinski 345415138d nfp: nsp: allow the use of DMA buffer
Newer versions of NSP can access host memory.  Simplest access
type requires all data to be in one contiguous area.  Since we
don't have the guarantee on where callers of the NSP ABI will
allocate their buffers we allocate a bounce buffer and copy
the data in and out.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 11:36:01 -08:00
Jakub Kicinski 66487abe2f nfp: nsp: move default buffer handling into its own function
DMA version of NSP communication is coming, move the code which
copies data into the NFP buffer into a separate function.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 11:36:00 -08:00
Jakub Kicinski 882cdcb5d3 nfp: nsp: use fractional size of the buffer
NSP expresses the buffer size in MB and 4 kB blocks.  For small
buffers the kB part may make a difference, so count it in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 11:36:00 -08:00
Jakub Kicinski 1e301a1407 nfp: report RJ45 connector in ethtool
Add support for reporting twisted pair port type.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 11:36:00 -08:00