Commit Graph

785686 Commits

Author SHA1 Message Date
Shay Agroskin 4cb4e98e5b net/mlx5e: Added 'raw_errors_laneX' fields to ethtool statistics
These are counters for errors received on rx side, such as
FEC errors.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:32:57 -07:00
Shay Agroskin 67daf11860 net/mlx5: Added "per_lane_error_counters" cap bit to PCAM
Added "Per lane raw errors" capability bit in
Ports Capabilities Mask (PCAM) enhanced features
layout.

This bit determines if the fields "phy_raw_errors_laneX"
in "Physical Layer statistical" counters group are supported.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:32:36 -07:00
Shay Agroskin 6cfa946050 net/mlx5e: Ethtool driver callback for query/set FEC policy
Driver callback function for 'ethtool --show-fec',
'ethtool --set-fec' commands.

The query function returns active and configured FEC policy
for current link speed.

The set function sets FEC policy for all supported link
speeds.
1) If current link speed doesn't support requested FEC policy,
   the function fails.
2) If a different link speed doesn't support requested FEC
   policy, FEC capbilities for this speed are turned off.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Shay Agroskin 2095b26414 net/mlx5e: Add port FEC get/set functions
Added functions to query and set link FEC policy.
To get/set FEC capabilities in PPLM reg we need to query
current link speed.
'mlx5_get_fec_speed_field' queries current link speed and returns
correct field offset.

FEC Query's return value is divided into 'active FEC policy', which is
the FEC policy used by the link, and 'configured FEC policy', which
is the FEC policy requested by the user.
The two values may differ if:
1) FEC policy was configured to 'auto',
   in which case the active FEC policy would be the default FEC policy
   for current link speed.

2) FEC policy was changed, but no link reset is performed. In which case,
   the active FEC policy would become the configured one after a link
   reset.

FEC set function sets FEC policy for all link speeds and perform link
reset.
1) If current link speed doesn't support requested FEC policy,
   the function fails.
2) If a different link speed doesn't support requested FEC policy,
   FEC capbilities for this speed are turned off and a warning message
   is printed.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Shay Agroskin 4b5b9c7d97 net/mlx5: Add FEC fields to Port Phy Link Mode (PPLM) reg
Added FEC related fields to PPLM layout.
These fields are needed to set and query FEC policy
for different link speeds.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Vlad Buslov 2a4c429802 net/mlx5: Remove counter from idr after removing it from list
Fs_counters list can temporary become unsorted when new counters are
created/deleted concurrently. Idr is used to quickly lookup position to
insert new counter in logarithmic time. However, if new flows are
concurrently inserted during time window when flows with adjacent ids are
already removed from idr but are still present in counters list,
mlx5_fc_stats_work() observes counters list in inconsistent state, which
results following warning:

[ 1839.561955] mlx5_core 0000:81:00.0: mlx5_cmd_fc_bulk_get:587:(pid 729): Flow counter id (0x102d5) out of range (0x1c0a8..0x1c10b). Counter ignored.

Move idr_remove() call to be executed synchronously with counter deletion
from list. Extract this code to mlx5_fc_stats_remove() helper function that
is called by workqueue job handler mlx5_fc_stats_work().

Fixes: 12d6066c3b ("net/mlx5: Add flow counters idr")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2018-10-18 13:13:31 -07:00
Vlad Buslov fd33071303 net/mlx5: Take fs_counters dellist before addlist
In fs_counters elements from both addlist and dellist are removed by
mlx5_fc_stats_work() without any locking. This introduces race condition
when batch of new rules is created and then immediately deleted (for
example, when error occurred during flow creation). In such case some of
the rules might be in dellist, but not in addlist when mlx5_fc_stats_work()
is executed concurrently with tc, which will result rule deletion and
use-after-free on next iteration because deleted rules are still in
addlist.

Always take dellist first to guarantee that rules can only be deleted after
they were removed from addlist.

Fixes: 6e5e228391 ("net/mlx5: Add new list to store deleted flow counters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2018-10-18 13:13:31 -07:00
Tariq Toukan 4972e6fa3a net/mlx5: Refactor fragmented buffer struct fields and init flow
Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not
needed to manage and control the datapath of the fragmented buffers API.

struct mlx5_frag_buf contains control info to manage the allocation
and de-allocation of the fragmented buffer.
Its fields are not relevant for datapath, so here I take them out of the
struct mlx5_frag_buf_ctrl, except for the fragments array itself.

In addition, modified mlx5_fill_fbc to initialise the frags pointers
as well. This implies that the buffer must be allocated before the
function is called.

A set of type-specific *_get_byte_size() functions are replaced by
a generic one.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
David S. Miller 3a3295bfa6 Merge branch 'sctp-fix-sk_wmem_queued-and-use-it-to-check-for-writable-space'
Xin Long says:

====================
sctp: fix sk_wmem_queued and use it to check for writable space

sctp doesn't count and use asoc sndbuf_used, sk sk_wmem_alloc and
sk_wmem_queued properly, which also causes some problem.

This patchset is to improve it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:23:47 -07:00
Xin Long cd305c74b0 sctp: use sk_wmem_queued to check for writable space
sk->sk_wmem_queued is used to count the size of chunks in out queue
while sk->sk_wmem_alloc is for counting the size of chunks has been
sent. sctp is increasing both of them before enqueuing the chunks,
and using sk->sk_wmem_alloc to check for writable space.

However, sk_wmem_alloc is also increased by 1 for the skb allocked
for sending in sctp_packet_transmit() but it will not wake up the
waiters when sk_wmem_alloc is decreased in this skb's destructor.

If msg size is equal to sk_sndbuf and sendmsg is waiting for sndbuf,
the check 'msg_len <= sctp_wspace(asoc)' in sctp_wait_for_sndbuf()
will keep waiting if there's a skb allocked in sctp_packet_transmit,
and later even if this skb got freed, the waiting thread will never
get waked up.

This issue has been there since very beginning, so we change to use
sk->sk_wmem_queued to check for writable space as sk_wmem_queued is
not increased for the skb allocked for sending, also as TCP does.

SOCK_SNDBUF_LOCK check is also removed here as it's for tx buf auto
tuning which I will add in another patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:23:47 -07:00
Xin Long 605c0ac182 sctp: count both sk and asoc sndbuf with skb truesize and sctp_chunk size
Now it's confusing that asoc sndbuf_used is doing memory accounting with
SCTP_DATA_SNDSIZE(chunk) + sizeof(sk_buff) + sizeof(sctp_chunk) while sk
sk_wmem_alloc is doing that with skb->truesize + sizeof(sctp_chunk).

It also causes sctp_prsctp_prune to count with a wrong freed memory when
sndbuf_policy is not set.

To make this right and also keep consistent between asoc sndbuf_used, sk
sk_wmem_alloc and sk_wmem_queued, use skb->truesize + sizeof(sctp_chunk)
for them.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:23:47 -07:00
David S. Miller 2d0f0ca2c7 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2018-10-17

This series adds support for the new igc driver.

The igc driver is the new client driver supporting the Intel I225
Ethernet Controller, which supports 2.5GbE speeds.  The reason for
creating a new client driver, instead of adding support for the new
device in e1000e, is that the silicon behaves more like devices
supported in igb driver.  It also did not make sense to add a client
part, to the igb driver which supports only 1GbE server parts.

This initial set of patches is designed for basic support (i.e. link and
pass traffic).  Follow-on patch series will add more advanced support
like VLAN, Wake-on-LAN, etc..
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 10:27:20 -07:00
David S. Miller 99e9acd85c mlx5-updates-2018-10-17
========================================================================
 
 From Or Gerlitz <ogerlitz@mellanox.com>:
 
 This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.
 
 This is made of four building blocks (along with few minor driver refactors):
 
 [1] Split FDB fast path prio to multiple namespaces
 
 Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
 The slow path contains the per representor SQ send-to-vport TX rule and the match-all
 RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
 to multiple namespaces  (sub namespaces), each with multiple priorities.
 
 [2] E-Switch chains and priorities
 
 A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
 and a flow table for each priority in them.
 
 Because these namespaces are parallel and in series to the slow path
 fdb, the chains aren't connected to each other (but to the slow path),
 and one must use a explicit goto action to reach a different chain.
 
 Flow tables for the priorities are created on demand and destroyed
 once not used.
 
 [3] Add a no-append flow insertion mode, use it for TC offloads
 
 Enhance the driver fs core, such that if a no-append flag is set by the caller,
 we add a new FTE, instead of appending the actions of the inserted rule when
 the same match already exists.
 
 For encap rules, we defer the HW offloading till we have a valid neighbor. This can
 result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
 to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
 before priorities where supported.
 
 [4] Offloading tc priorities and chains for eswitch flows
 
 Using [1], [2] and [3] above we add the support for offloading both chains
 and priorities. To get to a new chain, use the tc goto action. We support
 a fixed prio range 1-16, and chains 0-3.
 =============================================================================
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbx6k1AAoJEEg/ir3gV/o+l40H/14rNaV27vefjuALgOvNX4DY
 iSI5UFv9ILnAemcD2xkVfJeGolwdzoRhCXJ5oyCylCPnP4tb9zgDgwu9V/WmIRG+
 DOaPLu+0V6jqfEGO5sXJPMhJNUR8WWAjfu66htJ0Nc1HV2OM5eYrcvjaYCfW4Egr
 QFWGyq4sPyYcpbb7wURbhmkfs8Vwxcj9c2cZIfXo3VJsKxULqU9Mj5hZnirI1OAy
 UhjLssb/8wfHmwNcqETI9ae7O+vPDMLkxdQvpviEBI+HJ7vZ6op2X4lVEsn/Bx2E
 /KrHGQObkwim8thTOYkQeJtqptWbiRvkpNnwryUV1fwjWPl6X1r3bXH7RdeRwCg=
 =aFCc
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2018-10-17

========================================================================

From Or Gerlitz <ogerlitz@mellanox.com>:

This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.

This is made of four building blocks (along with few minor driver refactors):

[1] Split FDB fast path prio to multiple namespaces

Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
The slow path contains the per representor SQ send-to-vport TX rule and the match-all
RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
to multiple namespaces  (sub namespaces), each with multiple priorities.

[2] E-Switch chains and priorities

A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
and a flow table for each priority in them.

Because these namespaces are parallel and in series to the slow path
fdb, the chains aren't connected to each other (but to the slow path),
and one must use a explicit goto action to reach a different chain.

Flow tables for the priorities are created on demand and destroyed
once not used.

[3] Add a no-append flow insertion mode, use it for TC offloads

Enhance the driver fs core, such that if a no-append flag is set by the caller,
we add a new FTE, instead of appending the actions of the inserted rule when
the same match already exists.

For encap rules, we defer the HW offloading till we have a valid neighbor. This can
result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
before priorities where supported.

[4] Offloading tc priorities and chains for eswitch flows

Using [1], [2] and [3] above we add the support for offloading both chains
and priorities. To get to a new chain, use the tc goto action. We support
a fixed prio range 1-16, and chains 0-3.
=============================================================================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 10:25:37 -07:00
David S. Miller 8f18da4721 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-18

1) Remove an unnecessary dev->tstats check in xfrmi_get_stats64.
   From Li RongQing.

2) We currently do a sizeof(element) instead of a sizeof(array)
   check when initializing the ovec array of the secpath.
   Currently this array can have only one element, so code is
   OK but error-prone. Change this to do a sizeof(array)
   check so that we can add more elements in future.
   From Li RongQing.

3) Improve xfrm IPv6 address hashing by using the complete IPv6
   addresses for a hash. From Michal Kubecek.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 09:57:42 -07:00
Gustavo A. R. Silva 82385b0d2d net: skbuff.h: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:31:30 -07:00
Arthur Kiyanovski 9fd255928d net: ena: enable Low Latency Queues
Use the new API to enable usage of LLQ.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:30:41 -07:00
Netanel Belgazal 8c590f9776 net: ena: Fix Kconfig dependency on X86
The Kconfig limitation of X86 is to too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:28:34 -07:00
David S. Miller a58598a497 Merge branch 'tcp_bbr-TCP-BBR-changes-for-EDT-pacing-model'
Neal Cardwell says:

====================
tcp_bbr: TCP BBR changes for EDT pacing model

Two small patches for TCP BBR to follow up with Eric's recent work to change
the TCP and fq pacing machinery to an "earliest departure time" (EDT) model:

- The first patch adjusts the TCP BBR logic to work with the new
  "earliest departure time" (EDT) pacing model.

- The second patch adjusts the TCP BBR logic to centralize the setting
  of gain values, to simplify the code and prepare for future changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:22:54 -07:00
Neal Cardwell cf33e25c0d tcp_bbr: centralize code to set gains
Centralize the code that sets gains used for computing cwnd and pacing
rate. This simplifies the code and makes it easier to change the state
machine or (in the future) dynamically change the gain values and
ensure that the correct gain values are always used.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:22:53 -07:00
Neal Cardwell a87c83d5ee tcp_bbr: adjust TCP BBR for departure time pacing
Adjust TCP BBR for the new departure time pacing model in the recent
commit ab408b6dc7 ("tcp: switch tcp and sch_fq to new earliest
departure time model").

With TSQ and pacing at lower layers, there are often several skbs
queued in the pacing layer, and thus there is less data "in the
network" than "in flight".

With departure time pacing at lower layers (e.g. fq or potential
future NICs), the data in the pacing layer now has a pre-scheduled
("baked-in") departure time that cannot be changed, even if the
congestion control algorithm decides to use a new pacing rate.

This means that there can be a non-trivial lag between when BBR makes
a pacing rate change and when the inter-skb pacing delays
change. After a pacing rate change, the number of packets in the
network can gradually evolve to be higher or lower, depending on
whether the sending rate is higher or lower than the delivery
rate. Thus ignoring this lag can cause significant overshoot, with the
flow ending up with too many or too few packets in the network.

This commit changes BBR to adapt its pacing rate based on the amount
of data in the network that it estimates has already been "baked in"
by previous departure time decisions. We estimate the number of our
packets that will be in the network at the earliest departure time
(EDT) for the next skb scheduled as:

   in_network_at_edt = inflight_at_edt - (EDT - now) * bw

If we're increasing the amount of data in the network ("in_network"),
then we want to know if the transmit of the EDT skb will push
in_network above the target, so our answer includes
bbr_tso_segs_goal() from the skb departing at EDT. If we're decreasing
in_network, then we want to know if in_network will sink too low just
before the EDT transmit, so our answer does not include the segments
from the skb departing at EDT.

Why do we treat pacing_gain > 1.0 case and pacing_gain < 1.0 case
differently? The in_network curve is a step function: in_network goes
up on transmits, and down on ACKs. To accurately predict when
in_network will go beyond our target value, this will happen on
different events, depending on whether we're concerned about
in_network potentially going too high or too low:

 o if pushing in_network up (pacing_gain > 1.0),
   then in_network goes above target upon a transmit event

 o if pushing in_network down (pacing_gain < 1.0),
   then in_network goes below target upon an ACK event

This commit changes the BBR state machine to use this estimated
"packets in network" value to make its decisions.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:22:53 -07:00
Vijay Khemka cb10c7c0df net/ncsi: Add NCSI Broadcom OEM command
This patch adds OEM Broadcom commands and response handling. It also
defines OEM Get MAC Address handler to get and configure the device.

ncsi_oem_gma_handler_bcm: This handler send NCSI broadcom command for
getting mac address.
ncsi_rsp_handler_oem_bcm: This handles response received for all
broadcom OEM commands.
ncsi_rsp_handler_oem_bcm_gma: This handles get mac address response and
set it to device.

Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:14:54 -07:00
David S. Miller 1010c17ec5 Merge branch 'mscc-fixes'
Gustavo A. R. Silva says:

====================
fix signedness bug and memory leak in mscc driver

This patchset aims to fix a signedness bug in function
vsc85xx_downshift_get() and a memory leak in function
vsc8574_config_pre_init().

Changes in v3:
 - Add Quentin's Reviewed-by to commit log in patch 2/2.
 - Post the series to netdev.

Changes in v2:
 - Add Quentin's Reviewed-by to commit log in patch 1/2.
 - Jump to out label so all functions in the driver exit with the PHY
   set to access the standard page. Thanks to Quentin Schulz for
   pointing this out.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:08:56 -07:00
Gustavo A. R. Silva 47d20212aa net: phy: mscc: fix memory leak in vsc8574_config_pre_init
In case memory resources for *fw* were successfully allocated,
release them before return.

Addresses-Coverity-ID: 1473968 ("Resource leak")
Fixes: 00d70d8e0e ("net: phy: mscc: add support for VSC8574 PHY")
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:08:55 -07:00
Gustavo A. R. Silva e519869af3 net: phy: mscc: fix signedness bug in vsc85xx_downshift_get
Currently, the error handling for the call to function
phy_read_paged() doesn't work because *reg_val* is of
type u16 (16 bits, unsigned), which makes it impossible
for it to hold a value less than 0.

Fix this by changing the type of variable *reg_val* to int.

Addresses-Coverity-ID: 1473970 ("Unsigned compared against 0")
Fixes: 6a0bfbbe20 ("net: phy: mscc: migrate to phy_select/restore_page functions")
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:08:55 -07:00
Kyeongdon Kim 33c4368ee2 net: fix warning in af_unix
This fixes the "'hash' may be used uninitialized in this function"

net/unix/af_unix.c:1041:20: warning: 'hash' may be used uninitialized in this function [-Wmaybe-uninitialized]
  addr->hash = hash ^ sk->sk_type;

Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:57:28 -07:00
Marek Behún 26422340da net: dsa: mv88e6xxx: Fix 88E6141/6341 2500mbps SERDES speed
This is a fix for the port_set_speed method for the Topaz family.
Currently the same method is used as for the Peridot family, but
this is wrong for the SERDES port.

On Topaz, the SERDES port is port 5, not 9 and 10 as in Peridot.
Moreover setting alt_bit on Topaz only makes sense for port 0 (for
(differentiating 100mbps vs 200mbps). The SERDES port does not
support more than 2500mbps, so alt_bit does not make any difference.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:56:15 -07:00
David S. Miller e943d94e4b Merge branch 'octeontx2-af-NPA-and-NIX-blocks-initialization'
Sunil Goutham says:

====================
octeontx2-af: NPA and NIX blocks initialization

This patchset is a continuation to earlier submitted patch series
to add a new driver for Marvell's OcteonTX2 SOC's
Resource virtualization unit (RVU) admin function driver.

octeontx2-af: Add RVU Admin Function driver
https://www.spinics.net/lists/netdev/msg528272.html

This patch series adds logic for the following.
- Modified register polling loop to use time_before(jiffies, timeout),
  as suggested by Arnd Bergmann.
- Support to forward interface link status notifications sent by
  firmware to registered PFs mapped to a CGX::LMAC.
- Support to set CGX LMAC in loopback mode, retrieve stats,
  configure DMAC filters at CGX level etc.
- Network pool allocator (NPA) functional block initialization,
  admin queue support, NPALF aura/pool contexts memory allocation, init
  and deinit.
- Network interface controller (NIX) functional block basic init,
  admin queue support, NIXLF RQ/CQ/SQ HW contexts memory allocation,
  init and deinit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Geetha sowjanya 557dd485ea octeontx2-af: Support for disabling NIX RQ/SQ/CQ contexts
This patch adds support for a RVU PF/VF to disable all RQ/SQ/CQ
contexts of a NIX LF via mbox. This will be used by PF/VF drivers
upon teardown or while freeing up HW resources.

A HW context which is not INIT'ed cannot be modified and a
RVU PF/VF driver may or may not INIT all the RQ/SQ/CQ contexts.
So a bitmap is introduced to keep track of enabled NIX RQ/SQ/CQ
contexts, so that only enabled hw contexts are disabled upon LF
teardown.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Sunil Goutham ffb0abd7e9 octeontx2-af: NIX AQ instruction enqueue support
Add support for a RVU PF/VF to submit instructions to NIX AQ
via mbox. Instructions can be to init/write/read RQ/SQ/CQ/RSS
contexts. In case of read, context will be returned as part of
response to the mbox msg received.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Sunil Goutham 709a4f0c25 octeontx2-af: Alloc bitmaps for NIX Tx scheduler queues
Allocate bitmaps and memory for PFVF mapping info for
maintaining NIX transmit scheduler queues maintenance.
PF/VF drivers will request for alloc, free e.t.c of
Tx schedulers via mailbox.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Sunil Goutham 59360e9809 octeontx2-af: NIX LSO config for TSOv4/v6 offload
Config LSO formats for TSOv4 and TSOv6 offloads.
These formats tell HW which fields in the TCP packet's
headers have to be updated while performing segmentation
offload.

Also report PF/VF drivers the LSO format indices as part
of response to NIX_LF_ALLOC mbox msg. These indices are
used in SQE extension headers while framing SQE for pkt
transmission with TSO offload.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Sunil Goutham cb30711a6c octeontx2-af: NIX block LF initialization
Upon receiving NIX_LF_ALLOC mbox message allocate memory for
NIXLF's CQ, SQ, RQ, CINT, QINT and RSS HW contexts and configure
respective base iova HW. Enable caching of contexts into NIX NDC.

Return SQ buffer (SQB) size, this PF/VF MAC address etc info
e.t.c to the mbox msg sender.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Sunil Goutham aba53d5dbc octeontx2-af: NIX block admin queue init
Initialize NIX admin queue (AQ) i.e alloc memory for
AQ instructions and for the results. All NIX LFs will submit
instructions to AQ to init/write/read RQ/SQ/CQ/RSS contexts
and in case of read, get context from result memory.

Also before configuring/using NIX block calibrate X2P bus
and check if NIX interfaces like CGX and LBK are in active
and working state.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Geetha sowjanya 57856dde11 octeontx2-af: Support for disabling NPA Aura/Pool contexts
This patch adds support for a RVU PF/VF to disable all Aura/Pool
contexts of a NPA LF via mbox. This will be used by PF/VF drivers
upon teardown or while freeing up HW resources.

A HW context which is not INIT'ed cannot be modified and a
RVU PF/VF driver may or may not INIT all the Aura/Pool contexts.
So a bitmap is introduced to keep track of enabled NPA Aura/Pool
contexts, so that only enabled hw contexts are disabled upon LF
teardown.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:43 -07:00
Sunil Goutham 4a3581cd59 octeontx2-af: NPA AQ instruction enqueue support
Add support for a RVU PF/VF to submit instructions to NPA AQ
via mbox. Instructions can be to init/write/read Aura/Pool/Qint
contexts. In case of read, context will be returned as part of
response to the mbox msg received.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Sunil Goutham 3fa4c3232a octeontx2-af: NPA block LF initialization
Upon receiving NPA_LF_ALLOC mbox message allocate memory for
NPALF's aura, pool and qint contexts and configure the same
to HW. Enable caching of contexts into NPA NDC.

Return pool related info like stack size, num pointers per
stack page e.t.c to the mbox msg sender.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Sunil Goutham 7a37245ef2 octeontx2-af: NPA block admin queue init
Initialize NPA admin queue (AQ) i.e alloc memory for
AQ instructions and for the results. All NPA LFs will submit
instructions to AQ to init/write/read Aura/Pool contexts
and in case of read, get context from result memory.

Added some common APIs for allocating memory for a queue
and get IOVA in return, these APIs will be used by
NIX AQ and for other purposes.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Geetha sowjanya 23999b30ae octeontx2-af: Enable or disable CGX internal loopback
Add support to enable or disable internal loopback mode in CGX.
New mbox IDs CGX_INTLBK_ENABLE/DISABLE added for this.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Linu Cherian 61071a871e octeontx2-af: Forward CGX link notifications to PFs
Upon receiving notification from firmware the CGX event handler
in the AF driver gets the current link info such as status, speed,
duplex etc from CGX driver and sends it across to PFs who have
registered to receive such notifications.

To support above
 - Mbox messaging support for sending msgs from AF to PF has been added.
 - Added mbox msgs so that PFs can register/unregister for link events.
 - Link notifications are sent to PF under two scenarioss.
  1. When a asynchronous link change notification is received from
     firmware with notification flag turned on for that PF.
  2. Upon notification turn on request, the current link status is
     send to the PF.

Also added a new mailbox msg using which RVU PF/VF can retrieve
their mapped CGX LMAC's current link info. Link info includes
status, speed, duplex and lmac type.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Vidhya Raman 96be2e0da8 octeontx2-af: Support for MAC address filters in CGX
This patch adds support for setting MAC address filters in CGX
for PF interfaces. Also PF interfaces can be put in promiscuous
mode. Dataplane PFs access this functionality using mailbox
messages to the AF driver.

Signed-off-by: Vidhya Raman <vraman@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Christina Jacob 66208910e5 octeontx2-af: Support to retrieve CGX LMAC stats
This patch adds support for a RVU PF/VF driver to retrieve
it's mapped CGX LMAC Rx and Tx stats from AF via mbox.
New mailbox msg is added is added.

Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Sunil Goutham 1435f66a28 octeontx2-af: CGX Rx/Tx enable/disable mbox handlers
Added new mailbox msgs for RVU PF/VFs to request AF
to enable/disable their mapped CGX::LMAC Rx & Tx.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
Sunil Goutham 6ca3ee2f7d octeontx2-af: Improve register polling loop
Instead of looping on a integer timeout, use time_before(jiffies),
so that maximum poll time is capped.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 21:33:42 -07:00
David S. Miller 53e50a6ec2 Merge branch 'mlxsw-Add-VxLAN-support'
Ido Schimmel says:

====================
mlxsw: Add VxLAN support

This patchset adds support for VxLAN offload in the mlxsw driver.

With regards to the forwarding plane, VxLAN support is composed from two
main parts: Encapsulation and decapsulation.

In the device, NVE encapsulation (and VxLAN in particular) takes place
in the bridge. A packet can be encapsulated using VxLAN either because
it hit an FDB entry that forwards it to the router with the IP of the
remote VTEP or because it was flooded, in which case it is sent to a
list of remote VTEPs (in addition to local ports). In either case, the
VNI is derived from the filtering identifier (FID) the packet was
classified to at ingress and the underlay source IP is taken from a
device global configuration.

VxLAN decapsulation takes place in the underlay router, where packets
that hit a local route that corresponds to the source IP of the local
VTEP are decapsulated and injected to the bridge. The packets are
classified to a FID based on the VNI they came with.

The first six patches export the required APIs in the VxLAN and mlxsw
drivers in order to allow for the introduction of the NVE core in the
next two patches. The NVE core is designed to support a variety of NVE
encapsulations (e.g., VxLAN, NVGRE) and different ASICs, but currently
only VxLAN and Spectrum are supported. Spectrum-2 support will be added
in the future.

The last 10 patches add support for VxLAN decapsulation and
encapsulation and include the addition of the required switchdev APIs in
the VxLAN driver. These APIs allow capable drivers to get a notification
about the addition / deletion of FDB entries to / from the VxLAN's FDB.

Subsequent patchset will add selftests (generic and mlxsw-specific),
data plane learning, FDB extack and vetoing and support for VLAN-aware
bridges (one VNI per VxLAN device model).

v2:
* Implement netif_is_vxlan() using rtnl_link_ops->kind (Jakub & Stephen)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00
Ido Schimmel 1231e04f5b mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation
In the device, VxLAN encapsulation takes place in the FDB table where
certain {MAC, FID} entries are programmed with an underlay unicast IP.
MAC addresses that are not programmed in the FDB are flooded to the
relevant local ports and also to a list of underlay unicast IPs that are
programmed using the all zeros MAC address in the VxLAN driver.

One difference between the hardware and software data paths is the fact
that in the software data path there are two FDB lookups prior to the
encapsulation of the packet. First in the bridge's FDB table using {MAC,
VID} and another in the VxLAN's FDB table using {MAC, VNI}.

Therefore, when a new VxLAN FDB entry is notified, it is only programmed
to the device if there is a corresponding entry in the bridge's FDB
table. Similarly, when a new bridge FDB entry pointing to the VxLAN
device is notified, it is only programmed to the device if there is a
corresponding entry in the VxLAN's FDB table.

Note that the above scheme will result in a discrepancy between both
data paths if only one FDB table is populated in the software data path.
For example, if only the bridge's FDB is populated with an entry
pointing to a VxLAN device, then a packet hitting the entry will only be
flooded by the kernel to remote VTEPs whereas the device will also flood
the packets to other local ports member in the VLAN.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00
Ido Schimmel 1c30d1836a mlxsw: spectrum: Enable VxLAN enslavement to bridges
Enslavement of VxLAN devices to offloaded bridges was never forbidden by
mlxsw, but this patch makes sure the required configuration is performed
in order to allow VxLAN encapsulation and decapsulation to take place in
the device.

The patch handles both the case where a VxLAN device is enslaved to an
already offloaded bridge and the case where the first mlxsw port is
enslaved to a bridge that already has VxLAN device configured.

Invalid configurations are sanitized and an error string is returned via
extack.

Since encapsulation and decapsulation do not occur when the VxLAN device
is down, the driver makes sure to enable / disable these functionalities
based on NETDEV_PRE_UP and NETDEV_DOWN events.

Note that NETDEV_PRE_UP is used in favor of NETDEV_UP, as the former
allows to veto the operation, if necessary.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00
Ido Schimmel e9ba0fbc7d bridge: switchdev: Allow clearing FDB entry offload indication
Currently, an FDB entry only ceases being offloaded when it is deleted.
This changes with VxLAN encapsulation.

Devices capable of performing VxLAN encapsulation usually have only one
FDB table, unlike the software data path which has two - one in the
bridge driver and another in the VxLAN driver.

Therefore, bridge FDB entries pointing to a VxLAN device are only
offloaded if there is a corresponding entry in the VxLAN FDB.

Allow clearing the offload indication in case the corresponding entry
was deleted from the VxLAN FDB.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00
Petr Machata 045a5a9914 vxlan: Notify for each remote of a removed FDB entry
When notifications are sent about FDB activity, and an FDB entry with
several remotes is removed, the notification is sent only for the first
destination. That makes it impossible to distinguish between the case
where only this first remote is removed, and the one where the FDB entry
is removed as a whole.

Therefore send one notification for each remote of a removed FDB entry.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00
Petr Machata 0efe117333 vxlan: Support marking RDSTs as offloaded
Offloaded bridge FDB entries are marked with NTF_OFFLOADED. Implement a
similar mechanism for VXLAN, where a given remote destination can be
marked as offloaded.

To that end, introduce a new event, SWITCHDEV_VXLAN_FDB_OFFLOADED,
through which the marking is communicated to the vxlan driver. To
identify which RDST should be marked as offloaded, an
switchdev_notifier_vxlan_fdb_info is passed to the listeners. The
"offloaded" flag in that object determines whether the offloaded mark
should be set or cleared.

When sending offloaded FDB entries over netlink, mark them with
NTF_OFFLOADED.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00
Petr Machata 1941f1d645 vxlan: Add vxlan_fdb_find_uc() for FDB querying
A switchdev-capable driver that is aware of VXLAN may need to query
VXLAN FDB. In the particular case of mlxsw, this functionality is
limited to querying UC FDBs. Those being easier to deal with than the
general case of RDST chain traversal, introduce an interface to query
specifically UC FDBs: vxlan_fdb_find_uc().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:08 -07:00