Commit Graph

4821 Commits

Author SHA1 Message Date
Eli Britstein 97417f6182 net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation
Flow entropy is calculated on the inner packet headers and used for
flow distribution in processing, routing etc. For GRE-type
encapsulations the entropy value is placed in the eight LSB of the key
field in the GRE header as defined in NVGRE RFC 7637. For UDP based
encapsulations the entropy value is placed in the source port of the
UDP header.
The hardware may support entropy calculation specifically for GRE and
for all tunneling protocols. With commit df2ef3bff1 ("net/mlx5e: Add
GRE protocol offloading") GRE is offloaded, but the hardware is
configured by default to calculate flow entropy so packets transmitted
on the wire have a wrong key. To support UDP based tunnels (i.e VXLAN),
GRE (i.e. no flow entropy) and NVGRE (i.e. with flow entropy) the
hardware behaviour must be controlled by the driver.

Ensure port entropy calculation is enabled for offloaded VXLAN tunnels
and disable port entropy calculation in the presence of offloaded GRE
tunnels by monitoring the presence of entropy enabling tunnels (i.e
VXLAN) and entropy disabing tunnels (i.e GRE).

Fixes: df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22 13:38:23 -08:00
Eli Britstein bfedc645de net/mlx5: Use read-modify-write when changing PCMR register values
Currently changing a PCMR field is done by setting the field in a
zeroed buffer, zeroing other unrelated fields.
Fix this behaviour by modifying only the required field after first
reading the current register values, as a pre-step towards using more
fields in PCMR register.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22 13:38:23 -08:00
Maxim Mikityanskiy 41f5f63cd1 net/mlx5e: Trust kernel regarding transport offset
After AF_PACKET is fixed to calculate the transport header offset
correctly, trust the value set by the kernel. If the offset wasn't set,
it means there is no transport header in the packet.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22 12:55:32 -08:00
Maxim Mikityanskiy 3517dfe6f2 net/mlx5e: Remove the wrong assumption about transport offset
skb_transport_offset() == 0 is not a special value. The only special
value is when skb->transport_header is ~0U, and it's checked by
skb_transport_header_was_set().

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22 12:55:32 -08:00
Petr Machata bb6c346cef mlxsw: spectrum_buffers: Reject overlarge headroom size requests
cap_max_headroom_size holds maximum headroom size supported.
Overstepping that limit might under certain conditions lead to ASIC
freeze.

Query and store the value, and add mlxsw_sp_sb_max_headroom_cells() for
obtaining the stored value. In __mlxsw_sp_port_headroom_set(), reject
requests where the total port buffer is larger than the advertised
maximum.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata edf777f55a mlxsw: spectrum_buffers: Update port headroom configuration
The recommendation for headroom size for 100Gbps port and 100m cable is
101.6KB, reduced accordingly for split ports. The closest higher number
evenly divisible by cell size for both Spectrum-1 and Spectrum-2, and
such that the number of cells can be further divided by maximum split
factor of 4, is 102528 bytes, or 25632 bytes per lane.

Update mlxsw_sp_port_pb_init() to compute the headroom taking into
account this recommended per-lane value and number of lanes actually
dedicated to a given port.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata fe099bf682 mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration
Customize the tables related to shared buffer configuration to match the
current recommendation for Spectrum-2 systems.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata 13f35cc424 mlxsw: spectrum_buffers: Keep mlxsw_sp_sb_mm in sb_vals
The SBMM register configures the shared buffer quota for MC packets
according to Switch-Priority. The default configuration depends on the
chip type. Therefore keep the table and length in struct
mlxsw_sp_sb_vals. Redirect the references from the global definitions to
the fields.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata bb60a62e02 mlxsw: spectrum_buffers: Keep mlxsw_sp_sb_cm in sb_vals
The SBCM register configures shared buffer quota according to
port-priority resp. port-TC. The default configuration depends on the
chip type. Therefore keep the tables and their lengths in struct
mlxsw_sp_sb_vals. Redirect the references from the global definitions to
the fields.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata 5d25232eb9 mlxsw: spectrum_buffers: Keep mlxsw_sp_sb_prs in mlxsw_sp_sb_vals
The SBPR register configures shared buffer pools. The default
configuration depends on the chip type. Therefore keep it in struct
mlxsw_sp_sb_vals. Redirect the one reference from the global array to
the field.

Because the pool descriptor ID is implicit in the ordering of array
members, both this array and the pool descriptor array have the same
length. Therefore reuse mlxsw_sp_sb.pool_dess_len for the purpose of
determining the length of SBPR array.

Drop the now useless MLXSW_SP_SB_PRS_LEN.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata cc1ce6ff34 mlxsw: spectrum_buffers: Keep mlxsw_sp_sb_pms in mlxsw_sp_sb_vals
The SBPM register can be used to configure quotas for packets ingressing
from a certain pool to a certain port, and egressing from a certain pool
to a certain port. The default configuration depends on the chip type.
Therefore keep it in struct mlxsw_sp_sb_vals. Redirect the one reference
from the global array to the field.

Because the pool descriptor ID is implicit in the ordering of array
members, both this array and the pool descriptor array have the same
length. Therefore reuse mlxsw_sp_sb.pool_dess_len for the purpose of
determining the length of SBPM array.

Drop the now useless MLXSW_SP_SB_PMS_LEN.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:46 -08:00
Petr Machata 5d65f5f45e mlxsw: spectrum_buffers: Keep pool descriptors in mlxsw_sp_sb_vals
Keep the table of pool descriptors and its length in struct
mlxsw_sp_sb_vals so that it can be specialized per chip type. Redirect
all users from the global definitions to the mlxsw_sp_sb fields.

Give mlxsw_sp_pool_count() an extra mlxsw_sp parameter so that it can
access the descriptor table.

Drop the now unnecessary MLXSW_SP_SB_POOL_DESS_LEN.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:45 -08:00
Petr Machata 93d201f775 mlxsw: spectrum_buffers: Allocate prs & pms dynamically
Spectrum-2 will be configured with a different set of pools than
Spectrum-1. The size of prs and pms buffers will therefore depend on the
chip type of the device.

Therefore, instead of reserving an array directly in a structure
definition, allocate the buffer in mlxsw_sp_sb_port{,s}_init().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:45 -08:00
Petr Machata c39f3e0e4f mlxsw: spectrum: Add struct mlxsw_sp_sb_vals
Spectrum-2 will be configured with a different shared buffer
configuration than Spectrum-1. Therefore introduce a structure for
keeping the chip-specific default and immutable configuration.

Configuration mutable in runtime will still be kept in struct
mlxsw_sp_sb.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 15:57:45 -08:00
Florian Fainelli 010c8f01aa net: Get rid of switchdev_port_attr_get()
With the bridge no longer calling switchdev_port_attr_get() to obtain
the supported bridge port flags from a driver but instead trying to set
the bridge port flags directly and relying on driver to reject
unsupported configurations, we can effectively get rid of
switchdev_port_attr_get() entirely since this was the only place where
it was called.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 14:55:14 -08:00
Florian Fainelli cc0c207a5d net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT
Now that we have converted the bridge code and the drivers to check for
bridge port(s) flags at the time we try to set them, there is no need
for a get() -> set() sequence anymore and
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused.

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 14:55:14 -08:00
Florian Fainelli c19c44f867 mlxsw: spectrum: Handle PORT_PRE_BRIDGE_FLAGS
In preparation for getting rid of switchdev_port_attr_get(), have mlxsw
check for the bridge flags being set through switchdev_port_attr_set()
when the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier is
used.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 14:55:13 -08:00
Jason Gunthorpe 815f748037 Merge branch 'mlx5-next' into rdma.git for-next
From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

To resolve conflicts with net-next and pick up the first patch.

* branch 'mlx5-next':
  net/mlx5: Factor out HCA capabilities functions
  IB/mlx5: Add support for 50Gbps per lane link modes
  net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
  net/mlx5: Add new fields to Port Type and Speed register
  net/mlx5: Refactor queries to speed fields in Port Type and Speed register
  net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode
  net/mlx5: Relocate vport macros to the vport header file
  net/mlx5: E-Switch, Normalize the name of uplink vport number
  net/mlx5: Provide an alternative VF upper bound for ECPF
  net/mlx5: Add host params change event
  net/mlx5: Add query host params command
  net/mlx5: Update enable HCA dependency
  net/mlx5: Introduce Mellanox SmartNIC and modify page management logic
  IB/mlx5: Use unified register/load function for uplink and VF vports
  net/mlx5: Use consistent vport num argument type
  net/mlx5: Use void pointer as the type in address_of macro
  net/mlx5: Align ODP capability function with netdev coding style
  mlx5: use RCU lock in mlx5_eq_cq_get()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 12:40:18 -07:00
David S. Miller 8e4c076ef2 mlx5-updates-2019-02-19
This series includes misc updates to mlx5 drivers and one ethtool update.
 
 1) From Aya Levin:
    - ethtool: Define 50Gbps per lane link modes
    - add support for 50Gbps per lane link modes in mlx5 driver
 
 2) From Tariq Toukan,
    - Add a helper function to unify mlx5 resource reloading
 
 3) From Vlad Buslov,
    - Remove wrong and superfluous tc pedit header type check
 
 4) From Tonghao Zhang,
    - Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow
 
 5) From Leon Romanovsky & Saeed,
    - Compilation warning fixes
 
 6) From Bodong wang,
    - E-Switch fixes that are related to the SmarNIC series
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcbH/oAAoJEEg/ir3gV/o+bJQIAKIbILkQsAIn3B3U1Y7WgE8l
 4pXdcYe6A3Mr9ZOi+U2ovJqVEHKPc7ASNfBfK5zsRayxhDQmoNhaloZSlwkgxenN
 +guf86OBqTmXq9vzk3AFfPbiceQ+ENkM8ZHsv0yc9q8ZgZX5KR3ghG0j7wONDdgv
 UOGuHNbO2QPA2d38V+xdwGkr/M8eep2xjYwDqT8+s7F+7BZeRThF2s9NBhE4rgob
 KlPXwYTqkx5fiODtlFE9igryi/aSr9iaVIVCOs9ZBwQrETc/WLttWbfMgrMWFW5z
 r4sa/84oVe0GVwZ+KPJKjUpXR4x1C3O+vSIAREsZfBDfjADHvGgXyRCYw+xrnq4=
 =9kOH
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-02-19

This series includes misc updates to mlx5 drivers and one ethtool update.

1) From Aya Levin:
   - ethtool: Define 50Gbps per lane link modes
   - add support for 50Gbps per lane link modes in mlx5 driver

2) From Tariq Toukan,
   - Add a helper function to unify mlx5 resource reloading

3) From Vlad Buslov,
   - Remove wrong and superfluous tc pedit header type check

4) From Tonghao Zhang,
   - Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow

5) From Leon Romanovsky & Saeed,
   - Compilation warning fixes

6) From Bodong wang,
   - E-Switch fixes that are related to the SmarNIC series
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20 20:13:58 -08:00
David S. Miller 375ca548f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two easily resolvable overlapping change conflicts, one in
TCP and one in the eBPF verifier.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20 00:34:07 -08:00
Bodong Wang 1c50d369f5 net/mlx5: E-Switch, Disable esw manager vport correctly
When disabling vport, relevant vport configurations will be cleaned
up. These cleanups should be done to the vports which had these configs
applied at vport enablement. As esw manager vport didn't have such
vport config applied, cleanup should not touch it.

Fixes: de9e6a8136c5 ("net/mlx5: E-Switch, Properly refer to host PF vport as other vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:04 -08:00
Bodong Wang acad70731e net/mlx5: E-Switch, Fix the warning on vport index out of range
When eswitch gets vport data structure, the index should not be out
of the range of the vport array. Driver mistakenly used vport number
to check the range.

Fixes: 22b8ddc86bf4 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:04 -08:00
Saeed Mahameed e87636117e net/mlx5e: Remove unused variable ‘esw’
Fix the following compiler warning:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:2770:
warning: unused variable ‘esw’ [-Wunused-variable]

Fixes: 1cd3ab86b713 ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:04 -08:00
Leon Romanovsky 566428375a net/mlx5: Delete unused FPGA QPN variable
fpga_qpn was assigned but never used and compilation with W=1
produced the following warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c: In function _mlx5_fpga_event_:
drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c:320:6: warning:
variable _fpga_qpn_ set but not used [-Wunused-but-set-variable]
  u32 fpga_qpn;
      ^~~~~~~~

Fixes: 98db16bab5 ("net/mlx5: FPGA, Handle QP error event")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:04 -08:00
Leon Romanovsky 36a73471e5 net/mlx5e: Add missing static function annotation
Compilation with W=1 produces following warning:

drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.c:69:6:
warning: no previous prototype for _mlx5e_monitor_counter_start_ [-Wmissing-prototypes]
 void mlx5e_monitor_counter_start(struct mlx5e_priv *priv)
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Avoid it by declaring mlx5e_monitor_counter_start() as a static function.

Fixes: 5c7e8bbb02 ("net/mlx5e: Use monitor counters for update stats")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:03 -08:00
Tonghao Zhang 7040632df5 net/mlx5e: Remove 'parse_attr' argument in mlx5e_tc_add_fdb_flow()
This patch is a little improvement. Simplify the mlx5e_tc_add_fdb_flow().

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:03 -08:00
Tonghao Zhang 988ab9c736 net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper
Introduce the mlx5e_flow_esw_attr_init() helper
for simplifying codes.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:03 -08:00
Vlad Buslov 73c718fbb3 net/mlx5e: Remove wrong and superfluous tc pedit header type check
With recent introduction of flow_rule infrastructure drivers no longer
directly include action headers, so it is no longer possible to use
constants defined in them. Instead, one of flow_rule patches substituted
pedit action header constant with hardcoded value '2' in mlx5
set_pedit_val() function conditional which verifies that header type is in
range of values allowed by pedit action. That conditional is now both
wrong (hardcoded value is '2' but __PEDIT_HDR_TYPE_MAX is 6 in current
version) and superfluous (pedit action already verifies that header type is
in allowed range during init). Remove the described check from mlx5 code.

Fixes: 7386788175 ("drivers: net: use flow action infrastructure")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:03 -08:00
Tariq Toukan 877662e272 net/mlx5e: Wrap the open and apply of channels in one fail-safe function
Take into a function the common code structure of opening
a side set of channels followed by a call to apply them.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:03 -08:00
Aya Levin 6a89737241 net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes
In previous patch, driver added new speed modes: 50Gbps per lane support
for 50G/100G/200G.  This patch modifies mlx5e_get_link_ksettings and
mlx5e_set_link_ksettings to set and get these link modes via ethtool.
In order to do so, added mapping of new HW bits to ethtool bitmap and
enforce mutual exclusion between extended link modes and previously
defined link modes.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19 14:15:02 -08:00
Leon Romanovsky 37b6bb77c6 net/mlx5: Factor out HCA capabilities functions
Combine all HCA capabilities setters under one function
and compile out the ODP related function in case kernel
was compiled without ODP support.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-19 14:40:44 +02:00
Shalom Toledo 31ef5b0eef mlxsw: spectrum: Change IP2ME CPU policer rate and burst size values
The IP2ME packet trap is triggered by packets hitting local routes.
After evaluating current defaults used by the driver it was decided to
reduce the amount of traffic generated by this trap to 1Kpps and
increase the burst size. This is inline with similarly deployed systems.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 12:10:49 -08:00
Colin Ian King 21d2cb491b net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"
There is a spelling mistake in a en_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 12:06:42 -08:00
Vadim Pasternak 6a79507cfe mlxsw: core: Extend thermal module with per QSFP module thermal zones
Add a dedicated thermal zone for each QSFP/SFP module. The current
temperature is obtained from the module's temperature sensor and the
trip points are set based on the warning and critical thresholds
read from the module.

A cooling device (fan) is bound to all the thermal zones. The
thermal zone governor is set to user space in order to avoid
collisions between thermal zones.
For example, one thermal zone might want to increase the speed of
the fan, whereas another one would like to decrease it.

Deferring this decision to user space allows the user to the take
the most suitable decision.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:57:49 -08:00
Petr Machata 289460404f mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable
The function-local variable "delay" enters the loop interpreted as delay
in bits. However, inside the loop it gets overwritten by the result of
mlxsw_sp_pg_buf_delay_get(), and thus leaves the loop as quantity in
cells. Thus on second and further loop iterations, the headroom for a
given priority is configured with a wrong size.

Fix by introducing a loop-local variable, delay_cells. Rename thres to
thres_cells for consistency.

Fixes: f417f04da5 ("mlxsw: spectrum: Refactor port buffer configuration")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:13:46 -08:00
David S. Miller f2281c245d Support Mellanox BlueField SmartNIC (mlx5-updates-2019-02-15)
Bodong Wang says,
 
 BlueField device is a multi-core ARM processor in a highly integrated
 system on chip coupled with the ConnectX interconnect controller.
 BlueField device can be presented in one out of two modes:
 
 - SEPARATED_HOST: ARM processors as a separated and orthogonal host
   like any other external host in the multi-host virtualization model.
 - EMBEDDED_CPU: ARM processors as Embedded CPU (EC) and part of the
   external hosts virtualization model.
 
 While existing driver already supports the device on separated_host
 mode, this patch series focus on the functionalities of embedded_cpu
 mode.
 
 On embedded_cpu mode, BlueField device exposes regular network
 controller PCI function in the BlueField host(e.g, x86). However, a
 separate PCI function called Embedded CPU Physical Function(ECPF) is
 also added to the ARM host side, where standard Linux distributions is
 able to run on the ARM cores. Depends on the NV configuration from
 firmware, ECPF can be the e-switch manager and firmware pages supplier.
 If ECPF is configured as e-switch manager and page supplier, it will
 take over the responsibilities from the PF on BlueField host includes:
 - Owns, controls and manages all e-switch parts, and takes e-switch
   traffic by default. It also should perform ENABLE_HCA for the host
   PF just like a PF does for its VFs.
 - Provides and manages the ICM host memory required for the HCA to
   store various contexts for itself, the PF and VFs belong the
   e-switch it manages.
 
 The PF on BlueField host side is still responsible for:
 - Control its own permanent MAC.
 - PCI and SRIOV configurations and perform ENABLE_HCA for its VFs.
 
 The ECPF can also retrieve information about the external host it
 controls, like host identifier, PCI BDF and number of virtual functions.
 As these parameters may be changed dynamically, an event will be triggered
 to the driver on ECPF side.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcZ2amAAoJEEg/ir3gV/o+lzMIALvLNUoD6pXi41MWsOwvmAHg
 07mzg1N80Z66MCcFau40I8T3h9NLiRMzNtFrBNtxx+ruwKFUpAjJHjaU0sms0yQH
 WVjr35vk4XsZyDPSCJ4g/hCQVlgCT/1tIUvPO0YM9hjhDuVa9mT4wEpucQRDu8bO
 KfeXNXLDFnlWxjokhpSVj369ozh+LTv4Kzy0MBBbji97bG6MktGAT8uCimUy7wG0
 7dlYimnZ1+iUD1/DZQadiLCUHUu/rTcnvF2+DcdG/nbSU8ydVLgj6vgtIfCYt4e5
 kcQO5hmatnl0iJUA8GNpdHQwGjYytKneoGmIfbMAK4KjvFNF+3/8N4Bytoa5kzk=
 =DgTl
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Support Mellanox BlueField SmartNIC (mlx5-updates-2019-02-15)

Bodong Wang says,

BlueField device is a multi-core ARM processor in a highly integrated
system on chip coupled with the ConnectX interconnect controller.
BlueField device can be presented in one out of two modes:

- SEPARATED_HOST: ARM processors as a separated and orthogonal host
  like any other external host in the multi-host virtualization model.
- EMBEDDED_CPU: ARM processors as Embedded CPU (EC) and part of the
  external hosts virtualization model.

While existing driver already supports the device on separated_host
mode, this patch series focus on the functionalities of embedded_cpu
mode.

On embedded_cpu mode, BlueField device exposes regular network
controller PCI function in the BlueField host(e.g, x86). However, a
separate PCI function called Embedded CPU Physical Function(ECPF) is
also added to the ARM host side, where standard Linux distributions is
able to run on the ARM cores. Depends on the NV configuration from
firmware, ECPF can be the e-switch manager and firmware pages supplier.
If ECPF is configured as e-switch manager and page supplier, it will
take over the responsibilities from the PF on BlueField host includes:
- Owns, controls and manages all e-switch parts, and takes e-switch
  traffic by default. It also should perform ENABLE_HCA for the host
  PF just like a PF does for its VFs.
- Provides and manages the ICM host memory required for the HCA to
  store various contexts for itself, the PF and VFs belong the
  e-switch it manages.

The PF on BlueField host side is still responsible for:
- Control its own permanent MAC.
- PCI and SRIOV configurations and perform ENABLE_HCA for its VFs.

The ECPF can also retrieve information about the external host it
controls, like host identifier, PCI BDF and number of virtual functions.
As these parameters may be changed dynamically, an event will be triggered
to the driver on ECPF side.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16 12:11:17 -08:00
Colin Ian King 59e6158aca mlxsw: core: fix spelling mistake "temprature" -> "temperature"
There is a spelling mistake in several dev_err messages, fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 20:16:52 -08:00
Bodong Wang c96692fb8f net/mlx5: E-Switch, Allow transition to offloads mode for ECPF
Currently, the e-switch driver requires going to legacy mode before
changing to the offloads mode. This makes sense for regular case as
the legacy mode is done by creating VFs.

However, it's problematic when ECPF is the eswitch manager. In such
case, ECPF will control the vports on peer host including the peer
PF and VFs. But ECPF doesn't need and shall not create VFs as the
VFs are created in the peer PF host.

Grant ECPF the ability to change from none to the offloads mode. Note
that currently the only way to go back to none mode is by unloading
the ECPF driver.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang a3888f33db net/mlx5: E-Switch, Load/unload VF reps according to event from host PF
When host PF changes the number of VFs, the ECPF esw driver will get
a FW event. It should query the number of VFs enabled by host PF and
update the VF reps accordingly. Note that host PF can't change the
number of VFs dynamically, it has to reset the number of VFs to 0
before changing to a new positive number.

The host event is registered when driver is moving to switchdev mode,
and it's the last step to do in esw_offloads_init. It's unregistered
and the work queue is flushed when driver quits from switchdev mode.
In this way, the host event and devlink command are serialized.

When driver is enabling switchdev mode, pay attention to the following
two facts:
1. Host PF must not have VF initialized as the flow table in ECPF has
   ENCAP enabled as default. Such flow table can't be created with
   existing initialized VFs.
2. ECPF doesn't know how many VFs the host PF will enable, ECPF
   offloads flow steering shall create the flow table/groups based on
   the max number of VFs possibly supported by host PF.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang 81cd229c29 net/mlx5: E-Switch, Consider ECPF vport depends on eswitch ownership
ECPF connects to the eswitch through vport 0xfffe. ECPF may or may
not be the eswitch manager depending on firmware configuration.

1. If ECPF is eswitch manager: ECPF will take over the eswitch manager
   responsibility. A rep of the host PF shall be created at the ECPF
   side for the eswitch manager to control.

2. If ECPF is not eswitch manager: host PF will be the eswitch manager,
   ECPF acts similar as a VF to the host PF. Host PF will be aware
   of the ECPF vport presence and control it's rep.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang 5ae5162066 net/mlx5: E-Switch, Assign a different position for uplink rep and vport
In offloads mode, the current implementation puts the uplink
representor at index zero of the vport reps array. It is not "natural"
to place it at index 0 since we want to put the representor for vport
0 at index 0 with the introduction of SmartNIC. A separate patch will
handle the case whether a rep is needed for vport 0 (PF vport).

So, we want to have a different placeholder for uplink vport and
representor. It was placed at the end of vport and rep array. Since
vport number can no longer act as an index into the vport or
representors arrays, use functions to map vport numbers to indices
when accessing the vports or representors arrays, and vice versa.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang f8e8fa0262 net/mlx5: E-Switch, Centralize repersentor reg/unreg to eswitch driver
Eswitch has two users: IB and ETH. They both register repersentors
when mlx5 interface is added, and unregister the repersentors when
mlx5 interface is removed. Ideally, each driver should only deal with
the entities which are unique to itself. However, current IB and ETH
drivers have to perform the following eswitch operations:

1. When registering, specify how many vports to register. This number
   is the same for both drivers which is the total available vport
   numbers.
2. When unregistering, specify the number of registered vports to do
   unregister. Also, unload the repersentors which are already loaded.

It's unnecessary for eswitch driver to hands out the control of above
operations to individual driver users, as they're not unique to each
driver. Instead, such operations should be centralized to eswitch
driver. This consolidates eswitch control flow, and simplified IB and
ETH driver.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Bodong Wang 29d9fd7d5a net/mlx5: E-Switch, Support load/unload reps of specific vport types
Currently the driver loads and unloads all reps in an unbreakable
group. However, with ECPF, the reps of special vports such as uplink
and host PF should always be loaded in switchdev mode where the reps
for VFs will be loaded on-demand and unloaded on no-demand. This is
a pre-step for that change.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang f121e0ea95 net/mlx5: E-Switch, Add state to eswitch vport representors
Currently the eswitch vport reps have a valid indicator, which is
set on register and unset on unregister. However, a rep can be loaded
or not loaded when doing unregister, current driver checks if the
vport of that rep is enabled as a flag to imply the rep is loaded.
However, for ECPF, this is not valid as the host PF will enable the
vports for its VFs instead.

Add three states: {unregistered, registered, loaded}, with the
following state changes across different operations:

	create: (none)       -> unregistered
	reg:    unregistered -> registered
	load:   registered   -> loaded
	unload: loaded       -> registered
	unreg:  registered   -> unregistered

Note that the state shall only be updated inside eswitch driver rather
than individual drivers such as ETH or IB.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Suggested-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang 879c8f84e3 net/mlx5: E-Switch, Use getter and iterator to access vport/rep
With only PF and VF, it is sufficient to have the vport/rep array
index as the vport number. This is because PF and VF vports numbers
are consecutive serial numbers. In downstream patches with
introducing of ECPF and UPLINK vports, it's not consecutive any more.

Use getter to get specific vport/rep, and use iterator to traversal
a list of vport/rep. This hides the translation between array index
and vport number, and provides flexibility of using different
translation mechanism in the future.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang c9b99abcf2 net/mlx5: E-Switch, Split VF and special vports for offloads mode
When driver is entering offloads mode, there are two major tasks to
do: initialize flow steering and create representors. Flow steering
should make sure enough flow table/group spaces are reserved for all
reps. Representors will be created in a group, all or none.

With the introduction of ECPF, flow steering should still reserve the
same spaces. But, the representors are not always loaded/unloaded in a
single piece. Once ECPF is in offloads mode, it will get the number
of VF changing event from host PF. In such scenario, only the VF reps
should be loaded/unloaded, not the reps for special vports (such as
the uplink vport).

Thus, when entering offloads mode, driver should specify the total
number of reps, and the number of VF reps separately. When leaving
offloads mode, the cleanup should use the information self-contained
in eswitch such as number of VFs.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang eca8cc3895 net/mlx5: E-Switch, Refactor offloads flow steering init/cleanup
E-switch offloads mode initialize/cleanup multiple steering related
entities (flow table/group). Refactor these operations to internal
helper functions for better block design.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:57 -08:00
Bodong Wang cbc44e76bf net/mlx5: E-Switch, Properly refer to host PF vport as other vport
Commands referring to vports use the following scheme:

1. When referring to my own vport, put 0 in vport and 0 in other_vport.
2. When referring to another vport, put the vport number of the
   referred vport and put 1 in other_vport. It was assumed that driver
   is accessing other vport when vport number is greater than 0.

With the above scheme, the case that ECPF eswitch manager is trying
to access host PF vport will fall over with scheme 1 as the vport
number is 0. This is apparently wrong as driver is trying to refer
other vport.

As such usage can only happen in the eswitch context, change relevant
functions to provide other vport input properly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:56 -08:00
Bodong Wang a1b3839ac4 net/mlx5: E-Switch, Properly refer to the esw manager vport
In SmartNIC mode, the eswitch manager is not necessarily the PF
(vport 0). Use a helper function to get the correct eswitch manager
vport number and cache on the eswitch instance for fast reference.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:56 -08:00
Bodong Wang 86b39a66b7 net/mlx5: Correctly set LAG mode for ECPF
When bonding is added, driver assumes that it's RoCE LAG if no VF is
enabled. This is not enough for ECPF as the VF is enabled in host PF
side. LAG should only choose RoCE mode when both slave devices meet
conditions below:
 1. E-Switch offloads mode is NONE.
 2. No VF is enabled.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:56 -08:00
Saeed Mahameed 259fae5a2c Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next shared branched into net-next,

From Bodong Wang:
1) Introduction of ECPF (Embedded CPU Physical Function), and low level
bits for mlx5 SmartNic capabilities support.
2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core

From Aya Levin,
3) Add support for 50Gbps per lane link modes in the Port Type and Speed
register (PTYS)
4) Refactor low level query functions for PTYS register
5) Add support for 50Gbps per lane link modes to mlx5_ib

Note: due to a change in API in mlx5/core and a later patch from net-next,
a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT
with MLX5_VPORT_UPLINK which exists only in upstream net-next.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 16:45:31 -08:00
David S. Miller 3313da8188 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The netfilter conflicts were rather simple overlapping
changes.

However, the cls_tcindex.c stuff was a bit more complex.

On the 'net' side, Cong is fixing several races and memory
leaks.  Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.

What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU.  I did it this way because I believe that either Cong's
races don't apply with have Vlad did things, or Cong will have to
implement the race fix slightly differently.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 12:38:38 -08:00
Aya Levin a08b4ed137 net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
This patch exposes new link modes (including 50Gbps per lane), and ext_*
fields which describes the new link modes in Port Type and Speed
register (PTYS).
Access functions, translation functions (speed <-> HW bits) and
link max speed function were modified.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Aya Levin bc4e12ffef net/mlx5: Refactor queries to speed fields in Port Type and Speed register
This patch fascicles queries to speed related fields in Port Type and
Speed register (PTYS) into a single API. I addition, this patch
refactors functions which serves only Ethernet driver: remove the
protocol type as an input parameter, move code from 'core' directory
into 'en' directory and add 'eth' prefix to the function's name. The
patch also encapsulates functions that are not used outside the Ethernet
driver removes redundant include files.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang cd7e4186af net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode
When dealing with the offloads mode initialization, driver refers to
the number of VFs and add magic number one (1) to take account of the
uplink. This is not clear and will make the code less readable after
adding other vports (e.g. host PF). As these are special vports
compared to VF vports, add a helper macro to denote such special
vports and eliminate the use of magic number.

Moreover, when creating offloads flow table and groups, the driver
reserves two more slots for UC and MC miss rules. Replace this magic
number with a helper macro as well.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang bf3e4d387d net/mlx5: Relocate vport macros to the vport header file
These are two macros in the driver general header which deal with the
number of total vports and if a vport is vport manager. Such macros
are vport entities, better to place them at the vport header file.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang b05af6aacd net/mlx5: E-Switch, Normalize the name of uplink vport number
Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to
comply with the same naming convention along with the introduction of
other vports. Use MLX5_VPORT as the prefix for such vports and
relocate the uplink vport definition to public header file for the
benefits of both net and IB drivers.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang 7f0d11c7e0 net/mlx5: Add host params change event
In Embedded CPU (EC) configurations, the EC driver needs to know when
the number of virtual functions change on the corresponding PF at the
host side. This is required so the EC driver can create or destroy
representor net devices that represent the VFs ports.

Whenever a change in the number of VFs occurs, firmware will generate an
event towards the EC which will trigger a work to complete the rest of
the handling. The specifics of the handling will be introduced in a
downstream patch.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang c3a4e9f107 net/mlx5: Add query host params command
The QUERY_HOST_PARAMS command is used by an Embedded CPU Physical
Function (ECPF) driver to identify and retrieve information about the
PF on the host side. E.g, number of virtual functions and PCI BDF.

The number of VFs can be changed on the fly, a function is added to
query current number of VFs and will be used in downstream patches.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:41 -08:00
Bodong Wang 22e939a91d net/mlx5: Update enable HCA dependency
With the introduction of ECPF, we require that the ECPF driver will
aways call enable/disable HCA for that PF in the same way a PF does
this for its VFs. The PF is still responsible for calling enable and
disable HCA for its VFs.

To distinguish between the ECPF executing enable/disable HCA for
itself or for the PF, it sets the embedded CPU function bit in the
input params struct of these commands. When the bit is cleared and
function ID is zero, it refers to the peer PF.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:41 -08:00
Bodong Wang 591905ba96 net/mlx5: Introduce Mellanox SmartNIC and modify page management logic
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power
with advanced network offloads to accelerate a multitude of security,
networking and storage applications.

With the introduction of the SmartNIC, there is a new PCI function
called Embedded CPU Physical Function(ECPF). And it's possible for a
PF to get its ICM pages from the ECPF PCI function. Driver shall
identify if it is running on such a function by reading a bit in
the initialization segment.

When firmware asks for pages, it would issue a page request event
specifying how many pages it requests and for which function. That
driver responds with a manage_pages command providing the requested
pages along with an indication for which function it is providing these
pages.

The encoding before this patch was as follows:
    function_id == 0: pages are requested for the function receiving
                      the EQE.
    function_id != 0: pages are requested for VF identified by the
                      function_id value

A new one bit field in the EQE identifies that pages are requested for
the ECPF.

The notion of page_supplier can be introduced here and to support that,
manage pages and query pages were modified so firmware can distinguish
the following cases:

1. Function provides pages for itself
2. PF provides pages for its VF
3. ECPF provides pages to itself
4. ECPF provides pages for another function

This distinction is possible through the introduction of the bit
"embedded_cpu_function" in query_pages, manage_pages and page request
EQE.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:41 -08:00
Bodong Wang 7e4c4330a3 net/mlx5: Use consistent vport num argument type
Use u16 for vport number, which matches how hardware refers to this
argument throughout commands.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:41 -08:00
Vadim Pasternak 97cd342ae4 mlxsw: core: Allow thermal zone binding to an external cooling device
Allow thermal zone binding to an external cooling device from the
cooling devices white list.

It provides support for Mellanox next generation systems on which
cooling device logic is not controlled through the switch registers.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak a53779de6a mlxsw: core: Add QSFP module temperature label attribute to hwmon
Add label attribute to hwmon object for exposing QSFP module's
temperature sensor name. Modules are labeled as "front panel xxx". The
label is used by utilities such as "sensors":

front panel 001:   +0.0C  (crit =  +0.0C, emerg =  +0.0C)
..
front panel 020:  +31.0C  (crit = +70.0C, emerg = +80.0C)
..
front panel 056:  +41.0C  (crit = +70.0C, emerg = +80.0C)

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 5c42eaa07b mlxsw: core: Extend hwmon interface with QSFP module temperature attributes
Add new attributes to hwmon object for exposing QSFP module temperature
input, fault indication, critical and emergency thresholds. Temperature
input and fault indication are read from Management Temperature Bulk
Register. Temperature thresholds are read from Management Cable Info
Access Register.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 2c6a33cd33 mlxsw: core: Extend hwmon interface with fan fault attribute
Add new fan hwmon attribute for exposing fan faults (fault indication is
read from Fan Out of Range Event Register).

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 2ee1165118 mlxsw: core: Rename cooling device
Rename cooling device from "Fan" to "mlxsw_fan".  Name "Fan" is too
common name, and such name is misleading, while it's interpreted by
user. For example name "Fan" could be used by ACPI.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 41e760841d mlxsw: core: Replace thermal temperature trips with defines
Replace thermal hardcoded temperature trip values with defines.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 69115b7d01 mlxsw: core: Modify thermal zone definition
Modify thermal zone trip points setting for better alignment with system
thermal requirement.

Add hysteresis thresholds for thermal trips in order to avoid throttling
around thermal trip point. If hysteresis temperature is not considered,
PWM can have side effect of flip up/down on thermal trip point boundary.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 3dcfe17957 mlxsw: core: Set different thermal polling time based on bus frequency capability
Add low frequency bus capability in order to allow core functionality
separation based on bus type. Driver could run over PCIe, which is
considered as high frequency bus or I2C, which is considered as low
frequency bus. In the last case time setting, for example, for thermal
polling interval, should be increased.

Use different thermal monitoring based on bus type. For I2C bus time is
set to 20 seconds, while for PCIe 1 second polling interval is used.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak d93c19a1d9 mlxsw: core: Add API for QSFP module temperature thresholds reading
Add new API to read QSFP module's temperature thresholds - warning and
critical.

New internal API reads the temperature thresholds from the modules,
which are equipped with the thermal sensor. These thresholds will be
exposed via hwmon subsystem.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 3760c2b99e mlxsw: reg: Add Fan Out of Range Event Register
Add FORE (Fan Out of Range Event Register), which is used for fan fault
reading.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak 5f28ef71a5 mlxsw: reg: Add Management Temperature Bulk Register
Add MTBR (Management Temperature Bulk Register), which is used for port
temperature reading in a bulk mode.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:02 -08:00
Vadim Pasternak d517ee7ca8 mlxsw: spectrum: Move QSFP EEPROM definitions to common location
Move QSFP EEPROM definitions to common location from the spectrum driver
in order to make them available for other mlxsw modules. They are common
for all kind of chips and have relation to SFF specifications 8024,
8436, 8472, 8636, rather than to chip type.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:33:01 -08:00
Saeed Mahameed 407e17b1a6 net/mlx5e: XDP, fix redirect resources availability check
Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).

To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.

The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.

Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:51 -08:00
Tariq Toukan 5400261e4d net/mlx5: Fix a compilation warning in events.c
Eliminate the following compilation warning:

drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
may be used uninitialized in this function [-Wuninitialized]:  => 238:3

Fixes: c2fb3db22d ("net/mlx5: Rework handling of port module events")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:50 -08:00
Huy Nguyen 4cab346bcf net/mlx5: No command allowed when command interface is not ready
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.

Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.

Fixes: c1d4d2e92a ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:50 -08:00
Maria Pasechnik fb35c534b7 net/mlx5e: Fix NULL pointer derefernce in set channels error flow
New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.

The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.

This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.

Fixes: bbeb53b8b2 ("net/mlx5e: Move RSS params to a dedicated struct")
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:50 -08:00
Leon Romanovsky 224d71ccc0 net/mlx5: Align ODP capability function with netdev coding style
Update newly introduced function to be aligned to netdev coding style.

Fixes: 46861e3e88 ("net/mlx5: Set ODP SRQ support in firmware")
Reported-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-13 19:50:48 +02:00
Florian Fainelli 0f56623dc4 mlxsw: spectrum_switchdev: Remove unused variables
After 1ecb195753 ("mlxsw: spectrum_switchdev: Remove getting
PORT_BRIDGE_FLAGS") we are not accessing any driver private data
structure, so the mlxsw_sp_port and mlxsw_sp variables are unused.

Fixes: 1ecb195753 ("mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 17:31:30 -08:00
Saeed Mahameed 74abc07dee net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.

Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.

Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.

Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:58:48 -05:00
Florian Fainelli 1ecb195753 mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS
There is no code that will query the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
attribute remove support for that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:49:53 -05:00
Saeed Mahameed 29dded89e8 net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.

Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.

Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.

Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:26:21 -05:00
Ido Schimmel 384c2f7473 mlxsw: spectrum_flower: Fix VLAN modify action support
The driver does not support VLAN push and pop, but only VLAN modify.

Fixes: 7386788175 ("drivers: net: use flow action infrastructure")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:03:29 -05:00
Ido Schimmel 24f91ce0d2 mlxsw: spectrum_router: Drop unnecessary WARN_ON_ONCE()
In case the register access failed an error would be logged anyway, so
we can drop the warning.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:03:29 -05:00
Nir Dotan 48ebab31d4 mlxsw: spectrum: Set LAG port collector only when active
The LAG port collecting (receive) function was mistakenly set when the
port was registered as a LAG member, while it should be set only when
the port collection state is set to true. Set LAG port to collecting
when it is set to distributing, as described in the IEEE link
aggregation standard coupled control mux machine state diagram.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:03:29 -05:00
Cong Wang 1fbf1252df mlx5: use RCU lock in mlx5_eq_cq_get()
mlx5_eq_cq_get() is called in IRQ handler, the spinlock inside
gets a lot of contentions when we test some heavy workload
with 60 RX queues and 80 CPU's, and it is clearly shown in the
flame graph.

In fact, radix_tree_lookup() is perfectly fine with RCU read lock,
we don't have to take a spinlock on this hot path. This is pretty
much similar to commit 291c566a28
("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths
are still serialized with the spinlock, and with synchronize_irq()
it should be safe to just move the fast path to RCU read lock.

This patch itself reduces the latency by about 50% for our memcached
workload on a 4.14 kernel we test. In upstream, as pointed out by Saeed,
this spinlock gets some rework in commit 02d92f7903
("net/mlx5: CQ Database per EQ"), so the difference could be smaller.

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-11 14:39:14 -08:00
Gustavo A. R. Silva 9e475293cd mlxsw: spectrum_router: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable alloc_size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 22:57:28 -08:00
Jiri Pirko 3985de7260 mlxsw: spectrum_acl: Add couple of vregion rehash tracepoints
As vregion rehash is happening in delayed work, add some visibility to
the process using a few tracepoints.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko 98bbf70c1c mlxsw: spectrum: add "acl_region_rehash_interval" devlink param
Expose new driver-specific "acl_region_rehash_interval" devlink param
which would allow user to alter default ACL region rehash interval.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko e5e7962ee5 mlxsw: spectrum_acl: Implement region migration according to hints
If the hints are returned, the migration should be started. For that to
happen, there is a need to create a second physical region in TCAM with
new ERP set by passing the hints and then move chunk by chunk,
entry by entry.

During the transition, two lookups will occur. One in old region and
another in new region. The highest priority rule will be chosen.

In an unlikely case that the migration will fail and also rollback to
original region will fail the vregion will become in bad state.
Everything will work, only no future rehash will be possible. In a
follow-up work, this can be resolved by trying to resume the rollback
in delayed work and repair the vregion.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko 5c661f142c mlxsw: reg: Add multi field to PAGT register
For Spectrum-2 this allows parallel lookups in multiple regions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko a339bf8aaf mlxsw: spectrum_acl: Pass hints priv all the way to ERP code
The hints priv comes from ERP code and it is possible to obtain it from
TCAM code. Add arg to appropriate functions so the hints
priv could be passed back down to ERP code. Pass NULL now as the
follow-up patches would pass an actual hints priv pointer.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko 29a2102a29 mlxsw: spectrum_acl: Implement basic ERP rehash hits creation
Introduce an initial implementation of rehash logic in ERP code.
Currently, the rehash is considered as needed only in case number of
roots in the hints is smaller than the number of roots actually in use.
In that case return hints pointer and let it be obtained through the
callpath through the Spectrum-2 TCAM op.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko c4c2dc5429 mlxsw: spectrum_acl: Split entry struct into entry and ventry
Do the split of entry struct so the new entry struct is related to the
actual HW entry, whereas ventry struct is a SW abstration of that.
This split prepares possibility for ventry to hold 2 HW entries which
is needed for region ERP rehash flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko b2d6b4d2be mlxsw: spectrum_acl: Split chunk struct into chunk and vchunk
Do the split of chunk struct so the new chunk struct is related to the
actual HW chunk (differs between Spectrum and Spectrum-2),
whereas vchunk struct is a SW abstraction of that. This split
prepares possibility for vchunk to hold 2 HW chunks which is needed
for region ERP rehash flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko 0f54236da0 mlxsw: spectrum_acl: Split region struct into region and vregion
Do the split of region struct so the new region struct is related to the
actual HW region, whereas vregion struct is a SW abstration of that.
This split prepares possibility for vregion to hold 2 HW regions which
is needed for region ERP rehash flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko 9069a3817d lib: objagg: implement optimization hints assembly and use hints for object creation
Implement simple greedy algo to find more optimized root-delta tree for
a given objagg instance. This "hints" can be used by a driver to:
1) check if the hints are better (driver's choice) than the original
   objagg tree. Driver does comparison of objagg stats and hints stats.
2) use the hints to create a new objagg instance which will construct
   the root-delta tree according to the passed hints. Currently, only a
   simple greedy algorithm is implemented. Basically it finds the roots
   according to the maximal possible user count including deltas.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko 7c62cfb8c5 devlink: publish params only after driver init is done
Currently, user can do dump or get of param values right after the
devlink params are registered. However the driver may not be initialized
which is an issue. The same problem happens during notification
upon param registration. Allow driver to publish devlink params
whenever it is ready to handle get() ops. Note that this cannot
be resolved by init reordering, as the "driverinit" params have
to be available before the driver is initialized (it needs the param
values there).

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
David S. Miller a655fe9f19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.

Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:00:17 -08:00
Eran Ben Elisha 7d91126b1a net/mlx5e: Add tx timeout support for mlx5e tx reporter
With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it to the
devlink health. (The watchdog detects tx timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from tx timeout in case of lost interrupt was added
to the tx reporter recover method. The tx timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the tx reporter recovery flow.

tx timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a,
CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000

$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 0 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": true
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: true
  sqn: 142 HW state: 1 stopped: false

$devlink health recover pci/0000:00:09 reporter tx
$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 1 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07 10:34:29 -08:00
Eran Ben Elisha de8650a820 net/mlx5e: Add tx reporter support
Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.
This patch declares the TX reporter operations and creates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full tx recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Diagnose output:
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": false
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: false
  sqn: 142 HW state: 1 stopped: false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07 10:34:29 -08:00
Tonghao Zhang 218d05ce32 net/mlx5e: Don't overwrite pedit action when multiple pedit used
In some case, we may use multiple pedit actions to modify packets.
The command shown as below: the last pedit action is effective.

$ tc filter add dev netdev_rep parent ffff: protocol ip prio 1    \
	flower skip_sw ip_proto icmp dst_ip 3.3.3.3        \
	action pedit ex munge ip dst set 192.168.1.100 pipe    \
	action pedit ex munge eth src set 00:00:00:00:00:01 pipe    \
	action pedit ex munge eth dst set 00:00:00:00:00:02 pipe    \
	action csum ip pipe    \
	action tunnel_key set src_ip 1.1.1.100 dst_ip 1.1.1.200 dst_port 4789 id 100 \
	action mirred egress redirect dev vxlan0

To fix it, we add max_mod_hdr_actions to mlx5e_tc_flow_parse_attr struction,
max_mod_hdr_actions will store the max pedit action number we support and
num_mod_hdr_actions indicates how many pedit action we used, and store all
pedit action to mod_hdr_actions.

Fixes: d79b6df6b1 ("net/mlx5e: Add parsing of TC pedit actions to HW format")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 17:12:13 -08:00
Tonghao Zhang 6707f74be8 net/mlx5e: Update hw flows when encap source mac changed
When we offload tc filters to hardware, hardware flows can
be updated when mac of encap destination ip is changed.
But we ignore one case, that the mac of local encap ip can
be changed too, so we should also update them.

To fix it, add route_dev in mlx5e_encap_entry struct to save
the local encap netdevice, and when mac changed, kernel will
flush all the neighbour on the netdevice and send NETEVENT_NEIGH_UPDATE
event. The mlx5 driver will delete the flows and add them when neighbour
available again.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Cc: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 17:12:13 -08:00
Ido Schimmel 2810c3b252 mlxsw: spectrum_router: Offload blackhole routes
Create a new FIB entry type for blackhole routes and set it in case the
type of the notified route is 'RTN_BLACKHOLE'.

Program such routes with a discard action and mark them as offloaded
since the device is dropping the packets instead of the kernel.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 14:24:05 -08:00
Florian Fainelli 25ba860514 mlxsw: Implement ndo_get_port_parent_id()
mlxsw implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 14:16:11 -08:00
Florian Fainelli 6dcfa23438 net/mlx5e: Implement ndo_get_port_parent_id()
mlx5e only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().

Since mlx5e makes use of switchdev_port_parent_id() convert it to use
netdev_port_same_parent_id().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 14:16:11 -08:00
Nir Dotan d32d02a548 mlxsw: core: Trace EMAD errors
Trace EMAD errors returned from HW.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 11:05:57 -08:00
Pablo Neira Ayuso 7386788175 drivers: net: use flow action infrastructure
This patch updates drivers to use the new flow action infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Pablo Neira Ayuso 3b1903ef97 flow_offload: add statistics retrieval infrastructure and use it
This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Pablo Neira Ayuso c500c86b0c net/mlx5e: support for two independent packet edit actions
This patch adds pedit_headers_action structure to store the result of
parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
populate the mlx5e hardware intermediate representation once all actions
have been parsed.

This patch comes in preparation for the new flow_action infrastructure,
where each packet mangling comes in an separated action, ie. not packed
as in tc pedit.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Pablo Neira Ayuso 8f2566225a flow_offload: add flow_rule and flow_match structures and use them
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.

To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.

This introduces two new interfaces:

	bool flow_rule_match_key(rule, dissector_id)

that returns true if a given matching key is set on, and:

	flow_rule_match_XYZ(rule, &match);

To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Guy Shattah 1651925d40 net/mlx5e: Use the inner headers to determine tc/pedit offload limitation on decap flows
In packets that need to be decaped the internal headers
have to be checked, not the external ones.

Fixes: bdd66ac0ae ("net/mlx5e: Disallow TC offloading of unsupported match/action combinations")
Signed-off-by: Guy Shattah <sguy@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05 12:10:19 -08:00
Or Gerlitz 6363651d6d net/mlx5e: Properly set steering match levels for offloaded TC decap rules
The match level computed by the driver gets to be wrong for decap
rules with wildcarded inner packet match such as:

tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower
       enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789
       action tunnel_key unset
       action mirred egress redirect dev eth1

The FW errs for a missing matching meta-data indicator for the outer
headers (where we do have a match), and a wrong matching meta-data
indicator for the inner headers (where we don't have a match).

Fix that by taking into account the matching on the tunnel info and
relating the match level of the encapsulated packet to the firmware
inner headers indicator in case of decap.

As for vxlan we mandate a match on the tunnel udp dst port, and in general
we practically madndate a match on the source or dest ip for any IP tunnel,
the fix was done in a minimal manner around the tunnel match parsing code.

Fixes: d708f90298 ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05 12:10:19 -08:00
Raed Salem 82eaa1fa04 net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
At Innova IPsec TX offload data path a special software parser metadata
is used to pass some packet attributes to the hardware, this metadata
is passed using the Ethernet control segment of a WQE (a HW descriptor)
header.

The cited commit might nullify this header, hence the metadata is lost,
this caused a significant performance drop during hw offloading
operation.

Fix by restoring the metadata at the Ethernet control segment in case
it was nullified.

Fixes: 37fdffb217 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05 12:10:19 -08:00
Tonghao Zhang fc9c5a4a5a net/mlx5: Fix code style issue in mlx driver
Add the tab before '}' and keep the code style consistent.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:01:56 -08:00
Jason Gunthorpe 6a8a2aa62d Linux 5.0-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlxXYaEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkSQH/2yrfnviNPFYpZOR
 QQdc71Bfhkd8m85SmWIsSebkxmi3hKFVj15sGbWXd6+0/VxjEEGvQCZpvVwJceke
 LwDxtkKGg/74wAqJvlSAWxFNZ+Had4jDeoSoeQChddsBVXBBCxQx2v6ECg3o2x7W
 k8Z8t4+3RijDf8fYXY9ETyO2zW8R/wgT+dnl+DPgUH7u4dxh7FzAUfc4bgZIDg+i
 FzBQfbTJuz4BU7uRZ9IJiwhWKv0Iyi2DR3BY8Z1pqEpRaUMJMrCs2WGytHbTgt9e
 0EtO1airbVneU4eumU/ZaF9cyEbah9HousEPnP7J09WG4s/Odxc4zE+uK1QqS2im
 5Xv88is=
 =dVd1
 -----END PGP SIGNATURE-----

Merge tag 'v5.0-rc5' into rdma.git for-next

Linux 5.0-rc5

Needed to merge the include/uapi changes so we have an up to date
single-tree for these files. Patches already posted are also expected to
need this for dependencies.
2019-02-04 14:53:42 -07:00
Jakub Kicinski bff5731d43 net: devlink: report cell size of shared buffers
Shared buffer allocation is usually done in cell increments.
Drivers will either round up the allocation or refuse the
configuration if it's not an exact multiple of cell size.
Drivers know exactly the cell size of shared buffer, so help
out users by providing this information in dumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:25:34 -08:00
Moni Shoua 46861e3e88 net/mlx5: Set ODP SRQ support in firmware
To avoid compatibility issue with older kernels the firmware doesn't
allow SRQ to work with ODP unless kernel asks for it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-03 12:49:59 +02:00
Jiri Pirko a97cfe4de1 mlxsw: spectrum_acl: Add C-TCAM spill tracepoint
Add some visibility to the rule addition process and trace whenever rule
spilled into C-TCAM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:00:40 -08:00
Jiri Pirko cb56e21467 mlxsw: spectrum_acl: Include delta bits into hashtable key
Currently only ERP mask masked bits in key are considered for
the hashtable key. That leads to false negative collisions
and fallbacks to C-TCAM in case two keys differ only in delta bits.

Fix this by taking full encoded key as a hashtable key,
including delta bits.

Reported-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:00:40 -08:00
David S. Miller eaf2a47f40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
Ido Schimmel 2adeb5f1c3 mlxsw: spectrum_switchdev: Add more extack messages
Add more extack messages that let the user know why VXLAN offload
failed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
Jiri Pirko 3021afe168 mlxsw: spectrum_acl: Fix rul/rule typo
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
Jiri Pirko 038418eeb9 mlxsw: spectrum_acl: Move mr_ruleset and mr_rule structs
Move the struct to the place where they belong, alongside with the rest
of the MR code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
Jiri Pirko 42d704e018 mlxsw: spectrum_acl: Remove unnecessary arg on action_replace call path
No need to pass ruleset/group and chunk pointers on action_replace call
path, nobody uses them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
David S. Miller 3da15ad3e9 mlx5-fixes-2019-01-25
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcS2rdAAoJEEg/ir3gV/o+CZ8IAMX4yWUWjzgrIDdOc1OEgHJQ
 FDCMtTDm/I+m1UInAgoigRia2RyIL99N81Vrx+p+otCOml/U5IXPt5tuJfSNQYZ3
 p9MibQSXtKJ9jhowOVZdYlTx9CXti75BS45wHKZPlL7V+gCC2Q7/8aviuRlkQKAN
 4pQNDreNiiB8PznXG/yCyzEu0Kc9M/Bv89xdLynVpDTxBfEfI8xSv4PttUcAxaQ5
 Fs/489wuODyhAczOquQj1dvn61zUamaPp6z0h+GbZOCQv0AO7ABRZl+7o5C5r+rA
 i7R4ak6WIgW3snIzhCdpynPfas8kUIvxJjkNCZuoFsefrBRsNv5RvtfCkuzyms4=
 =3epF
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-01-25

This series introduces some fixes to mlx5 driver.
For more information please see tag log below.

Please pull and let me know if there is any problem.

For -stable v4.13
('net/mlx5e: Allow MAC invalidation while spoofchk is ON')

For -stable v4.18
('Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 11:06:45 -08:00
David S. Miller 1d68101367 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-27 10:43:17 -08:00
Saeed Mahameed b832d4fdf1 net/mlx5e: Reuse fold sw stats in representors
Representors software stats are basic, this patch is reusing the
mlx5e_fold_sw_stats in representors, which sums up the basic stats64 for a
mlx5e netdevice.

Fixes: 8bfaf07f78 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:15 -08:00
Tariq Toukan 168af00a3b net/mlx5e: Present the representors SW stats when state is not opened
This behavior is already adopted for all other cases in the cited patch.
The representor's functions were missed, here we modify the them to
behave similarly.

Fixes: 8bfaf07f78 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Saeed Mahameed 9659e49a6d net/mlx5e: Separate between ethtool and netdev software stats folding
mlx5e_grp_sw_update_stats can be called from two threads,
1) ndo_get_stats64
2) get_ethtool_stats

For this reason and to minimize concurrency issue impact on 64bit machines
mlx5e_grp_sw_update_stats folds the software stats into a temporary
variable then copies it to the global driver stats, both ethtool and ndo
statistics callbacks will use the global software stats variable to report
whatever stats they need.

Actually ndo_get_stats64 doesn't need to fold the whole software stats
(mlx5e_grp_sw_update_stats), all it needs is five counters to fill the
rtnl_link_stats64 relevant stats parameter.

Hence this patch introduces a simpler helper function to fold software
stats for ndo_get_stats64 which will work directly on rtnl_link_stats64
stats parameter and not on the global or even temporary mlx5e_sw_stats
variable.

Since now mlx5e_grp_sw_update_stats is not called by ndo_get_stats64 we
can make it static and remove the temp var.

Unlike mlx5e_grp_sw_update_stats the new fold stats function doesn't
need to zero out the output statistics parameter since it is already
done by the stack @dev_get_stats().

This patch is fixing stack usage of mlx5e_grp_sw_update_stats on
x86 gcc-4.9 and higher, the concurrency issue between mlx5's
ndo_get_stats64 and get_ethtool_stats is resolved as well.

Fixes: 8bfaf07f78 ("net/mlx5e: Present SW stats when state is not opened")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Or Gerlitz 8e4ca98609 net/mlx5: Add trace points for flow tables create/destroy
We were not tracking flow tables so far, add it up.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Jason Gunthorpe 71129676ab net/mlx5e: Return the allocated flow directly from __mlx5e_add_fdb_flow
This confusing construction confuses the compiler which can't see
that flow is initialized if !err:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function `mlx5e_configure_flower`
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:2727:28: warning:
`flow` may be used uninitialized in this function
[-Wmaybe-uninitialized]

There is no reason for two function outputs, just return the
pointer directly and use ERR_PTR to encode a failure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Moshe Shemesh 149e566fef net/mlx5e: Expand XPS cpumask to cover all online cpus
Currently we have one cpu in XPS cpumask per tx queue, this is good
enough for default configuration where there is a tx queue per cpu.
However, once configuration changes to use less tx queues, part of the
cpus are not XPS-mapped and so the select queue decision falls back to
hash calculation and balancing is not guaranteed.

Expand XPS cpumask to enable using all cpus even when number of tx
queues is smaller than number of cpus.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Tariq Toukan 79d356ef2c net/mlx5e: Take CQ decompress fields into a separate structure
Only the Receive CQ makes use of these fields. Take them
out into a separate struct and save space in the generic
CQ structure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:13 -08:00
Tariq Toukan 9481627838 net/mlx5e: RX, Make sure packet header does not cross page boundary
In the non-linear SKB memory scheme of Striding RQ, a packet header
could cross page boundary. This requires special care in fast path
that costs LoC, additional runtime instructions and branches.

It could happen when the header (up to 256B) does not fit in
a single stride. Avoid this by working with a stride size that fits
the maximum possible header. Stride is increased form 64B to 256B.

Performance:
Tested packet rate for UDP streams, single ring, on ConnectX-5.

Configuration:
Set Striding RQ and LRO ON (to enabled the non-linear SKB scheme).
GRO OFF, early drop by TC rule.

64B: 4x worse memory utilization, no page-crossers headers
- No degradation (5,887,305 pps).
- The reduction in memory utilization is compensated by the saving of
  branches tests.

192B: 1.33x worse memory utilization, avoid page-crossers headers
- Before: 5,727,252. After: 5,777,037. ~1% gain.

256B: Same memory util, no page-crossers
- Before: 5,691,885. After: 5,748,007. ~1% gain.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:13 -08:00
Or Gerlitz 6ce966fd26 net/mlx5e: Unblock setting vid 0 for VFs through the uplink rep
It turns out that libvirt uses 0-vid as a default if no vlan was
set for the guest (which is the case for switchdev mode) and errs
if we disallow that:

error: Failed to start domain vm75
error: Cannot set interface MAC/vlanid to 6a:66:2d:48:92:c2/0 \
		for ifname enp59s0f0 vf 0: Operation not supported

So allow this in order not to break existing systems.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Gavi Teitz <gavi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:29 -08:00
Or Gerlitz c12ecc2305 net/mlx5e: Move to use common phys port names for vport representors
With VF LAG commit 491c37e49b "net/mlx5e: In case of LAG, one switch
parent id is used for all representors", both uplinks and all the VFs
(on both of them) get the same switchdev id.

This cause the provisioning system method to identify the rep of a given
VF from the parent PF PCI device using switchev id and physical port
name to break, since VFm of PF0 will have the (id, name) as VFm of PF1.

To fix that, we align to use the framework agreed upstream and set by
nfp commit 168c478e10 "nfp: wire get_phys_port_name on representors":

$ cat /sys/class/net/eth4_*/phys_port_name
p0
pf0vf0
pf0vf1

Now, the names will be different, e.g. pf0vf0 vs. pf1vf0.

Fixes: 491c37e49b ("net/mlx5e: In case of LAG, one switch parent id is used for all representors")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Waleed Musa <waleedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:29 -08:00
Aya Levin 9d2cbdc5d3 net/mlx5e: Allow MAC invalidation while spoofchk is ON
Prior to this patch the driver prohibited spoof checking on invalid MAC.
Now the user can set this configuration if it wishes to.

This is required since libvirt might invalidate the VF Mac by setting it
to zero, while spoofcheck is ON.

Fixes: 1ab2068a4c ("net/mlx5: Implement vports admin state backup/restore")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:29 -08:00
Moni Shoua 33814e5d12 net/mlx5: Take lock with IRQs disabled to avoid deadlock
The lock in qp_table might be taken from process context or from
interrupt context. This may lead to a deadlock unless it is taken with
IRQs disabled.

Discovered by lockdep

================================
WARNING: inconsistent lock state
4.20.0-rc6
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W}

python/12572 [HC1[1]:SC0[0]:HE0:SE1] takes:
00000000052a4df4 (&(&table->lock)->rlock#2){?.+.}, /0x50 [mlx5_core]
{HARDIRQ-ON-W} state was registered at:
  _raw_spin_lock+0x33/0x70
  mlx5_get_rsc+0x1a/0x50 [mlx5_core]
  mlx5_ib_eqe_pf_action+0x493/0x1be0 [mlx5_ib]
  process_one_work+0x90c/0x1820
  worker_thread+0x87/0xbb0
  kthread+0x320/0x3e0
  ret_from_fork+0x24/0x30
irq event stamp: 103928
hardirqs last  enabled at (103927): [] nk+0x1a/0x1c
hardirqs last disabled at (103928): [] unk+0x1a/0x1c
softirqs last  enabled at (103924): [] tcp_sendmsg+0x31/0x40
softirqs last disabled at (103922): [] 80

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&table->lock)->rlock#2);

    lock(&(&table->lock)->rlock#2);

 *** DEADLOCK ***

Fixes: 032080ab43 ("IB/mlx5: Lock QP during page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:28 -08:00
Shay Agroskin 92b3277294 net/mlx5e: Fix wrong private flag usage causing checksum disable
MLX5E_PFLAG_* definitions were changed from bitmask to enumerated
values. However, in mlx5e_open_rq(), the proper API (MLX5E_GET_PFLAG macro)
was not used to read the flag value of MLX5E_PFLAG_RX_NO_CSUM_COMPLETE.
Fixed it.

Fixes: 8ff57c18e9 ("net/mlx5e: Improve ethtool private-flags code structure")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:28 -08:00
Bodong Wang 4e046de0f5 Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
This reverts commit 5f5991f36d.

With the original commit, eswitch instance will not be initialized for
a function which is vport group manager but not eswitch manager such as
host PF on SmartNIC (BlueField) card. This will result in a kernel crash
when such a vport group manager is trying to access vports in its group.
E.g, PF vport manager (not eswitch manager) tries to configure the MAC
of its VF vport, a kernel trace will happen similar as bellow:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 ...
 RIP: 0010:mlx5_eswitch_get_vport_config+0xc/0x180 [mlx5_core]
 ...

Fixes: 5f5991f36d ("net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:28 -08:00
David S. Miller 30e5c2c6bf net: Revert devlink health changes.
This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do this is start from the
beginning again.

Commits reverted:

cb5ccfbe73
880ee82f03
c7af343b4e
ff253fedab
6f9d56132e
fcd852c69d
8a66704a13
12bd0dcefe
aba25279c1
ce019faa70
b8c45a033a

And the follow-on build fix:

o33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 10:53:23 -08:00
Shalom Toledo 2bb3e10394 mlxfw: Replace license text with SPDX identifiers and adjust copyrights
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:34:03 -08:00
Tariq Toukan 5e5b9f6272 net/mlx4_core: A write memory barrier is sufficient in EQ ci update
Soften the memory barrier call of mb() by a sufficient wmb() in the
consumer index update of the event queues.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:54:13 -08:00
Jack Morgenstein ffe4cfc3da net/mlx4_core: Fix error handling when initializing CQ bufs in the driver
Procedure mlx4_init_user_cqes() handles returns by copy_to_user
incorrectly. copy_to_user() returns the number of bytes not copied.
Thus, a non-zero return should be treated as a -EFAULT error
(as is done elsewhere in the kernel). However, mlx4_init_user_cqes()
error handling simply returns the number of bytes not copied
(instead of -EFAULT).

Note, though, that this is a harmless bug: procedure mlx4_alloc_cq()
(which is the only caller of mlx4_init_user_cqes()) treats any
non-zero return as an error, but that returned error value is processed
internally, and not passed further up the call stack.

In addition, fixes the following sparse warning:
warning: incorrect type in argument 1 (different address spaces)
   expected void [noderef] <asn:1>*to
   got void *buf

Fixes: e45678973d ("{net, IB}/mlx4: Initialize CQ buffers in the driver when possible")
Reported by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:48:26 -08:00
Aya Levin a40ded6043 net/mlx4_core: Add masking for a few queries on HCA caps
Driver reads the query HCA capabilities without the corresponding masks.
Without the correct masks, the base addresses of the queues are
unaligned.  In addition some reserved bits were wrongly read.  Using the
correct masks, ensures alignment of the base addresses and allows future
firmware versions safe use of the reserved bits.

Fixes: ab9c17a009 ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
Fixes: 0ff1fb654b ("{NET, IB}/mlx4: Add device managed flow steering firmware API")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:48:26 -08:00
Michael Guralnik ce4eee5340 net/mlx5: Add pci AtomicOps request
Calling pci_enable_atomic_ops_to_root enables AtomicOp requests to pci
root port.

AtomicOp requests will be enabled only if the completer and all
intermediate pci bridges support PCI atomic operations.
This, together with appropriate settings in the NVCONFIG should enable
PCI atomic operations on the device.

PCI atomic operations were first introduced in PCI Express Base Specification
2.1. The Supported operations are Swap (Unconditional Swap), CAS (Compare and
Swap) and FetchAdd (Fetch and Add).

Unlike other atomic operation modes PCI atomic operations gives the user
the option to do atomic operations on local memory, without involving verbs
api, without it compromising the operation's atomicity.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-24 14:26:09 +02:00
Jason Gunthorpe e355477ed9 net/mlx5: Make mlx5_cmd_exec_cb() a safe API
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.

The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-24 14:25:26 +02:00
Ido Schimmel 02d21b59d5 mlxsw: spectrum_nve: Enable VXLAN on Spectrum-2
Enable VXLAN on Spectrum-2 as previous patches added the required
functionality.

Note that for now Spectrum-1 and Spectrum-2 use the same function to
determine whether the VXLAN configuration is valid or not. In the
future, when the driver will be extended to support features not present
in Spectrum-1, two different functions will be needed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23 09:28:27 -08:00