linux

Commit Graph

Author	SHA1	Message	Date
Eyal Davidovich	fd4572b3ff	net/mlx5: Add monitor commands layout and event data Will be used in downstream patch to monitor counter changes by the HCA and report it to the driver by an event. The driver will update its counters cached data accordingly. Signed-off-by: Eyal Davidovich <eyald@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-10 14:00:08 -08:00
Mikhael Goikhman	8d6b57e644	net/mlx5: Add support for plugged-disabled cable status in PME Support a new hardware module status in port module events: - module_status=0x4 (Cable plugged, but disabled) Signed-off-by: Mikhael Goikhman <migo@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-10 14:00:08 -08:00
Mikhael Goikhman	37a12aae06	net/mlx5: Add support for PCIe power slot exceeded error in PME Support a new hardware error type in port module events: - error_type=0xc (PCIe system power slot exceeded) Signed-off-by: Mikhael Goikhman <migo@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-10 14:00:08 -08:00
Mikhael Goikhman	c2fb3db22d	net/mlx5: Rework handling of port module events Add explicit HW defined error values. For simplicity, keep counters for all statuses starting from 0, although currently status=0 is not used. Additionally, when HW signals an unexpected cable status, it is reported now rather than ignored. And status counter is now updated on errors. Signed-off-by: Mikhael Goikhman <migo@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-10 14:00:08 -08:00
David S. Miller	4cc1feeb6f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while chaning the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classifiction. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-09 21:43:31 -08:00
Saeed Mahameed	7300375f18	net/mlx5: Move flow counters data structures from flow steering header After the following flow counters API refactoring: ("net/mlx5: Use flow counter IDs and not the wrapping cache object") flow counters private data structures mlx5_fc_cache and mlx5_fc are redundantly exposed in fs_core.h, they have nothing to do with flow steering core and they are private to fs_counter.c, this patch moves them to where they belong and reduces their exposure in the driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-09 18:16:16 -08:00
Tariq Toukan	6254adeb1f	net/mlx5: Use helper to get CQE opcode Introduce and use a helper that extracts the opcode from a CQE (completion queue entry) structure. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-09 18:16:16 -08:00
Daniel Jurgens	fe206c2093	net/mlx5: When fetching CQEs return CQE instead of void pointer The function is only used to retrieve CQEs, use the proper type as the return value. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-09 18:16:16 -08:00
Tarick Bedeir	bd5122cd1e	net/mlx4_core: Correctly set PFC param if global pause is turned off. rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1. Fixes: `6e8814ceb7` ("net/mlx4_en: Fix mixed PFC and Global pause user control requests") Signed-off-by: Tarick Bedeir <tarick@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-08 21:26:36 -08:00
Petr Machata	8a5969d8a8	mlxsw: spectrum_nve: Un/offload FDB on nve_fid_disable/enable Any existing NVE FDB entries need to be offloaded when NVE is enabled for a given FID. Recent patches have added fdb_replay op for this, so just invoke it from mlxsw_sp_nve_fid_enable(). When NVE is disabled on a FID, any existing FDB offloaded marks need to be cleared on NVE device as well as on its bridge master. An op to handle this, fdb_clear_offload, has been added to FID ops and NVE ops in previous patches. Add code to resolve the NVE device, NVE type, and dispatch to both fdb_clear_offload ops. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-07 12:59:08 -08:00
Petr Machata	83de78831b	mlxsw: spectrum: Add mlxsw_sp_fid_ops.fdb_clear_offload If there are any offloaded FDB entries at bridge master of an NVE device at the time that it's un-offloaded, their offloaded marks need to be cleared. How that is done depends on whether the bridge in question is vlan aware. Therefore add a per-FID-type operation. Implement the operation for the 802.1q and 802.1d bridges. Add and publish a function mlxsw_sp_fid_fdb_clear_offload() to dispatch to the new operation according to FID type. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-07 12:59:08 -08:00
Petr Machata	b73ef0e0ee	mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_clear_offload If there are any offloaded FDB entries at an NVE device at the time that it's un-offloaded, their offloaded marks need to be cleared. How that is done depends on NVE device type, and therefore add a per-NVE-type operation. Implement the operation for the sole NVE device type currently supported by mlxsw, VXLAN. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-07 12:59:08 -08:00
Petr Machata	a6ef5a48a3	mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_replay A replay of FDB needs to be performed so that the FDB entries existing at the NVE device are offloaded. How the replay is done depends on NVE device type, and therefore add a per-NVE-type operation. Implement the operation for the sole NVE device type currently supported by mlxsw, VXLAN. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-07 12:59:08 -08:00
Petr Machata	34139ede05	mlxsw: spectrum_switchdev: Publish mlxsw_sp_switchdev_notifier The notifier block will need to be passed to vxlan_fdb_replay() in a follow-up patch. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-07 12:59:08 -08:00
Petr Machata	2a36c12520	mlxsw: spectrum: Track NVE type at FIDs A follow-up patch will add support for replay and for clearing of offload marks. These are NVE type-sensitive operations, and to be able to dispatch them properly, a FID needs to know what NVE type is attached to it. Therefore, track the NVE type at struct mlxsw_sp_fid. Extend mlxsw_sp_fid_vni_set() to take it as an argument, and add mlxsw_sp_fid_nve_type(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-07 12:59:08 -08:00
Ido Schimmel	993107fea5	mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl When deleting a VLAN device using an ioctl the netdev is unregistered before the VLAN filter is updated via ndo_vlan_rx_kill_vid(). It can lead to a use-after-free in mlxsw in case the VLAN device is deleted while being enslaved to a bridge. The reason for the above is that when mlxsw receives the CHANGEUPPER event, it wrongly assumes that the VLAN device is no longer its upper and thus destroys the internal representation of the bridge port despite the reference count being non-zero. Fix this by checking if the VLAN device is our upper using its real device. In net-next I'm going to remove this trick and instead make mlxsw completely agnostic to the order of the events. Fixes: `c57529e1d5` ("mlxsw: spectrum: Replace vPorts with Port-VLAN") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-06 13:31:09 -08:00
Nir Dotan	da93d2913f	mlxsw: spectrum_router: Relax GRE decap matching check GRE decap offload is configured when local routes prefix correspond to the local address of one of the offloaded GRE tunnels. The matching check was found to be too strict, such that for a flat GRE configuration, in which the overlay and underlay traffic share the same non-default VRF, decap flow was not offloaded. Relax the check for decap flow offloading. A match occurs if the local address of the tunnel matches the local route address while both share the same VRF table. Fixes: `4607f6d269` ("mlxsw: spectrum_router: Support IPv4 underlay decap") Signed-off-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-06 13:31:08 -08:00
Ido Schimmel	f58a83c207	mlxsw: spectrum_switchdev: Avoid leaking FID's reference count It should never be possible for a user to set a VNI on a FID in case one is already set. The driver therefore returns an error, but fails to drop the reference count taken earlier when calling mlxsw_sp_fid_8021d_lookup(). Drop the reference when this unlikely error is hit. Fixes: `1c30d1836a` ("mlxsw: spectrum: Enable VxLAN enslavement to bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-06 13:31:08 -08:00
Ido Schimmel	050fc01fb1	mlxsw: spectrum_nve: Remove easily triggerable warnings It is possible to trigger a warning in mlxsw in case a flood entry which mlxsw is not aware of is deleted from the VxLAN device. This is because mlxsw expects to find a singly linked list where the flood entry is present in. Fix by removing these warnings for now. Will re-add them in the next release after we teach mlxsw to ask for a dump of FDB entries from the VxLAN device, once it is enslaved to a bridge mlxsw cares about. Fixes: `6e6030bd54` ("mlxsw: spectrum_nve: Implement common NVE core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-06 13:31:08 -08:00
Tariq Toukan	8ff57c18e9	net/mlx5e: Improve ethtool private-flags code structure Refactor the code of private-flags setter. Replace consecutive calls to mlx5e_handle_pflag with a loop that uses a preset set of parameters. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-05 16:00:37 -08:00
Aya Levin	756c41603a	net/mlx5e: ethtool, Support user configuration for RX hash fields Enable user configuration of RX hash fields that are used for traffic spreading into RX queues. User can change built-in RSS (Receive Side Scaling) profiles on the following traffic types: UDP4, UDP6, TCP4 and TCP6. This configuration effects both outer and inner headers. Added support for ethtool commands: ETHTOOL_SRXFH and ETHTOOL_GRXFH. Command example respectively: $ethtool -N eth1 rx-flow-hash tcp4 sdfn $ethtool -n eth1 rx-flow-hash tcpp4 IP SA IP DA L4 bytes 0 & 1 [TCP/UDP src port] L4 bytes 2 & 3 [TCP/UDP dst port] Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-05 16:00:37 -08:00
Aya Levin	bbeb53b8b2	net/mlx5e: Move RSS params to a dedicated struct Remove RSS params from params struct under channels, and introduce a new struct with RSS configuration params under priv struct. There is no functional change here. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-05 16:00:36 -08:00
Aya Levin	d930ac796f	net/mlx5e: Refactor TIR configuration function Refactor mlx5e_build_indir_tir_ctx_hash for better code re-use. TIR stands for Transport Interface Receive, which is responsible for all transport related operations on the receive side. Added a static array with TIR default configuration values. This separates configuration values from command setting, which is needed for downstream patch. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-05 16:00:33 -08:00
Aya Levin	080d1b17fb	net/mlx5e: Move modify tirs hash functionality Move modify tirs hash functionality (mlx5e_modify_tirs_hash) from en_ethtool.c to en_main.c. This allows future use of this fuctionality from en_fs_ethtool.c, while keeping current convention: en_ethtool.c doesn't have an API. There is no functional change here. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-04 18:20:01 -08:00
Gal Pressman	3054383109	net/mlx5e: Cleanup unused defines Remove couple of defines that are no longer used. Signed-off-by: Gal Pressman <pressmangal@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-04 18:20:00 -08:00
Saeed Mahameed	8742c7eb3d	net/mlx5e: Remove trailing space of tx_pause ethtool counter name tx_pause_storm_warning_events ethtool counter name has a trailing space, remove it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>	2018-12-04 18:20:00 -08:00
Ido Schimmel	2f4f44946b	mlxsw: spectrum: Mirror loopbacked packets instead of trapping them When the ASIC detects that a unicast packet is routed through the same router interface (RIF) from which it ingressed (iRIF == eRIF), it raises a trap called loopback error (LBERROR). Thus far, this trap was configured to send a sole copy of the packet to the CPU so that ICMP redirect packets could be potentially generated by the kernel. This is problematic as the CPU cannot forward packets at 3.2Tb/s and there are scenarios (e.g., "one-armed router") where iRIF == eRIF is not an exception. Solve this by changing the trap to send a copy of the packet to the CPU. To prevent the kernel from forwarding the packet again, it is marked with 'offload_l3_fwd_mark'. The trap is configured in a trap group of its own with a dedicated policer in order not to prevent packets trapped by other traps from reaching the CPU. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-04 08:36:36 -08:00
Ido Schimmel	875e893995	skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' Commit `abf4bb6b63` ("skbuff: Add the offload_mr_fwd_mark field") added the 'offload_mr_fwd_mark' field to indicate that a packet has already undergone L3 multicast routing by a capable device. The field is used to prevent the kernel from forwarding a packet through a netdev through which the device has already forwarded the packet. Currently, no unicast packet is routed by both the device and the kernel, but this is about to change by subsequent patches and we need to be able to mark such packets, so that they will no be forwarded twice. Instead of adding yet another field to 'struct sk_buff', we can just rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet either has a multicast or a unicast destination IP. While at it, add a comment about both 'offload_fwd_mark' and 'offload_l3_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-04 08:36:36 -08:00
Leon Romanovsky	f3da6577da	RDMA/mlx5: Initialize SRQ tables on mlx5_ib Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-12-04 09:25:50 +02:00
Leon Romanovsky	f02d0d6e53	net/mlx5: Move SRQ functions to RDMA part There is no need to keep SRQ which is RDMA object in mlx5_core. In this patch, we partially move the execution code, while next patches will move table initialization/release logic too. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-12-04 09:14:30 +02:00
Leon Romanovsky	c23f88cb57	net/mlx5: Remove references to local mlx5_core functions As a preparation to move SRQ functionality to RDMA, drop all references to mlx5_core logic and make SRQ be dependent on shared code only. Most of the time, we are interested to know if events are working/not working and it is possible with previous commit ("net/mlx5: Debug print for forwarded async events"). Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-12-04 09:14:25 +02:00
Leon Romanovsky	26d1164dff	net/mlx5: Remove not-used lib/eq.h header file lib/eq.h is needed for EQ manipulation which are not performed in SRQ. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-12-04 09:14:20 +02:00
Leon Romanovsky	5b5f0f1627	net/mlx5: Remove dead transobj code Delete functions which are not called and not needed. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-12-04 09:14:15 +02:00
Leon Romanovsky	6cd0014ab9	net/mlx5: Align SRQ licenses and copyright information Ensure that both RDMA and netdev parts of SRQ implementation has same copyright and license information annotated by SPDX tags. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-12-04 09:14:09 +02:00
Erez Alfasi	92a59ad040	net/mlx4_core: Fix several coding style errors Fix 3 checkpatch errors within mlx4/main.c: - Unnecessary mlx4_debug_level global variable initialization to 0. - Prohibited space before comma. - Whitespaces instead of tab. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 16:34:32 -08:00
Erez Alfasi	95aac2cdaf	net/mlx4_core: Fix return codes of unsupported operations Functions __set_port_type and mlx4_check_port_params returned -EINVAL while the proper return code is -EOPNOTSUPP as a result of an unsupported operation. All drivers should generate this and all users should check for it when detecting an unsupported functionality. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 16:34:31 -08:00
Saeed Mahameed	1b603f9e43	net/mlx4_en: Fix build break when CONFIG_INET is off MLX4_EN depends on NETDEVICES, ETHERNET and INET Kconfigs. Make sure they are listed in MLX4_EN Kconfig dependencies. This fixes the following build break: drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: ‘struct iphdr’ declared inside parameter list [enabled by default] struct iphdr *iph) ^ drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function ‘get_fixed_ipv4_csum’: drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing pointer to incomplete type _u8 ipproto = iph->protocol; Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 16:16:22 -08:00
Eran Ben Elisha	24be19e477	net/mlx4_en: Change min MTU size to ETH_MIN_MTU NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in the RFC791 and in the network stack. Remove old mlx4_en only define for it, which was set to wrong value. Fixes: `b80f71f581` ("ethernet/mellanox: use core min/max MTU checking") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 16:16:22 -08:00
Shalom Toledo	064501c5b6	mlxsw: spectrum: Load firmware version based on devlink parameter Load firmware version based on 'fw_load_policy' devlink parameter. The driver supports these two options: * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0) Default, load firmware version preferred by the driver * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1) Load firmware currently stored in flash The second option, 'flash', allow the device to run with different firmware version than preferred by the driver for testing and/or debugging purposes. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 13:55:43 -08:00
Shalom Toledo	03bffcad49	mlxsw: core: Reset firmware after flash during driver initialization After flashing new firmware during the driver initialization flow (reload or not), the driver should do a firmware reset when it gets -EAGAIN in order to load the new one. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 13:55:43 -08:00
Cong Wang	ef6fcd4552	mlx5: fix get_ip_proto() IP header is not necessarily located right after struct ethhdr, there could be multiple 802.1Q headers in between, this is why we call __vlan_get_protocol(). Fixes: `fe1dc06999` ("net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets") Cc: Alaa Hleihel <alaa@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 17:19:33 -08:00
Ido Schimmel	d70e42b22d	mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges Commit `1c30d1836a` ("mlxsw: spectrum: Enable VxLAN enslavement to bridges") enabled the enslavement of VxLAN devices to bridges that have mlxsw ports (or their upper) as slaves. This patch extends mlxsw to also support VLAN-aware bridges. The patch is similar in nature to mentioned commit, but there is one major difference. With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN that is configured as PVID and egress untagged on the bridge port. Therefore, the driver is extended to listen to VLAN configuration on VxLAN devices of interest and enable / disable NVE encapsulation on the corresponding 802.1Q FIDs. To prevent ambiguity, the driver makes sure that a given VLAN is not configured as PVID and egress untagged on multiple VxLAN devices. This sanitization takes place both when a port is enslaved to a bridge with existing VxLAN devices and when a VLAN is added to / removed from a VxLAN device of interest. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 17:06:29 -08:00
Ido Schimmel	48fde46606	mlxsw: spectrum_switchdev: Prepare function for VLAN-aware bridges The vxlan_join() function resolves the FID on which the VNI should be set and then sets the VNI. Currently, the FID is simply resolved according to the ifindex of the bridge device to which the VxLAN device is enslaved. This works because only VLAN-unaware bridges are supported. With VLAN-aware bridges the FID would need to be resolved based on the VLAN to which the VNI is mapped to. Add the VLAN ID to the argument list of the function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 17:06:29 -08:00
Ido Schimmel	b03fa9e7e0	mlxsw: spectrum_switchdev: Unify VxLAN leave function The function mlxsw_sp_bridge_vxlan_leave() is currently split between VLAN-aware and VLAN-unaware bridges, but actually both types can use the same function. The function needs to resolve the FID that corresponds to the VxLAN device and disable NVE encapsulation on it. Instead of looking up the FID differently for VLAN-aware and VLAN-unaware bridges, we can always use the VxLAN's device VNI. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 17:06:29 -08:00
Ido Schimmel	5a8fb370be	mlxsw: spectrum_fid: Add API to lookup 802.1Q FIDs without creating them In a similar fashion to commit `564c6d727a` ("mlxsw: spectrum_fid: Add APIs to lookup FID without creating it"), add a corresponding API to lookup 802.1Q FIDs. This is a prerequisite to VxLAN support with VLAN-aware bridges and will allow us to resolve a 802.1Q FID by its VLAN when an FDB entry is added on the bridge port of the VxLAN device. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 17:06:29 -08:00
Saeed Mahameed	93631211c9	net/mlx5: Debug print for forwarded async events Print a debug message for every async FW event forwarded to mlx5 interfaces (mlx5e netdev and mlx5_ib rdma module). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:32 -08:00
Saeed Mahameed	4e2df04ad2	net/mlx5: Forward SRQ resource events Allow forwarding of SRQ events to mlx5_core interfaces, e.g. mlx5_ib. Use mlx5_notifier_register/unregister in srq.c in order to allow seamless transition of srq.c to infiniband subsystem. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:32 -08:00
Saeed Mahameed	451be51c0b	net/mlx5: Forward QP/WorkQueues resource events Allow forwarding QP and WQ events to mlx5_core interfaces, e.g. mlx5_ib Use mlx5_notifier_register/unregister in qp.c in order to allow seamless transition of qp.c to infiniband subsystem. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:32 -08:00
Saeed Mahameed	b8267cd765	net/mlx5: Remove all deprecated software versions of FW events Before the new mlx5 event notification infrastructure and API, mlx5_core used to process all events before forwarding them to mlx5 interfaces (mlx5e/mlx5_ib) and used to translate the event type enum to a software defined enum, this is not needed anymore since it is ok for mlx5e and mlx5_ib to receive FW events as is, at least the few ones mlx5 core allows. mlx5e and mlx5_ib already moved to use the new API and they only handle FW events types, it is now safe to remove all equivalent software defined events and the logic around them. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:32 -08:00
Saeed Mahameed	cb6191bf25	net/mlx5: Allow forwarding event type general event as is FW general event is used by mlx5_ib for RQ delay drop timeout event handling, in this patch we allow to forward FW general event type to mlx5 notifiers chain so mlx5_ib can handle it and to deprecate the software version of it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:31 -08:00
Saeed Mahameed	02039fb659	net/mlx5: Remove unused events callback and logic The mlx5_interface->event callback is not used by mlx5e/mlx5_ib anymore. We totally remove the delayed events logic work around, since with the dynamic notifier registration API it is not needed anymore, mlx5_ib can register its notifier and start receiving events exactly at the moment it is ready to handle them. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:31 -08:00
Saeed Mahameed	58d180b34e	net/mlx5: Forward all mlx5 events to mlx5 notifiers chain This to allow seamless migration to the new notifier chain API, and to eventually deprecate interfaces dev->event callback. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:31 -08:00
Saeed Mahameed	7cffaddd39	net/mlx5e: Use the new mlx5 core notifier API Remove the deprecated mlx5_interface->event mlx5e callback and use new mlx5 notifier API to subscribe for mlx5 events, handle port change event as received from FW rather than handling the mlx5 core processed port change software version event. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:31 -08:00
Saeed Mahameed	7a17955530	net/mlx5: Allow port change event to be forwarded to driver notifiers chain The idea is to allow mlx5 core interfaces (mlx5e/mlx5_ib) to be able to receive some allowed FW events as is via the new notifier API. In this patch we allow forwarding port change event to mlx5 core interfaces (mlx5e/mlx5_ib) as it was received from FW. Once mlx5e and mlx5_ib start using this event we can safely remove the redundant software version of it and its translation logic. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:31 -08:00
Saeed Mahameed	20902be46c	net/mlx5: Driver events notifier API Use atomic notifier chain to fire events to mlx5 core driver consumers (mlx5e/mlx5_ib) and provide mlx5 register/unregister notifier API. This API will replace the current mlx5_interface->event callback and all the logic around it, especially the delayed events logic introduced by commit `97834eba7c` ("net/mlx5: Delay events till ib registration ends") Which is not needed anymore with this new API where the mlx5 interface can dynamically register/unregister its notifier. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-29 16:40:31 -08:00
Ido Schimmel	c2e7490c31	mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs Replace 802.1Q FIDs and VLAN RIFs with their emulated counterparts. The emulated 802.1Q FIDs are actually 802.1D FIDs and thus use the same flood tables, of per-FID type. Therefore, add 4K-1 entries to the per-FID flood tables for the new FIDs and get rid of the FID-offset flood tables that were used by the old 802.1Q FIDs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	ba6da02a9c	mlxsw: spectrum_router: Introduce emulated VLAN RIFs Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of "VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges of "FID" type. In other words, the RIF type is derived from the underlying FID type. VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on top of 802.1D FIDs. Since the previous patch emulated 802.1Q FIDs using 802.1D FIDs, this patch emulates VLAN RIFs using FID RIFs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	d62dd8a0c8	mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs The driver uses 802.1Q FIDs when offloading a VLAN-aware bridge. Unfortunately, it is not possible to assign a VNI to such FIDs, which prompts the driver to forbid the enslavement of VxLAN devices to a VLAN-aware bridge. Workaround this hardware limitation by creating a new family of FIDs, emulated 802.1Q FIDs. These FIDs are emulated using 802.1D FIDs, which can be assigned a VNI. The downside of this approach is that multiple {Port, VID}->FID entries are required, whereas only a single VID->FID is required with "true" 802.1Q FIDs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	7c4a729221	mlxsw: spectrum_fid: Make flood index calculation more robust 802.1D FIDs use a per-FID flood table, where the flood index into the table is calculated by subtracting 4K from the FID's index. Currently, 802.1D FIDs start at 4K, so the calculation is correct, but if it was ever to change, the calculation will no longer be correct. In addition, this change will allow us to reuse the flood index calculation function in the next patch, where we are going to emulate 802.1Q FIDs using 802.1D FIDs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:07 -08:00
Ido Schimmel	6502be9f04	mlxsw: spectrum_switchdev: Do not set field when it is reserved When configuring an FDB entry pointing to a LAG netdev (or its upper), the driver should only set the 'lag_vid' field when the FID (filtering identifier) is of 802.1D type. Extend the 802.1D FID family with an attribute indicating whether this field should be set and based on its value set the field or leave it blank. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:07 -08:00
Saeed Mahameed	2c89156082	net/mlx5: Improve core device events handling Register a separate handler per event type, rather than listening for all events and looking for the events to handle in a switch case. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:34 -08:00
Saeed Mahameed	69c1280b1f	net/mlx5: Device events, Use async events chain Move all the generic async events handling into new specific events handling file events.c to keep eq.c file clean from concrete event logic handling. Use new API to register for NOTIFY_ANY to handle generic events and dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:34 -08:00
Saeed Mahameed	2742bc90bc	net/mlx5: CQ ERR, Use async events chain Remove the explicit call to mlx5_eq_cq_event on MLX5_EVENT_TYPE_CQ_ERROR and register a specific CQ ERROR handler via the new API. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:34 -08:00
Saeed Mahameed	221c14f3d1	net/mlx5: Resource tables, Use async events chain Remove the explicit call to QP/SRQ resources events handlers on several FW events and let resources logic register resources events notifiers via the new API. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:34 -08:00
Saeed Mahameed	71edc69ca1	net/mlx5: CmdIF, Use async events chain Remove the explicit call to mlx5_cmd_comp_handler on MLX5_EVENT_TYPE_CMD and let command interface to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:34 -08:00
Saeed Mahameed	0cf53c1247	net/mlx5: FWPage, Use async events chain Remove the explicit call to mlx5_core_req_pages_handler on MLX5_EVENT_TYPE_PAGE_REQUEST and let FW page logic to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:33 -08:00
Saeed Mahameed	6933a93795	net/mlx5: E-Switch, Use async events chain Remove the explicit call to mlx5_eswitch_vport_event on MLX5_EVENT_TYPE_NIC_VPORT_CHANGE and let the eswitch register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:33 -08:00
Saeed Mahameed	41069256e9	net/mlx5: Clock, Use async events chain Remove the explicit call to mlx5_pps_event on MLX5_EVENT_TYPE_PPS_EVENT and let clock logic to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:33 -08:00
Saeed Mahameed	a52a7d01fd	net/mlx5: FPGA, Use async events chain Remove the explicit call to mlx5_fpga_event on MLX5_EVENT_TYPE_FPGA_ERROR or MLX5_EVENT_TYPE_FPGA_QP_ERROR let fpga core to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:33 -08:00
Saeed Mahameed	720a936d40	net/mlx5: FWTrace, Use async events chain Remove the explicit call to mlx5_fw_tracer_event on MLX5_EVENT_TYPE_DEVICE_TRACER and let fw tracer to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:33 -08:00
Saeed Mahameed	0f597ed435	net/mlx5: EQ, Introduce atomic notifier chain subscription API Use atomic_notifier_chain to fire firmware events at internal mlx5 core components such as eswitch/fpga/clock/FW tracer/etc.., this is to avoid explicit calls from low level mlx5_core to upper components and to simplify the mlx5_core API for future developments. Simply provide register/unregister notifiers API and call the notifier chain on firmware async events. Example: to subscribe to a FW event: struct mlx5_nb port_event; MLX5_NB_INIT(&port_event, port_event_handler, PORT_CHANGE); mlx5_eq_notifier_register(mdev, &port_event); where: - port_event_handler is the notifier block callback. - PORT_EVENT is the suffix of MLX5_EVENT_TYPE_PORT_CHANGE. The above will guarantee that port_event_handler will receive all FW events of the type MLX5_EVENT_TYPE_PORT_CHANGE. To receive all FW/HW events one can subscribe to MLX5_EVENT_TYPE_NOTIFY_ANY. The next few patches will start moving all mlx5 core components to use this new API and cleanup mlx5_eq_async_int misx handler from component explicit calls and specific logic. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-26 13:39:33 -08:00
David S. Miller	b1bf78bfb2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-11-24 17:01:43 -08:00
Petr Machata	d17d9f5e51	switchdev: Replace port obj add/del SDO with a notification Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of this field from all clients, which were migrated to use switchdev notification in the previous patches. Add a new function switchdev_port_obj_notify() that sends the switchdev notifications SWITCHDEV_PORT_OBJ_ADD and _DEL. Update switchdev_port_obj_del_now() to dispatch to this new function. Drop __switchdev_port_obj_add() and update switchdev_port_obj_add() likewise. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-23 18:02:24 -08:00
Petr Machata	52a227b30e	mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking notifier chain. Dispatch to mlxsw_sp_port_obj_add() resp. _del() to maintain the behavior that the switchdev operation based code currently has. Defer to switchdev_handle_port_obj_add() / _del() to handle the recursive descend, because mlxsw supports a number of upper types. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-23 18:02:24 -08:00
Daniel Jurgens	e45678973d	{net, IB}/mlx4: Initialize CQ buffers in the driver when possible Perform CQ initialization in the driver when the capability is supported by the FW. When passing the CQ to HW indicate that the CQ buffer has been pre-initialized. Doing so decreases CQ creation time. Testing on P8 showed a single 2048 entry CQ creation time was reduced from ~395us to ~170us, which is 2.3x faster. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-22 11:38:36 -08:00
Ido Schimmel	20134ee9c5	mlxsw: spectrum_nve: Allow VxLAN learning Up until now the driver returned an error when learning was enabled on a VxLAN device enslaved to an offloaded bridge. Previous patches added VxLAN learning support, so remove the check. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Ido Schimmel	8b547a6026	mlxsw: spectrum_switchdev: Allow deletion of learned FDB entries Allow users to delete learned FDB entries from the bridge's FDB before enabling VxLAN learning. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Ido Schimmel	981f084b36	mlxsw: spectrum_switchdev: Process learned VxLAN FDB entries Start processing two new entry types in addition to current ones: * Learned unicast tunnel entry * Aged-out unicast tunnel entry In both cases the device reports on a new {MAC, FID, IP address} tuple that was learned / aged-out. Based on this notification, the driver instructs the device to add / delete the entry to / from its database. The driver also makes sure to notify the bridge and VxLAN drivers about the new entry. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Ido Schimmel	3c55bdaca0	mlxsw: spectrum_nve: Add API to resolve learned IP addresses FDB notifications for entries learned from an NVE tunnel contain the IP address of the remote VTEP. In the case of IPv4 underlay, the IP address is specified as-is. IPv6 addresses on the other hand, are specified as handles which then need to be used to query the actual address from the device. Only IPv4 underlay is currently supported, so we cannot receive notifications for IPv6 addresses and therefore an error is returned when one tries to resolve such an address. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Ido Schimmel	5d44a712e6	mlxsw: spectrum_fid: Allow FID lookup by its index When processing a notification about a new FDB entry learned from a VxLAN tunnel, the driver is provided with the FID index among other parameters. The driver potentially needs to update the bridge and VxLAN drivers about the new entry using a pointer to the VxLAN device and the corresponding VNI. These two parameters are stored in the FID, so add a new function that allows looking up a FID based on its index. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Ido Schimmel	5bae63d9b7	mlxsw: spectrum_fid: Store ifindex of NVE device in FID The driver periodically polls for new FDB entries learned by the device. In the case of an FDB entry learned from a VxLAN tunnel, the notification includes the IP of the remote VTEP, the filtering identifier (FID) and the source MAC address of the overlay packet. Assuming learning is enabled in the VxLAN and bridge drivers, the driver needs to generate a notification and update them about the new FDB entry. Store the ifindex of the NVE device in the FID so that the driver will be able to update the VxLAN and bridge drivers using it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Ido Schimmel	933b1ecd00	mlxsw: reg: Add definition of unicast tunnel record for SFN register Will be used to process learned FDB records from an NVE tunnel. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 17:10:31 -08:00
Michał Mirosław	6c0fbd7262	mlx5: use skb_vlan_tag_get_prio() Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-21 15:41:30 -08:00
Vadim Pasternak	a421ce088a	mlxsw: core: Extend cooling device with cooling levels Extend cooling device with cooling levels vector to allow more flexibility of PWM setting. Thermal zone algorithm operates with the numerical states for PWM setting. Each state is the index, defined in range from 0 to 10 and it's mapped to the relevant duty cycle value, which is written to PWM controller. With the current definition fan speed is set to 0% for state 0, 10% for state 1, and so on up to 100% for the maximum state 10. Some systems have limitation for the PWM speed minimum. For such systems PWM setting speed to 0% will just disable the ability to increase speed anymore and such device will be stall on zero speed. Cooling levels allow to configure state vector according to the particular system requirements. For example, if PWM speed is not allowed to be below 30%, cooling levels could be configured as 30%, 30%, 30%, 30%, 40%, 50% and so on. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-20 10:31:15 -08:00
Saeed Mahameed	6d2d6fc83a	net/mlx5: EQ, Make EQE access methods inline These are one/two liner generic EQ access methods, better have them declared static inline in eq.h. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:07:51 +02:00
Saeed Mahameed	d5d284b829	{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA Use the new generic EQ API to move all ODP RDMA data structures and logic form mlx5 core driver into mlx5_ib driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:07:33 +02:00
Saeed Mahameed	7701707cb9	net/mlx5: EQ, Generic EQ Add mlx5_eq_{create/destroy}_generic APIs and EQE access methods, for mlx5 core consumers generic EQs. This API will be used in downstream patch to move page fault (RDMA ODP) EQ logic into mlx5_ib rdma driver, hence it will use a generic EQ. Current mlx5 EQ allocation scheme: On load mlx5 allocates 4 (for async) + #cores (for data completions) MSIX vectors, mlx5 core will assign 3 MSIX vectors for internal async EQs and will use all of the #cores MSIX vectors for completion EQs, (One vector is going to be reserved for a generic EQ). After this patch an external user (e.g mlx5_ib) of mlx5_core can use this new API to create new generic EQs with the reserved msix vector index for that eq. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:07:05 +02:00
Saeed Mahameed	16d760839c	net/mlx5: EQ, Different EQ types In mlx5 we have three types of usages for EQs, 1. Asynchronous EQs, used internally by mlx5 core for a. FW command completions b. FW page requests c. one EQ for all other Asynchronous events 2. Completion EQs, used for CQ completion (we create one per core) 3. Special type of EQ (page fault) used for RDMA on demand paging (ODP). The 3rd type shouldn't be special at least in mlx5 core, it is yet another async events EQ with specific use case, it will be removed in the next two patches, and will completely move its logic to mlx5_ib, as it is rdma specific. In this patch we remove use case (eq type) specific fields from struct mlx5_eq into a new eq type specific structures. struct mlx5_eq_async; truct mlx5_eq_comp; struct mlx5_eq_pagefault; Separate between their type specific flows. In the future we will allow users to create there own generic EQs. for now we will allow only one for ODP in next patches. We will introduce event listeners registration API for those who want to receive mlx5 async events. After that mlx5 eq handling will be clean from feature/user specific handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:07:00 +02:00
Saeed Mahameed	f2f3df5501	net/mlx5: EQ, Privatize eq_table and friends Move unnecessary EQ table structures and declaration from the public include/linux/mlx5/driver.h into the private area of mlx5_core and into eq.c/eq.h. Introduce new mlx5 EQ APIs: mlx5_comp_vectors_count(dev); mlx5_comp_irq_get_affinity_mask(dev, vector); And use them from mlx5_ib or mlx5e netdevice instead of direct access to mlx5_core internal structures. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:54 +02:00
Saeed Mahameed	d674a9aa43	net/mlx5: EQ, irq_info and rmap belong to eq_table irq_info and rmap are EQ properties of the driver, and only needed for EQ objects, move them to the eq_table EQs database structure. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:48 +02:00
Saeed Mahameed	c8e21b3b57	net/mlx5: EQ, Create all EQs in one place Instead of creating the EQ table in three steps at driver load, - allocate irq vectors - allocate async EQs - allocate completion EQs Gather all of the procedures into one function in eq.c and call it from driver load. This will help us reduce the EQ and EQ table private structures visibility to eq.c in downstream refactoring. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:42 +02:00
Saeed Mahameed	ca828cb468	net/mlx5: EQ, Move all EQ logic to eq.c Move completion EQs flows from main.c to eq.c, reasons: 1) It is where this logic belongs. 2) It will help centralize the EQ logic in one file for downstream refactoring, and future extensions/updates. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:37 +02:00
Saeed Mahameed	aaa553a644	net/mlx5: EQ, Remove redundant completion EQ list lock Completion EQs list is only modified on driver load/unload, locking is not required, remove it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:23 +02:00
Saeed Mahameed	2883f35257	net/mlx5: EQ, No need to store eq index as a field eq->index is used only for completion EQs and is assigned to be the completion eq index, it is used only when traversing the completion eqs list, and it can be calculated dynamically, thus remove the eq->index field. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:18 +02:00
Saeed Mahameed	4de45c7586	net/mlx5: EQ, Remove unused fields and structures Some fields and structures are not referenced nor used by the driver, remove them. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:06:07 +02:00
Saeed Mahameed	1e86ace4c1	net/mlx5: EQ, Use the right place to store/read IRQ affinity hint Currently the cpu affinity hint mask for completion EQs is stored and read from the wrong place, since reading and storing is done from the same index, there is no actual issue with that, but internal irq_info for completion EQs stars at MLX5_EQ_VEC_COMP_BASE offset in irq_info array, this patch changes the code to use the correct offset to store and read the IRQ affinity hint. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-20 20:05:59 +02:00
David S. Miller	1359f25106	mlx5-fixes-2018-11-19 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJb80jFAAoJEEg/ir3gV/o+eUEH/0/bUhiaxaN88VFZXA24WlU2 v5jaeMg1P3ZuWYNS/AxGWedYWYr7oOgin79/4LIBKJMFdTdLGS2Oa0cTi0ycSNpx 0WMhVchClX3/UNTn6u/G23VsHeYav/AP+uaqnI+Tbtv4o8tc9ecpFgCXo4pMJaW/ osUOJC3MmU68yRAo91uwhhSHWMsl+8tceMdoS9N1Q6faM2v9XcJrMvKHMNpDvPpf VXvC18fTX/qi/loJ6oKepTXn6lNVKDIgKmCjjudgwfRHPiw9R5uMlunaOoHO1wnH T8ttdg/Dn94BLOOmHlOoW9TT7LVFLk7w2y5NoefUY0wUKSjByMxzyzqmdI+qXZU= =SSaw -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2018-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-11-19 The following fixes are for mlx5 core and netdev driver. For -stable v4.16 bc7fda7d4637 ('net/mlx5e: IPoIB, Reset QP after channels are closed') For -stable v4.17 36917a270395 ('net/mlx5: IPSec, Fix the SA context hash key') For -stable v4.18 6492a432be3a ('net/mlx5e: Always use the match level enum when parsing TC rule match') c3f81be236b1 ('net/mlx5e: Removed unnecessary warnings in FEC caps query') c5ce2e736b64 ('net/mlx5e: Fix selftest for small MTUs') For -stable v4.19 effcd896b25e ('net/mlx5e: Adjust to max number of channles when re-attaching') 394cbc5acd68 ('net/mlx5e: RX, verify received packet size in Linear Striding RQ') 447cbb3613c8 ('net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded') c223c1574612 ('net/mlx5e: Claim TC hw offloads support only under a proper build config') Please pull and let me know if there's any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 19:04:33 -08:00
Shay Agroskin	9184e51b5b	net/mlx5e: Fix failing ethtool query on FEC query error If FEC caps query fails when executing 'ethtool <interface>' the whole callback fails unnecessarily, fixed that by replacing the error return code with debug logging only. Fixes: `6cfa946050` ("net/mlx5e: Ethtool driver callback for query/set FEC policy") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 15:33:31 -08:00
Shay Agroskin	64e2833484	net/mlx5e: Removed unnecessary warnings in FEC caps query Querying interface FEC caps with 'ethtool [int]' after link reset throws warning regading link speed. This warning is not needed as there is already an indication in user space that the link is not up. Fixes: `0696d60853` ("net/mlx5e: Receive buffer configuration") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 15:33:31 -08:00
Shay Agroskin	febd72f27c	net/mlx5e: Fix wrong field name in FEC related functions This bug would result in reading wrong FEC capabilities for 10G/40G. Fixes: `2095b26414` ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 15:33:31 -08:00
Shay Agroskin	9cdeaab3b7	net/mlx5e: Fix a bug in turning off FEC policy in unsupported speeds Some speeds don't support turning FEC policy off. In case a requested FEC policy is not supported for a speed (including current speed), its new FEC policy would be: no FEC - if disabling FEC is supported for that speed unchanged - else Fixes: `2095b26414` ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 15:33:31 -08:00
Valentine Fatiev	228c4cd04d	net/mlx5e: Fix selftest for small MTUs Loopback test had fixed packet size, which can be bigger than configured MTU. Shorten the loopback packet size to be bigger than minimal MTU allowed by the device. Text field removed from struct 'mlx5ehdr' as redundant to allow send small packets as minimal allowed MTU. Fixes: `d605d66` ("net/mlx5e: Add support for ethtool self diagnostics test") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Moshe Shemesh	0073c8f727	net/mlx5e: RX, verify received packet size in Linear Striding RQ In case of striding RQ, we use MPWRQ (Multi Packet WQE RQ), which means that WQE (RX descriptor) can be used for many packets and so the WQE is much bigger than MTU. In virtualization setups where the port mtu can be larger than the vf mtu, if received packet is bigger than MTU, it won't be dropped by HW on too small receive WQE. If we use linear SKB in striding RQ, since each stride has room for mtu size payload and skb info, an oversized packet can lead to crash for crossing allocated page boundary upon the call to build_skb. So driver needs to check packet size and drop it. Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the number of packets dropped by the driver for being too large. As a new field is added to the RQ struct, re-open the channels whenever this field is being used in datapath (i.e., in the case of linear Striding RQ). Fixes: `619a8f2a42` ("net/mlx5e: Use linear SKB in Striding RQ") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Roi Dayan	1392f44bba	net/mlx5e: Apply the correct check for supporting TC esw rules split The mirror and not the output count is the one denoting a split. Fix to condition the offload attempt on the mirror count being > 0 along the firmware to have the related capability. Fixes: `592d365159` ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Yossi Kuperman <yossiku@mellanox.com> Reviewed-by: Chris Mi <chrism@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Yuval Avnery	a1f240f180	net/mlx5e: Adjust to max number of channles when re-attaching When core driver enters deattach/attach flow after pci reset, Number of logical CPUs may have changed. As a result we need to update the cpu affiliated resource tables. 1. indirect rqt list 2. eq table Reproduction (PowerPC): echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes ppc64_cpu --smt=on # Restart driver modprobe -r ... ; modprobe ... # Link up ifconfig ... # Only physical CPUs ppc64_cpu --smt=off # Inject PCI errors so PCI will reset - calling the pci error handler echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA Call trace when trying to add non-existing rqs to an indirect rqt: mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable) mlx5e_redirect_rqts+0x188/0x190 [mlx5_core] mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core] mlx5e_open_locked+0xbc/0x140 [mlx5_core] mlx5e_open+0x50/0x130 [mlx5_core] mlx5e_nic_enable+0x174/0x1b0 [mlx5_core] mlx5e_attach_netdev+0x154/0x290 [mlx5_core] mlx5e_attach+0x88/0xd0 [mlx5_core] mlx5_attach_device+0x168/0x1e0 [mlx5_core] mlx5_load_one+0x1140/0x1210 [mlx5_core] mlx5_pci_resume+0x6c/0xf0 [mlx5_core] Create cq will fail when trying to use non-existing EQ. Fixes: `89d44f0a6c` ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Or Gerlitz	83621b7df6	net/mlx5e: Always use the match level enum when parsing TC rule match We get the match level (none, l2, l3, l4) while going over the match dissectors of an offloaded tc rule. When doing this, the match level enum and the not min inline enum values should be used, fix that. This worked accidentally b/c both enums have the same numerical values. Fixes: `d708f90298` ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Or Gerlitz	077ecd785d	net/mlx5e: Claim TC hw offloads support only under a proper build config Currently, we are only supporting tc hw offloads when the eswitch support is compiled in, but we are not gating the adevertizment of the NETIF_F_HW_TC feature on this config being set. Fix it, and while doing that, also avoid dealing with the feature on ethtool when the config is not set. Fixes: `e8f887ac6a` ('net/mlx5e: Introduce tc offload support') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Or Gerlitz	d3a80bb5a3	net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded For the "all" ethertype we should not care whether the packet has vlans. Besides being wrong, the way we did it caused FW error for rules such as: tc filter add dev eth0 protocol all parent ffff: \ prio 1 flower skip_sw action drop b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec) wasn't set. Fix that by matching on vlan non-existence only if we were also told to match on the ethertype. Fixes: `cee2648762` ('net/mlx5e: Set vlan masks for all offloaded TC rules') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Denis Drozdov	acf3766b36	net/mlx5e: IPoIB, Reset QP after channels are closed The mlx5e channels should be closed before mlx5i_uninit_underlay_qp puts the QP into RST (reset) state during mlx5i_close. Currently QP state incorrectly set to RST before channels got deactivated and closed, since mlx5_post_send request expects QP in RTS (Ready To Send) state. The fix is to keep QP in RTS state until mlx5e channels get closed and to reset QP afterwards. Also this fix is simply correct in order to keep the open/close flow symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open, the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing at close, which is exactly what this patch is doing. Fixes: `dae37456c8` ("net/mlx5: Support for attaching multiple underlay QPs to root flow table") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
Raed Salem	f2b18732ee	net/mlx5: IPSec, Fix the SA context hash key The commit "net/mlx5: Refactor accel IPSec code" introduced a bug where asynchronous short time change in hash key value by create/release SA context might happen during an asynchronous hash resize operation this could cause a subsequent remove SA context operation to fail as the key value used during resize is not the same key value used when remove SA context operation is invoked. This commit fixes the bug by defining the SA context hash key such that it includes only fields that never change during the lifetime of the SA context object. Fixes: `d6c4f0298c` ("net/mlx5: Refactor accel IPSec code") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-11-19 14:35:04 -08:00
David S. Miller	f2be6d710d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-11-19 10:55:00 -08:00
Shalom Toledo	bae4e10983	mlxsw: spectrum: Expose discard counters via ethtool Expose packets discard counters via ethtool to help with debugging. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-18 19:02:07 -08:00
Aya Levin	a463146e67	net/mlx4: Fix UBSAN warning of signed integer overflow UBSAN: Undefined behavior in drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:626:29 signed integer overflow: 1802201963 + 1802201963 cannot be represented in type 'int' The union of res_reserved and res_port_rsvd[MLX4_MAX_PORTS] monitors granting of reserved resources. The grant operation is calculated and protected, thus both members of the union cannot be negative. Changed type of res_reserved and of res_port_rsvd[MLX4_MAX_PORTS] from signed int to unsigned int, allowing large value. Fixes: `5a0d0a6161` ("mlx4: Structures and init/teardown for VF resource quotas") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 16:09:31 -08:00
Tariq Toukan	3ea7e7ea53	net/mlx4_core: Fix uninitialized variable compilation warning Initialize the uid variable to zero to avoid the compilation warning. Fixes: `7a89399ffa` ("net/mlx4: Add mlx4_bitmap zone allocator") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 16:09:31 -08:00
Jack Morgenstein	bd85fbc203	net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command When re-registering a user mr, the mpt information for the existing mr when running SRIOV is obtained via the QUERY_MPT fw command. The returned information includes the mpt's lkey. This retrieved mpt information is used to move the mpt back to hardware ownership in the rereg flow (via the SW2HW_MPT fw command when running SRIOV). The fw API spec states that for SW2HW_MPT, the lkey field must be zero. Any ConnectX-3 PF driver which checks for strict spec adherence will return failure for SW2HW_MPT if the lkey field is not zero (although the fw in practice ignores this field for SW2HW_MPT). Thus, in order to conform to the fw API spec, set the lkey field to zero before invoking SW2HW_MPT when running SRIOV. Fixes: `e630664c83` ("mlx4_core: Add helper functions to support MR re-registration") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 16:09:30 -08:00
Jiri Pirko	c22291f7cf	mlxsw: spectrum: acl: Implement delta for ERP Allow ERP sharing for multiple mask. Do it by properly implementing delta_create() objagg object. Use the computed delta info for inserting rules in A-TCAM. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Jiri Pirko	c293ba3403	mlxsw: spectrum: acl: Push code related to num_ctcam_erps inc/dec into separate helpers Later on the same code is going to be needed for deltas as well. So push the procedures related to increment and decrement of num_ctcam_erps into a separate helpers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Jiri Pirko	59600844cf	mlxsw: spectrum: acl: Remove mlxsw_afk_encode() block range args and key/mask check Since two remaining users of mlxsw_afk_encode() do not specify block ranges to work on, remove the args. Also, key/mask is always non-NULL now, so skip the checks. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Jiri Pirko	b1ce60e621	mlxsw: spectrum: acl: Don't encode the key again in mlxsw_sp_acl_atcam_12kb_lkey_id_get() No need to do key encoding again in mlxsw_sp_acl_atcam_12kb_lkey_id_get(). Instead of that, introduce a new helper that would just clear unused blocks. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Jiri Pirko	3bc6f3858a	mlxsw: core_acl: Change order of args of ops->encode_block() Change order so it is aligned with the usual case where the "write_to" buffer comes as the first arg. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Jiri Pirko	d07cd66060	mlxsw: spectrum: acl: Pass key pointer to master_mask_set/clear The device requires that the master mask of each region will be composed from a logical OR between all the unmasked bits in the region. Currently, this is just a logical OR between all the eRPs used in the region, but the next patch is going to introduce delta bits support which need to be taken into account as well. Since the eRP does not include the delta bits, pass the key pointer to mlxsw_sp_acl_erp_master_mask_set/clear instead. Convert key->mask to the bitmap on fly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Jiri Pirko	c71abd7d94	mlxsw: spectrum: acl_erp: Convert to use objagg for tracking ERPs Currently the ERPs are tracked internally in a hashtable. Benefit from the newly introduced objagg library and use it to track ERPs. At this point, there is no nesting of objects done, as the delta_create callback always returns -EOPNOTSUPP. On the way, add "mask" into ERP mask get and set functions and struct names. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-15 14:43:43 -08:00
Moni Shoua	90290db766	net/mlx5: Use multi threaded workqueue for page fault handling Page fault events are processed in a workqueue context. Since each QP can have up to two concurrent unrelated page-faults, one for requester and one for responder, page-fault handling can be done in parallel. Achieve this by changing the workqueue to be multi-threaded. The number of threads is the same as the number of command interface channels to avoid command interface bottlenecks. In addition to multi-threads, change the workqueue flags to give it high priority. Stress benchmark shows that before this change 85% of page faults were waiting in queue 8 seconds or more while after the change 98% of page faults were waiting in queue 64 milliseconds or less. The number of threads was chosen as the number of channels to the command interface. Fixes: `d9aaed8387` ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-12 22:21:32 +02:00
Moni Shoua	ef90c5e975	net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state When the device is in internal error state, command interface isn't accessible and the driver decides which commands to fail and which to pass. Move the PAGE_FAULT_RESUME command to the pass list in order to avoid redundant failure messages. Fixes: `89d44f0a6c` ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-12 22:21:17 +02:00
Moni Shoua	27e95603f4	net/mlx5: Add interface to hold and release core resources Sometimes upper layers may want to prevent the destruction of a core resource for a period of time while work on that resource is in progress. Add API to support this. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-12 22:20:29 +02:00
Moni Shoua	698114968a	net/mlx5: Release resource on error flow Fix reference counting leakage when the event handler aborts due to an unsupported event for the resource type. Fixes: `a14c2d4bee` ("net/mlx5_core: Warn on unsupported events of QP/RQ/SQ") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-11-12 22:20:13 +02:00
Michał Mirosław	4b17f9fe48	mlx4: use __vlan_hwaccel helpers Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:45:04 -08:00
Shalom Toledo	96801552f8	mlxsw: spectrum: Fix IP2ME CPU policer configuration The CPU policer used to police packets being trapped via a local route (IP2ME) was incorrectly configured to police based on bytes per second instead of packets per second. Change the policer to police based on packets per second and avoid packet loss under certain circumstances. Fixes: `9148e7cf73` ("mlxsw: spectrum: Add policers for trap groups") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-03 19:31:42 -07:00
Eric Dumazet	c297344435	net/mlx4_en: use __netdev_tx_sent_queue() doorbell only depends on xmit_more and netif_tx_queue_stopped() Using __netdev_tx_sent_queue() avoids messing with BQL stop flag, and is more generic. This patch increases performance on GSO workload by keeping doorbells to the minimum required. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-03 15:40:01 -07:00
Petr Machata	0fe6402316	mlxsw: spectrum: Set minimum shaper on MC TCs An MC-aware mode was introduced in commit `7b81953066` ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode, BUM traffic gets a special treatment by being assigned to a separate set of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then configured to strictly prioritize the lower-numbered ones. The intention is to prevent BUM traffic from flooding the switch and push out all UC traffic, which would otherwise happen, and instead give UC traffic precedence. However strictly prioritizing UC traffic has the effect that UC overload pushes out all BUM traffic, such as legitimate ARP queries. These packets are kept in queues for a while, but under sustained UC overload, their lifetime eventually expires and these packets are dropped. That is detrimental to network performance as well. Therefore configure the MC TCs (8..15) with minimum shaper of 200Mbps (a minimum permitted value) to allow a trickle of necessary control traffic to get through. Fixes: `7b81953066` ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-31 12:56:58 -07:00
Petr Machata	8b931821aa	mlxsw: reg: QEEC: Add minimum shaper fields Add QEEC.mise (minimum shaper enable) and QEEC.min_shaper_rate to enable configuration of minimum shaper. Increase the QEEC length to 0x20 as well: that's the length that the register has had for a long time now, but with the configurations that mlxsw typically exercises, the firmware tolerated 0x1C-sized packets. With mise=true however, FW rejects packets unless they have the full required length. Fixes: `b9b7cee405` ("mlxsw: reg: Add QoS ETS Element Configuration register") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-31 12:56:58 -07:00
Eric Dumazet	d48051c5b8	net/mlx5e: fix csum adjustments caused by RXFCS As shown by Dmitris, we need to use csum_block_add() instead of csum_add() when adding the FCS contribution to skb csum. Before 4.18 (more exactly commit `88078d98d1` "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), the whole skb csum was thrown away, so RXFCS changes were ignored. Then before commit `d55bef5059` ("net: fix pskb_trim_rcsum_slow() with odd trim offset") both mlx5 and pskb_trim_rcsum_slow() bugs were canceling each other. Now we fixed pskb_trim_rcsum_slow() we need to fix mlx5. Note that this patch also rewrites mlx5e_get_fcs() to : - Use skb_header_pointer() instead of reinventing it. - Use __get_unaligned_cpu32() to avoid possible non aligned accesses as Dmitris pointed out. Fixes: `902a545904` ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation") Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eran Ben Elisha <eranbe@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Dimitris Michailidis <dmichail@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Paweł Staszewski <pstaszewski@itcare.pl> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Tested-By: Maria Pasechnik <mariap@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-31 12:40:52 -07:00
Eric Dumazet	3aa8029e1a	net/mlx4_en: add a missing <net/ip.h> include Abdul Haleem reported a build error on ppc : drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: `struct iphdr` declared inside parameter list [enabled by default] struct iphdr *iph) ^ drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function get_fixed_ipv4_csum: drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing pointer to incomplete type __u8 ipproto = iph->protocol; ^ Fixes: `55469bc6b5` ("drivers: net: remove <net/busy_poll.h> inclusion when not needed") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-30 11:18:59 -07:00
Shalom Toledo	a22712a962	mlxsw: core: Fix devlink unregister flow After a failed reload, the driver is still registered to devlink, its devlink instance is still allocated and the 'reload_fail' flag is set. Then, in the next reload try, the driver's allocated devlink instance will be freed without unregistering from devlink and its components (e.g, resources). This scenario can cause a use-after-free if the user tries to execute command via devlink user-space tool. Fix by not freeing the devlink instance during reload (failed or not). Fixes: `24cc68ad6c` ("mlxsw: core: Add support for reload") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-29 20:48:00 -07:00
Petr Machata	ad0b9d9418	mlxsw: spectrum_switchdev: Don't ignore deletions of learned MACs Demands to remove FDB entries should be honored even if the FDB entry in question was originally learned, and not added by the user. Therefore ignore the added_by_user datum for SWITCHDEV_FDB_DEL_TO_DEVICE. Fixes: `816a3bed95` ("switchdev: Add fdb.added_by_user to switchdev notifications") Signed-off-by: Petr Machata <petrm@mellanox.com> Suggested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-29 20:48:00 -07:00
Eric Dumazet	55469bc6b5	drivers: net: remove <net/busy_poll.h> inclusion when not needed Drivers using generic NAPI interface no longer need to include <net/busy_poll.h>, since busy polling was moved to core networking stack long ago. See commit `79e7fff47b` ("net: remove support for per driver ndo_busy_poll()") for reference. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-25 16:20:02 -07:00
Dan Carpenter	cc3a4cd3f0	net/mlx5: Allocate enough space for the FDB sub-namespaces FDB_MAX_CHAIN is three. We wanted to allocate enough memory to hold four structs but there are missing parentheses so we only allocate enough memory for three structs and the first byte of the fourth one. Fixes: `328edb499f` ("net/mlx5: Split FDB fast path prio to multiple namespaces") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-22 19:46:34 -07:00
David S. Miller	2e2d6f0342	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-19 11:03:06 -07:00
Shay Agroskin	4cb4e98e5b	net/mlx5e: Added 'raw_errors_laneX' fields to ethtool statistics These are counters for errors received on rx side, such as FEC errors. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-18 13:32:57 -07:00
Shay Agroskin	6cfa946050	net/mlx5e: Ethtool driver callback for query/set FEC policy Driver callback function for 'ethtool --show-fec', 'ethtool --set-fec' commands. The query function returns active and configured FEC policy for current link speed. The set function sets FEC policy for all supported link speeds. 1) If current link speed doesn't support requested FEC policy, the function fails. 2) If a different link speed doesn't support requested FEC policy, FEC capbilities for this speed are turned off. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-18 13:13:31 -07:00
Shay Agroskin	2095b26414	net/mlx5e: Add port FEC get/set functions Added functions to query and set link FEC policy. To get/set FEC capabilities in PPLM reg we need to query current link speed. 'mlx5_get_fec_speed_field' queries current link speed and returns correct field offset. FEC Query's return value is divided into 'active FEC policy', which is the FEC policy used by the link, and 'configured FEC policy', which is the FEC policy requested by the user. The two values may differ if: 1) FEC policy was configured to 'auto', in which case the active FEC policy would be the default FEC policy for current link speed. 2) FEC policy was changed, but no link reset is performed. In which case, the active FEC policy would become the configured one after a link reset. FEC set function sets FEC policy for all link speeds and perform link reset. 1) If current link speed doesn't support requested FEC policy, the function fails. 2) If a different link speed doesn't support requested FEC policy, FEC capbilities for this speed are turned off and a warning message is printed. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-18 13:13:31 -07:00
Vlad Buslov	2a4c429802	net/mlx5: Remove counter from idr after removing it from list Fs_counters list can temporary become unsorted when new counters are created/deleted concurrently. Idr is used to quickly lookup position to insert new counter in logarithmic time. However, if new flows are concurrently inserted during time window when flows with adjacent ids are already removed from idr but are still present in counters list, mlx5_fc_stats_work() observes counters list in inconsistent state, which results following warning: [ 1839.561955] mlx5_core 0000:81:00.0: mlx5_cmd_fc_bulk_get:587:(pid 729): Flow counter id (0x102d5) out of range (0x1c0a8..0x1c10b). Counter ignored. Move idr_remove() call to be executed synchronously with counter deletion from list. Extract this code to mlx5_fc_stats_remove() helper function that is called by workqueue job handler mlx5_fc_stats_work(). Fixes: `12d6066c3b` ("net/mlx5: Add flow counters idr") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>	2018-10-18 13:13:31 -07:00
Vlad Buslov	fd33071303	net/mlx5: Take fs_counters dellist before addlist In fs_counters elements from both addlist and dellist are removed by mlx5_fc_stats_work() without any locking. This introduces race condition when batch of new rules is created and then immediately deleted (for example, when error occurred during flow creation). In such case some of the rules might be in dellist, but not in addlist when mlx5_fc_stats_work() is executed concurrently with tc, which will result rule deletion and use-after-free on next iteration because deleted rules are still in addlist. Always take dellist first to guarantee that rules can only be deleted after they were removed from addlist. Fixes: `6e5e228391` ("net/mlx5: Add new list to store deleted flow counters") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>	2018-10-18 13:13:31 -07:00
Tariq Toukan	4972e6fa3a	net/mlx5: Refactor fragmented buffer struct fields and init flow Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not needed to manage and control the datapath of the fragmented buffers API. struct mlx5_frag_buf contains control info to manage the allocation and de-allocation of the fragmented buffer. Its fields are not relevant for datapath, so here I take them out of the struct mlx5_frag_buf_ctrl, except for the fragments array itself. In addition, modified mlx5_fill_fbc to initialise the frags pointers as well. This implies that the buffer must be allocated before the function is called. A set of type-specific *_get_byte_size() functions are replaced by a generic one. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-18 13:13:31 -07:00
David S. Miller	99e9acd85c	mlx5-updates-2018-10-17 ======================================================================== From Or Gerlitz <ogerlitz@mellanox.com>: This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains. This is made of four building blocks (along with few minor driver refactors): [1] Split FDB fast path prio to multiple namespaces Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1). The slow path contains the per representor SQ send-to-vport TX rule and the match-all RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. [2] E-Switch chains and priorities A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to each other (but to the slow path), and one must use a explicit goto action to reach a different chain. Flow tables for the priorities are created on demand and destroyed once not used. [3] Add a no-append flow insertion mode, use it for TC offloads Enhance the driver fs core, such that if a no-append flag is set by the caller, we add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. For encap rules, we defer the HW offloading till we have a valid neighbor. This can result in the packet hitting a lower priority rule in the HW DP. Use the no-append API to push these packets to the slow path FDB table, so they go to the TC kernel DP as done before priorities where supported. [4] Offloading tc priorities and chains for eswitch flows Using [1], [2] and [3] above we add the support for offloading both chains and priorities. To get to a new chain, use the tc goto action. We support a fixed prio range 1-16, and chains 0-3. ============================================================================= -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbx6k1AAoJEEg/ir3gV/o+l40H/14rNaV27vefjuALgOvNX4DY iSI5UFv9ILnAemcD2xkVfJeGolwdzoRhCXJ5oyCylCPnP4tb9zgDgwu9V/WmIRG+ DOaPLu+0V6jqfEGO5sXJPMhJNUR8WWAjfu66htJ0Nc1HV2OM5eYrcvjaYCfW4Egr QFWGyq4sPyYcpbb7wURbhmkfs8Vwxcj9c2cZIfXo3VJsKxULqU9Mj5hZnirI1OAy UhjLssb/8wfHmwNcqETI9ae7O+vPDMLkxdQvpviEBI+HJ7vZ6op2X4lVEsn/Bx2E /KrHGQObkwim8thTOYkQeJtqptWbiRvkpNnwryUV1fwjWPl6X1r3bXH7RdeRwCg= =aFCc -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2018-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2018-10-17 ======================================================================== From Or Gerlitz <ogerlitz@mellanox.com>: This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains. This is made of four building blocks (along with few minor driver refactors): [1] Split FDB fast path prio to multiple namespaces Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1). The slow path contains the per representor SQ send-to-vport TX rule and the match-all RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. [2] E-Switch chains and priorities A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to each other (but to the slow path), and one must use a explicit goto action to reach a different chain. Flow tables for the priorities are created on demand and destroyed once not used. [3] Add a no-append flow insertion mode, use it for TC offloads Enhance the driver fs core, such that if a no-append flag is set by the caller, we add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. For encap rules, we defer the HW offloading till we have a valid neighbor. This can result in the packet hitting a lower priority rule in the HW DP. Use the no-append API to push these packets to the slow path FDB table, so they go to the TC kernel DP as done before priorities where supported. [4] Offloading tc priorities and chains for eswitch flows Using [1], [2] and [3] above we add the support for offloading both chains and priorities. To get to a new chain, use the tc goto action. We support a fixed prio range 1-16, and chains 0-3. ============================================================================= Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-18 10:25:37 -07:00
Ido Schimmel	9b3bc7db75	mlxsw: core: Fix use-after-free when flashing firmware during init When the switch driver (e.g., mlxsw_spectrum) determines it needs to flash a new firmware version it resets the ASIC after the flashing process. The bus driver (e.g., mlxsw_pci) then registers itself again with mlxsw_core which means (among other things) that the device registers itself again with the hwmon subsystem again. Since the device was registered with the hwmon subsystem using devm_hwmon_device_register_with_groups(), then the old hwmon device (registered before the flashing) was never unregistered and was referencing stale data, resulting in a use-after free. Fix by removing reliance on device managed APIs in mlxsw_hwmon_init(). Fixes: `c86d62cc41` ("mlxsw: spectrum: Reset FW after flash") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 22:25:45 -07:00
Ido Schimmel	1231e04f5b	mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation In the device, VxLAN encapsulation takes place in the FDB table where certain {MAC, FID} entries are programmed with an underlay unicast IP. MAC addresses that are not programmed in the FDB are flooded to the relevant local ports and also to a list of underlay unicast IPs that are programmed using the all zeros MAC address in the VxLAN driver. One difference between the hardware and software data paths is the fact that in the software data path there are two FDB lookups prior to the encapsulation of the packet. First in the bridge's FDB table using {MAC, VID} and another in the VxLAN's FDB table using {MAC, VNI}. Therefore, when a new VxLAN FDB entry is notified, it is only programmed to the device if there is a corresponding entry in the bridge's FDB table. Similarly, when a new bridge FDB entry pointing to the VxLAN device is notified, it is only programmed to the device if there is a corresponding entry in the VxLAN's FDB table. Note that the above scheme will result in a discrepancy between both data paths if only one FDB table is populated in the software data path. For example, if only the bridge's FDB is populated with an entry pointing to a VxLAN device, then a packet hitting the entry will only be flooded by the kernel to remote VTEPs whereas the device will also flood the packets to other local ports member in the VLAN. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:08 -07:00
Ido Schimmel	1c30d1836a	mlxsw: spectrum: Enable VxLAN enslavement to bridges Enslavement of VxLAN devices to offloaded bridges was never forbidden by mlxsw, but this patch makes sure the required configuration is performed in order to allow VxLAN encapsulation and decapsulation to take place in the device. The patch handles both the case where a VxLAN device is enslaved to an already offloaded bridge and the case where the first mlxsw port is enslaved to a bridge that already has VxLAN device configured. Invalid configurations are sanitized and an error string is returned via extack. Since encapsulation and decapsulation do not occur when the VxLAN device is down, the driver makes sure to enable / disable these functionalities based on NETDEV_PRE_UP and NETDEV_DOWN events. Note that NETDEV_PRE_UP is used in favor of NETDEV_UP, as the former allows to veto the operation, if necessary. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:08 -07:00
Ido Schimmel	e9ba0fbc7d	bridge: switchdev: Allow clearing FDB entry offload indication Currently, an FDB entry only ceases being offloaded when it is deleted. This changes with VxLAN encapsulation. Devices capable of performing VxLAN encapsulation usually have only one FDB table, unlike the software data path which has two - one in the bridge driver and another in the VxLAN driver. Therefore, bridge FDB entries pointing to a VxLAN device are only offloaded if there is a corresponding entry in the VxLAN FDB. Allow clearing the offload indication in case the corresponding entry was deleted from the VxLAN FDB. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:08 -07:00
Ido Schimmel	4cf178d7b9	mlxsw: spectrum_router: Configure matching local routes for NVE decap When a local route that matches the source IP of an offloaded NVE tunnel is notified, the driver needs to program it to perform NVE decapsulation instead of merely trapping packets to the CPU. This patch complements "mlxsw: spectrum_router: Enable local routes promotion to perform NVE decap" where existing local routes were promoted to perform NVE decapsulation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	498790befb	mlxsw: spectrum_fid: Clear NVE configuration when destroying 802.1D FIDs 802.1D FIDs are used to represent VLAN-unaware bridges and currently this is the only type of FID that supports NVE configuration. Since the NVE tunnel device does not take a reference on the FID, it is possible for the FID to be destroyed when it still has NVE configuration. Therefore, when destroying the FID make sure to disable its NVE configuration. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	3695291154	mlxsw: spectrum_nve: Implement VxLAN operations The common NVE core expects each encapsulation type to implement a certain set of operations that are specific to this type and the currently used ASIC. These operations include things such as the ability to determine whether a certain NVE configuration can be offloaded and ASIC-specific initialization for this type. Implement these operations for VxLAN on the Spectrum ASIC. Spectrum-2 support will be added by a future patchset. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	6e6030bd54	mlxsw: spectrum_nve: Implement common NVE core The Spectrum ASIC supports different types of NVE encapsulations (e.g., VxLAN, NVGRE) with more types to be supported by future ASICs. Despite being different, all these encapsulations share some common functionality such as the enablement of NVE encapsulation on a given filtering identifier (FID) and the addition of remote VTEPs to the linked-list of VTEPs that traffic should be flooded to. Implement this common core and allow different ASICs to register different operations for different encapsulation types. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	88782f75f9	mlxsw: spectrum_router: Allow querying VR ID based on table ID In the device, different VRFs (routing tables) are represented using different virtual routers (VRs) and thus the kernel's table IDs are mapped to VR IDs. Allow internal users of the IP router to query the VR ID based on a kernel table ID. This is needed - for example - when configuring the underlay VR where VxLAN encapsulated packets will undergo an L3 lookup. In this case, the kernel's table ID is derived from the VxLAN device's configuration. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	0c69e0fcd3	mlxsw: spectrum_router: Enable local routes promotion to perform NVE decap When an NVE tunnel with an IP underlay (e.g., VxLAN) is configured the local route to the tunnel's source IP needs to be promoted to perform NVE decapsulation. Expose an API in the unicast IP router to promote / demote local routes. The case where a local route is configured after the creation of the NVE tunnel will be handled in a subsequent patch in the set. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	564c6d727a	mlxsw: spectrum_fid: Add APIs to lookup FID without creating it Current APIs only allow looking for a FID and creating it in case it does not exist. With VxLAN, in case the bridge to which the VxLAN device was enslaved does not already have a corresponding FID, then it means that something went wrong that we need to be aware of. Add an API to look up a FID, but without creating it in order to catch above-mentioned situation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Ido Schimmel	d3d19d4b8c	mlxsw: spectrum_fid: Allow setting and clearing NVE properties on FID In the device, the VNI and the list of remote VTEPs a packet should be flooded to is a property of the filtering identifier (FID). During encapsulation, the VNI is taken from the FID the packet was classified to. During decapsulation, the overlay packet is injected into a bridge and classified to a FID based on the VNI it came with. Allow NVE configuration for a FID. Currently, this is only supported with 802.1D FIDs which are used for VLAN-unaware bridges. However, NVE configuration is going to be supported with 802.1Q FIDs which is why the related fields are placed in the common FID struct. Since the device requires a 1:1 mapping between FID and VNI, the driver maintains a hashtable keyed by VNI and checks if the VNI is already associated with an existing FID. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Paul Blakey	bf07aa730a	net/mlx5e: Support offloading tc priorities and chains for eswitch flows Currently we fail when user specify a non-zero chain, this patch adds the support for it and tc priorities. To get to a new chain, use the tc goto action. Currently we support a fixed prio range 1-16, and chain range 0-3. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:20:49 -07:00
Paul Blakey	5dbe906ff1	net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available When adding a vxlan tc rule, and a neighbour isn't available, we don't insert any rule to hardware. Once we enable offloading flows with multiple priorities, a packet that should have matched this rule will continue in hardware pipeline and might match a wrong one. This is unlike in tc software path where it will be matched and forwarded to the vxlan device (which will cause a ARP lookup eventually) and stop processing further tc filters. To address that, when when a neighbour isn't available (EAGAIN from attach_encap), or gets deleted, change the original action to be a forward to slow path instead. Neighbour update will restore the original action once the neighbour becomes available. This will be done atomically so at any given time we will have a the correct match. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:20:49 -07:00
Paul Blakey	c92a0b9457	net/mlx5: E-Switch, Enable setting goto slow path chain action A pre-step for the tc offloads code to use this when a neigh is not available for encap rules. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:50 -07:00
Or Gerlitz	6d2a3ed011	net/mlx5e: Avoid duplicated code for tc offloads add/del fdb rule The code for adding/deleting fdb flow is repeated when user-space does flow add/del and when we add/del from the neigh update path - unify them to avoid the duplication. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:50 -07:00
Paul Blakey	42f7ad6760	net/mlx5e: For TC offloads, always add new flow instead of appending the actions When replacing a tc flower rule, flower first requests to add the new rule (new action), then deletes the old one. But currently when asked to add a new tc flower flow, we append the actions (and counters to it). This can result in a fte with two flow counters or conflicting actions (drop and encap action) which firmware complains/errs about and isn't achieving what the user aimed for. Instead, insert the flow using the new no-append flag which will add a new HW rule, the old flow and rule will be deleted later by flower Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:50 -07:00
Paul Blakey	d5634fee24	net/mlx5: Add a no-append flow insertion mode If no-append flag is set, we will add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. While here, move the has_flow_tag boolean indicator to be a flag too. This patch doesn't change any functionality. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:50 -07:00
Paul Blakey	e52c280240	net/mlx5: E-Switch, Add chains and priorities A chain is a group of priorities, so use the fdb parallel sub namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to one another (but to the slow path), and one must use a explicit goto action to reach a different chain. Flow tables for the priorities will be created on demand and destroyed once not used. The Firmware has four pools of tables for sizes S/XS/M/L (4k, 64k, 1m, 4m). We maintain ghost copies of the pools occupancy. When a new table is to be created, we scan the pools from large to small and find the 1st table size which can be now created. When a table is destroyed, we update the relevant pool. Multi chain/prio isn't enabled yet by this patch, for now all flows will use the default chain 0, and prio 1. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:50 -07:00
Or Gerlitz	482650069a	net/mlx5: E-Switch, Have explicit API to delete fwd rules Be symmetric with the e-switch API to add rules which has a specific function to add fwd rules which are used as part of vport mirroring. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:50 -07:00
Paul Blakey	328edb499f	net/mlx5: Split FDB fast path prio to multiple namespaces Towards supporting multi-chains and priorities, split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is like current FS_TYPE_PRIO, but may contain only namespaces, and those will be in parallel to one another in terms of managing of the flow tables connections inside them. Meaning, while searching for the next or previous flow table to connect for a new table inside such namespace we skip the parallel namespaces in the same level under the FS_TYPE_PRIO_CHAINS prio we originated from. We use this new type for splitting the fast path prio into multiple parallel namespaces, each containing normal prios. The prios inside them (and their tables) will be connected to one another, but not from one parallel namespace to another, instead the last prio in each namespace will be connected to the next prio in the containing FDB namespace, which is the slow path prio. Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:18:16 -07:00
Roi Dayan	a88780a949	net/mlx5e: Split TC add rule path for nic vs e-switch Move to have clear separation on the code path to add nic vs e-switch flows. While here we break the code that deals with adding offloaded TC tool to few smaller stages, each on helper function. Besides getting us simpler and readable code, these are pre-steps for being able to have two HW flows serving one SW TC flow for some e-switch use cases. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:15:48 -07:00
Rabie Loulou	c83954abb2	net/mlx5e: Change return type of tc add flow functions Refactor the flow add utility functions to return err code instead of rule pointers. This will allow for simpler logic when one tc rule is duplicated to two HW rules in downstream patches. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:15:48 -07:00
Mark Bloch	171c7625be	net/mlx5: Use flow counter IDs and not the wrapping cache object Currently, when a flow rule is created using the FS core layer, the caller has to pass the entire flow counter object and not just the counter HW handle (ID). This requires both the FS core and the caller to have knowledge about the inner implementation of the FS layer flow counters cache and limits the possible users. Move to use the counter ID across the place when dealing with flows. Doing this decoupling, now can we privatize the inner implementation of the flow counters. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:15:48 -07:00
Mark Bloch	b8aee82250	net/mlx5: E-Switch, Get counters for offloaded flows from callers There's no real reason for the e-switch logic to manage the creation of counters for offloaded flows. The API already has the directive for the caller to denote they want to attach a counter to the created flow. As such, we go and move the management of flow counters to the mlx5e tc offload logic. This also lets us remove an inelegant interface where the FS layer had to provide a way to retrieve a counter from a flow rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:15:48 -07:00
Saeed Mahameed	186daf0c20	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next mlx5 updates for both net-next and rdma-next * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits) net/mlx5: Expose DC scatter to CQE capability bit net/mlx5: Update mlx5_ifc with DEVX UID bits net/mlx5: Set uid as part of DCT commands net/mlx5: Set uid as part of SRQ commands net/mlx5: Set uid as part of SQ commands net/mlx5: Set uid as part of RQ commands net/mlx5: Set uid as part of QP commands net/mlx5: Set uid as part of CQ commands net/mlx5: Rename incorrect naming in IFC file net/mlx5: Export packet reformat alloc/dealloc functions net/mlx5: Pass a namespace for packet reformat ID allocation net/mlx5: Expose new packet reformat capabilities {net, RDMA}/mlx5: Rename encap to reformat packet net/mlx5: Move header encap type to IFC header file net/mlx5: Break encap/decap into two separated flow table creation flags net/mlx5: Add support for more namespaces when allocating modify header net/mlx5: Export modify header alloc/dealloc functions net/mlx5: Add proper NIC TX steering flow tables support net/mlx5: Cleanup flow namespace getter switch logic net/mlx5: Add memic command opcode to command checker ... Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-17 14:13:36 -07:00
David S. Miller	d0f068e572	mlx5-fixes-2018-10-10 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbvqcMAAoJEEg/ir3gV/o+eFsH/2TbJH+i1BuGMVwCB8o+U1Rz C01pJmR7Lb7WwQZ8ZKTOqQkS7BkGX1hNGyIlc4i6ZnP+4gsVJAbP6LKPjTvyD7e6 TNb8bvxTUCOovknrevKkGba8tzoTTsC4wwwbHLGHd1hkKSY1P5hXg8R7vpear+n6 /PFJwzpIXDAa8AHqeORCNYj7MneUm3kaahcmSOxOhvDbRx3UG9cgy7tEhPjZbRn5 jPFsxFCSPcGedtI+g8bzodmpneTcu1KF6QCunrl2bGt5EzgDrbaw1UUoctxD2CJR Ch45W807EvBJoFiJXXCNf9N+p5020F/Q+mTmK7khPirUjdtoLdcT9Goswpjfbtk= =vJIA -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2018-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-10-10 This pull request includes some fixes to mlx5 driver, Please pull and let me know if there's any problem. For -stable v4.11: ('net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type') For -stable v4.17: ('net/mlx5: Fix memory leak when setting fpga ipsec caps') For -stable v4.18: ('net/mlx5: WQ, fixes for fragmented WQ buffers API') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-15 21:51:28 -07:00
David S. Miller	1986647c2f	mlx5e-updates-2018-10-10 IPoIB netlink support and mlx5e pre-allocated netdevice initialization IP link was broken due to the changes in IPoIB for the rdma_netdev support after commit `cd565b4b51` ("IB/IPoIB: Support acceleration options callbacks"). This patchset fixes IPoIB pkey creation and removal using rtnetlink by adding support in both IPoIB ULP layer and mlx5 layer: From Jason and Denis: 1) Introduces changes in the RDMA netdev code in order to allow allocation of the netdev to be done by the rtnl netdev code. 2) Reworks IPoIB initialization to use the two step rdma_netdev creation. From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting pre-allocated netdevs: 3) Adds support to initialize/cleanup netdevs that are not created by mlx5 driver. 4) Change mlx5e netdevice layer to accept the pre-allocated netdevice queue number. 5) Initialize mlx5e generic structures in one place to be used for all netdevs types NIC/representors/IPoIB (both mlx5 allocated and pre-allocted). -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbvqHhAAoJEEg/ir3gV/o+TRQH+wYWgvi5AbqYlGpT8xSho8Dg 5tuFE9AbYDGLLWptZgtYXsBsTaltGVqKnwWy//dAuuSI7LvqtjeKqq0G1JKau70L /49+vv8NTQ6vov2yY95yzzEo5B1/zVcYEl9dmJl4y6Xlwc+YIteewReVqRj+tucR xO7ghLWc1o4Pl7gWBAcOULyOsZc+CtG8ElwEaBROXTtYRpsR7FKn7vN+WdWZ2VOG MjtCUaG0vZlK1vaF76J3P9bL2V3y6o6i6S8RZwg1kPxO4jnrzyGrkxWMOX6f32KB JAUcLVlJW5wckcRAKfTwLxsFMrc9vLBPHyj4XneXVWGKh8/dGUGY5gdGIJV5Jlc= =m0tM -----END PGP SIGNATURE----- Merge tag 'mlx5e-updates-2018-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-10-10 IPoIB netlink support and mlx5e pre-allocated netdevice initialization IP link was broken due to the changes in IPoIB for the rdma_netdev support after commit `cd565b4b51` ("IB/IPoIB: Support acceleration options callbacks"). This patchset fixes IPoIB pkey creation and removal using rtnetlink by adding support in both IPoIB ULP layer and mlx5 layer: From Jason and Denis: 1) Introduces changes in the RDMA netdev code in order to allow allocation of the netdev to be done by the rtnl netdev code. 2) Reworks IPoIB initialization to use the two step rdma_netdev creation. From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting pre-allocated netdevs: 3) Adds support to initialize/cleanup netdevs that are not created by mlx5 driver. 4) Change mlx5e netdevice layer to accept the pre-allocated netdevice queue number. 5) Initialize mlx5e generic structures in one place to be used for all netdevs types NIC/representors/IPoIB (both mlx5 allocated and pre-allocted). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-15 21:49:56 -07:00
David S. Miller	d864991b22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-12 21:38:46 -07:00
Moshe Shemesh	2645060834	net/mlx4_core: Fix warnings during boot on driverinit param set failures During boot, mlx4_core sets the driverinit configuration parameters and updates the devlink module on the initial values calling devlink_param_driverinit_value_set(). If devlink_param_driverinit_value_set() returns an error mlx4_core reports kernel module warning. This caused false alarm during boot in case kernel was compiled with CONFIG_NET_DEVLINK off. Fix by removing warning reported in case devlink_param_driverinit_value_set() fails. This actually makes the function mlx4_devlink_set_init_value() redundant to using directly devlink_param_driverinit_value_set() and so removed. It fixes the following kernel trace: mlx4_core 0000:00:06.0: devlink set parameter 0 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 1 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 4 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 5 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 3 value failed (err = -95) Fixes: `bd1b51dc66` ("mlx4: Add mlx4 initial parameters table and register it") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:24:35 -07:00
Ido Schimmel	b02597d513	mlxsw: spectrum: Add NVE packet traps The DECAP_ECN0 trap will be used to trap packets where the overlay packet is marked with Non-ECT, but the underlay packet is marked with either ECT(0), ECT(1) or CE. When trapped, such packets will be counted as errors by the VxLAN driver and thus provide better visibility. The NVE_ENCAP_ARP trap will be used to trap ARP packets undergoing NVE encapsulation. This is needed in order to support E-VPN ARP suppression, where the Linux bridge does not flood ARP packets through tunnel ports in case it can answer the ARP request itself. Note that all the packets trapped via these traps are marked with 'offload_fwd_mark', so as to not be re-flooded by the Linux bridge through the ASIC ports. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	2bd414aef6	mlxsw: resources: Add NVE resources Add the following resources to be used by the NVE code: * Number of IPv4 underlay destination IPs in a single TNUMT record * Number of IPv6 underlay destination IPs in a single TNUMT record Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	27f68c0850	mlxsw: reg: Add Monitoring Parsing State Register This register is used for setting up the parsing for hash, policy-engine and routing. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	0933781f11	mlxsw: reg: Add definition of unicast tunnel record for SFD register Will be used to program the device with FDB records pointing to a NVE tunnel. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	8efcf6bb48	mlxsw: reg: Add Tunneling NVE QoS Default Register The TNQDR register configures the default QoS settings for NVE encapsulation. It will be used to set the default DSCP of each port to 0, so that when DSCP is set to inherit and the overlay packet does not have an IP header the outer DSCP will be set to 0, in accordance with the software data path. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	fd6db27cac	mlxsw: reg: Add Tunneling NVE QoS Configuration Register The register configures how QoS is set in Encapsulation into the underlay network. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	a77d5f0bde	mlxsw: reg: Add Tunneling NVE Decapsulation ECN Mapping Register This register configures the actions that are done during NVE decapsulation based on the ECN bits. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	4a8d1860ed	mlxsw: reg: Add Tunneling NVE Encapsulation ECN Mapping Register This register performs mapping from overlay ECN to underlay ECN during NVE encapsulation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	c723d19fad	mlxsw: reg: Add Tunneling NVE Underlay Multicast Table Register This register builds the linked list of underlay destination IPs used for BUM traffic on the overlay. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	50e6eb2a63	mlxsw: reg: Add Tunnel Port Configuration Register This register enables / disables learning on different types of tunnel ports (e.g., NVE, VPLS). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:23 -07:00
Ido Schimmel	710dd1a0ec	mlxsw: reg: Add Tunneling NVE General Configuration Register This register configures global NVE configuration such as source IP of the NVE tunnel and UDP source port calculation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	beda7f72c3	mlxsw: spectrum: Seed LAG hash function Currently, the seed of the LAG hash function is always set to 0, which means it is identical across all switches. Instead, use a random number. This is especially important now that VxLAN is supported, as the LAG hash function is used to calculate the UDP source port of the encapsulated packet. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	a682a3024f	mlxsw: reg: Extend FDB flush types for NVE The device has the ability to flush all the FDB records that perform NVE encapsulation or only a subset of these with a specific filtering identifier (FID). Expose these types so that they could be used by subsequent patches where we need to flush the FDB records when an NVE device is unlinked from a bridge (FID). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	90ea0bb551	mlxsw: spectrum: Add a new type of KVD linear record When the device needs to flood an overlay packet to remote VTEPs it retrieves a pointer to the head of a linked-list of records that store the IP addresses of these VTEPs. These records are stored in the KVD linear memory and configured via the Tunneling NVE Underlay Multicast Table (TNUMT) register. Add a new KVD linear entry type for these records, so that we will be able to allocate and free them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	12066d612b	mlxsw: spectrum: Move L3 protocol and address definitions to global header file The L3 protocol and address definitions are going to be used by the NVE code, so move them to the global header file from the one private to the router. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	9c73b1d120	mlxsw: spectrum_switchdev: Do not assume notifier information type VxLAN notifications are going to use a different notifier information type, so cast to the correct type based on the received event. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	5050f6ae25	mlxsw: spectrum_switchdev: Check notification relevance based on upper device VxLAN FDB updates are sent with the VxLAN device which is not our upper and will therefore be ignored by current code. Solve this by checking whether the upper device (bridge) is our upper. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	ab74c3a127	mlxsw: spectrum_switchdev: Prepare for VxLAN FDB notifications VxLAN FDB notifications need to be handled differently than bridge FDB notifications, so initialize the work item based on the received notification and rename the invoked function accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Ido Schimmel	bf341eb895	mlxsw: spectrum: Remove misuses of private header file The spectrum_router.h header file is private to the router block and should only be included by direct consumers of it, such as dpipe and the multicast routing code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 10:08:22 -07:00
Nir Dotan	9e66431640	mlxsw: pci: Fix a typo Signed-off-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-10 22:22:07 -07:00
Tariq Toukan	37fdffb217	net/mlx5: WQ, fixes for fragmented WQ buffers API mlx5e netdevice used to calculate fragment edges by a call to mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct indication for queues smaller than a PAGE_SIZE, (broken by default on PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new calls/API. Since (TX/RX) Work Queues buffers are fragmented, here we introduce changes to the API in core driver, so that it gets a stride index and returns the index of last stride on same fragment, and an additional wrapping function that returns the number of physically contiguous strides that can be written contiguously to the work queue. This obsoletes the following API functions, and their buggy usage in EN driver: * mlx5_wq_cyc_get_frag_size() * mlx5_wq_cyc_ctr2fragix() The new API improves modularity and hides the details of such calculation for mlx5e netdevice and mlx5_ib rdma drivers. New calculation is also more efficient, and improves performance as follows: Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size. Before: 16,477,619 pps After: 17,085,793 pps 3.7% improvement Fixes: `3a2f703312` ("net/mlx5: Use order-0 allocations for all WQ types") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-10 18:26:16 -07:00
Huy Nguyen	a48bc51315	net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type The HW spec defines only bits 24-26 of pftype_wq as the page fault type, use the required mask to ensure that. Fixes: `d9aaed8387` ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-10 18:26:16 -07:00
Talat Batheesh	fd7e848077	net/mlx5: Fix memory leak when setting fpga ipsec caps Allocated memory for context should be freed once finished working with it. Fixes: `d6c4f0298c` ("net/mlx5: Refactor accel IPSec code") Signed-off-by: Talat Batheesh <talatb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-10 18:26:15 -07:00
Feras Daoud	779d986d60	net/mlx5e: Do not ignore netdevice TX/RX queues number The current design of mlx5e driver ignores the netdevice TX/RX queues number for netdevices that RDMA IPoIB ULP creates. Instead, the queue number is initialized to the maximum number that mlx5 thinks best for performance. As a result, ULP drivers that choose to create a netdevice with queue number that is less than the maximum channels mlx5 creates, will get a memory corruption. This fix changes the mlx5e netdev logic to respect ULP netdevices TX/RX queue number and use it when creating resources instead of the maximum channel number. Fixes: `cd565b4b51` ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-10 17:58:17 -07:00
Saeed Mahameed	cdeef2b152	net/mlx5e: Use non-delayed work for update stats Convert mlx5e update stats work to a normal work structure, since it is never used delayed. Add a helper function to queue update stats work on demand which checks for some conditions and reduce code duplication to have a better abstraction. Fixes: `ed56c5193a` ("net/mlx5e: Update NIC HW stats on demand only") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-10 17:58:15 -07:00

... 2 3 4 5 6 ...

4524 Commits