Commit Graph

1935 Commits

Author SHA1 Message Date
Tariq Toukan 0dbf657c39 net/mlx5e: Fix xmit_more counter race issue
Update the xmit_more counter before notifying the HW,
to prevent a possible use-after-free of the skb.

Fixes: c8cf78fe10 ("net/mlx5e: Add ethtool counter for TX xmit_more")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:28 -07:00
Brenden Blanco 326fe02d1e net/mlx4_en: protect ring->xdp_prog with rcu_read_lock
Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
freed despite the use of call_rcu inside bpf_prog_put. The situation is
possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
callback for destroying the bpf prog can run even during the bh handling
in the mlx4 rx path.

Several options were considered before this patch was settled on:

Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
of the rings are updated with the new program.
This approach has the disadvantage that as the number of rings
increases, the speed of update will slow down significantly due to
napi_synchronize's msleep(1).

Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
The action of the bpf_prog_put_bh would be to then call bpf_prog_put
later. Those drivers that consume a bpf prog in a bh context (like mlx4)
would then use the bpf_prog_put_bh instead when the ring is up. This has
the problem of complexity, in maintaining proper refcnts and rcu lists,
and would likely be harder to review. In addition, this approach to
freeing must be exclusive with other frees of the bpf prog, for instance
a _bh prog must not be referenced from a prog array that is consumed by
a non-_bh prog.

The placement of rcu_read_lock in this patch is functionally the same as
putting an rcu_read_lock in napi_poll. Actually doing so could be a
potentially controversial change, but would bring the implementation in
line with sk_busy_loop (though of course the nature of those two paths
is substantially different), and would also avoid future copy/paste
problems with future supporters of XDP. Still, this patch does not take
that opinionated option.

Testing was done with kernels in either PREEMPT_RCU=y or
CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
overhead of rcu_read_lock (according to perf) was the same before/after.
In the rx path, rcu_read_lock is eventually called for every packet
from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
is easily amortized.

v2:
Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
rcu_dereference[_protected] as appropriate.
Add explicit mutex lock around rcu_assign instead of xchg loop.

Fixes: d576acf0a2 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 13:39:33 -07:00
Ido Schimmel aad8b6bae7 mlxsw: spectrum: Use existing flood setup when adding VLANs
When a VLAN is added on a bridge port we should use the existing unicast
flood configuration of the port instead of assuming it's enabled.

Fixes: 0293038e0c ("mlxsw: spectrum: Add support for flood control")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Ido Schimmel f1de7a28d5 mlxsw: spectrum: Don't take multiple references on a FID
In commit 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the
VLAN-aware bridge") I added a per-FID struct, which member ports can
take a reference on upon VLAN membership configuration.

However, sometimes only the VLAN flags (e.g. egress untagged) are
toggled without changing the VLAN membership. In these cases we
shouldn't take another reference on the FID.

Fixes: 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
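
The reference rule described in commit f1de7a28d5 above can be sketched in plain userspace C. This is only an illustration under assumptions: `struct port_sketch`, `struct fid_sketch` and `port_vlan_set()` are hypothetical names, not the mlxsw driver's own structures or functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLAN_N 4096

/* Hypothetical per-FID object with a plain reference counter. */
struct fid_sketch {
    unsigned int ref_count;
};

/* Hypothetical per-port VLAN state. */
struct port_sketch {
    bool member[SKETCH_VLAN_N];
    bool untagged[SKETCH_VLAN_N];
};

/* Configure VLAN membership and flags; returns 0 on success, -1 on a bad VID. */
int port_vlan_set(struct port_sketch *port, struct fid_sketch *fid,
                  uint16_t vid, bool untagged)
{
    if (vid >= SKETCH_VLAN_N)
        return -1;

    /* Take a reference only when the port newly becomes a member. */
    if (!port->member[vid]) {
        port->member[vid] = true;
        fid->ref_count++;
    }

    /* Toggling flags alone never touches the reference count. */
    port->untagged[vid] = untagged;
    return 0;
}
```

Calling `port_vlan_set()` twice for the same VID with different flags leaves the counter at one, which is the behaviour the commit message asks for.
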
Jiri Pirko e732263849 mlxsw: spectrum_router: Fix netevent notifier registration
Currently the notifier is registered for every ASIC instance, however
every registration uses the same notifier block. Fix this by moving the
registration to module init.

Fixes: c723c735fa ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko de7d62952b mlxsw: spectrum: Fix error path in mlxsw_sp_module_init
Add forgotten notifier unregister.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko 7146da3181 mlxsw: spectrum_router: Fix fib entry update path
Originally, I expected that the update operation would need to be called
when the RALUE record action is changed. However, that is not needed,
since the write operation takes care of it. Remove the prepared
construct and always call the write operation.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko 5b004412e2 mlxsw: spectrum_router: Fix failure caused by double fib removal from HW
In mlxsw we squash tables 254 and 255 together into HW. The kernel
adds/dels a /32 ip to/from both 254 and 255. On the del path, that causes
the same prefix to be removed twice. Fix this by introducing reference
counting for private mlxsw fib entries. That required a bit of code
reshuffling. Also put dev into the fib entry key so the same prefix can
be represented once per router interface.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:55 -07:00
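
The reference counting described in commit 5b004412e2 above can be sketched as follows. This is a userspace illustration, not the driver code: the key layout, the linked list, and the names `fib_key`, `fib_table`, `fib_entry_get()` and `fib_entry_put()` are all assumptions, and the `hw_writes`/`hw_removes` counters merely stand in for programming the device.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical key: prefix, prefix length and the router interface (dev). */
struct fib_key {
    uint32_t prefix;
    uint8_t prefix_len;
    int dev_ifindex;
};

struct fib_entry {
    struct fib_key key;
    unsigned int ref_count;
    struct fib_entry *next;
};

struct fib_table {
    struct fib_entry *head;
    unsigned int hw_writes;  /* stand-in for "entry written to HW" */
    unsigned int hw_removes; /* stand-in for "entry removed from HW" */
};

static int fib_key_equal(const struct fib_key *a, const struct fib_key *b)
{
    return a->prefix == b->prefix && a->prefix_len == b->prefix_len &&
           a->dev_ifindex == b->dev_ifindex;
}

/* Add a prefix; a repeated add only bumps the reference count. */
int fib_entry_get(struct fib_table *t, const struct fib_key *key)
{
    struct fib_entry *e;

    for (e = t->head; e; e = e->next) {
        if (fib_key_equal(&e->key, key)) {
            e->ref_count++;
            return 0;
        }
    }

    e = calloc(1, sizeof(*e));
    if (!e)
        return -1;
    e->key = *key;
    e->ref_count = 1;
    e->next = t->head;
    t->head = e;
    t->hw_writes++;
    return 0;
}

/* Delete a prefix; HW removal happens only when the last reference goes. */
int fib_entry_put(struct fib_table *t, const struct fib_key *key)
{
    struct fib_entry **pp;

    for (pp = &t->head; *pp; pp = &(*pp)->next) {
        struct fib_entry *e = *pp;

        if (!fib_key_equal(&e->key, key))
            continue;
        if (--e->ref_count == 0) {
            *pp = e->next;
            free(e);
            t->hw_removes++;
        }
        return 0;
    }
    return -1; /* unknown prefix */
}
```

With this shape, adding the same /32 from two kernel tables results in one HW write, and only the second delete triggers the HW removal.
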
David S. Miller 6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Maor Gottlieb e5835f2833 net/mlx5: Increase number of ethtool steering priorities
Ethtool has 11 flow tables, each flow table has its own priority.
Increase the number of priorities to be aligned with the number of flow
tables.

Fixes: 1174fce8d1 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:16 -04:00
Eran Ben Elisha 1722b9694e net/mlx5: Add error prints when validate ETS failed
When setting ETS fails due to invalid user input, add error prints that
report the exact error to the user.

Fixes: cdcf11212b ('net/mlx5e: Validate BW weight values of ETS')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:16 -04:00
Kamal Heib bf50082c15 net/mlx5e: Fix memory leak if refreshing TIRs fails
Free 'in' command object also when mlx5_core_modify_tir fails.

Fixes: 724b2aa151 ("net/mlx5e: TIRs management refactoring")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Tariq Toukan c8cf78fe10 net/mlx5e: Add ethtool counter for TX xmit_more
Add a counter in ethtool for the number of times that
TX xmit_more was used.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Eran Ben Elisha cc8e9ebf95 net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
The driver RQ has two possible configurations: striding RQ and
non-striding RQ. Until this patch, the driver always reported the
number of hardware WQEs (ring descriptors). For the non-striding RQ
configuration this was OK, since we have one WQE per pending packet.
For striding RQ, multiple packets can fit into one WQE. For a better
user experience, in the striding RQ case we normalize the rx_pending
parameter (size of wqe/mtu) as the average ring size.

Fixes: 461017cb00 ('net/mlx5e: Support RX multi-packet WQE ...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
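
The normalization mentioned in commit cc8e9ebf95 above can be illustrated with a small arithmetic sketch. The commit message only gives the ratio "size of wqe/mtu"; the function names, the integer rounding and the lower bound of one packet per WQE are assumptions made for this example, not the driver's actual computation.

```c
#include <stdint.h>

/* Hypothetical helper: how many MTU-sized packets fit in one WQE (at least 1). */
static uint32_t packets_per_wqe(uint32_t wqe_size_bytes, uint32_t mtu_bytes)
{
    uint32_t n = mtu_bytes ? wqe_size_bytes / mtu_bytes : 1;

    return n ? n : 1;
}

/*
 * Value to report as rx_pending. For non-striding RQ this is the WQE count;
 * for striding RQ it is scaled by the average number of packets per WQE.
 */
uint32_t reported_rx_pending(uint32_t num_wqes, uint32_t wqe_size_bytes,
                             uint32_t mtu_bytes, int striding_rq)
{
    if (!striding_rq)
        return num_wqes;
    return num_wqes * packets_per_wqe(wqe_size_bytes, mtu_bytes);
}
```

For example, 16 WQEs of 65536 bytes at an MTU of 1500 would be reported as 16 * 43 = 688 under these assumptions.
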
Saeed Mahameed 6e8dd6d6f4 net/mlx5e: Don't wait for SQ completions on close
Instead of asking the firmware to flush the SQ (Send Queue) via
asynchronous completions when moved to error, we handle the SQ flush
manually (mlx5e_free_tx_descs), the same way we did when an SQ flush
timed out or on tx_timeout.

This reduces SQ flush time and speeds up the interface down procedure.

Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
critical code locality.

Fixes: 29429f3300 ('net/mlx5e: Timeout if SQ doesn't flush during close')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed 8484f9ed13 net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
ICO (Internal control operations) SQ (Send Queue) is closed/disabled
after RQ (Receive Queue).  After RQ is closed an ICO SQ completion
might post a fragmented MPWQE (Multi Packet Work Queue Element) into
that RQ.

As on regular RQ post, check if we are allowed to post to that
RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
if needed.

Fixes: bc77b240b3 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed f2fde18c52 net/mlx5e: Don't wait for RQ completions on close
This will significantly reduce receive queue flush time on interface
down.

Instead of asking the firmware to flush the RQ (Receive Queue) via
asynchronous completions when moved to error, we handle the RQ flush
manually (mlx5e_free_rx_descs), the same way we did when an RQ flush
timed out.

This reduces RQ flush time and speeds up the interface down procedure
(ifconfig down) from 6 sec to 0.3 sec on a 48-core system.

Moved mlx5e_free_rx_descs to en_main.c, where it is needed, to keep
en_rx.c free from non-critical data path code for better code locality.

Fixes: 6cd392a082 ('net/mlx5e: Handle RQ flush in error cases')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed fe4c988bdd net/mlx5e: Limit UMR length to the device's limitation
ConnectX-4 UMR (User Memory Region) MTT translation table offset in WQE
is limited to U16_MAX. Before this patch we ignored that limitation and
requested the maximum possible UMR translation length that the netdev
might need (MAX channels * MAX pages per channel).
On a system with #cores > 32, when linear WQE allocation fails,
falling back to UMR WQEs causes the RQ (Receive Queue) to get stuck.

Here we limit the UMR length to min(U16_MAX, max required pages) (while
considering the required alignments) on driver load. By default U16_MAX
is sufficient, since the default RX rings value guarantees that we are in
range. Dynamically (on set_ringparam/set_channels) we check whether the
new required UMR length (num mtts) is still in range and, if not, fail
the request.

Fixes: bc77b240b3 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
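
The two checks described in commit fe4c988bdd above can be sketched roughly as below. This is a simplified userspace model: the function names are hypothetical, `UINT16_MAX` stands in for the kernel's U16_MAX, and the alignment handling mentioned in the commit message is deliberately left out.

```c
#include <stdint.h>

#define UMR_MTT_LIMIT UINT16_MAX /* stands in for the kernel's U16_MAX */

/* On load: request min(U16_MAX, max required pages). Alignment is omitted. */
uint32_t umr_len_on_load(uint32_t max_required_mtts)
{
    return max_required_mtts < UMR_MTT_LIMIT ? max_required_mtts
                                             : UMR_MTT_LIMIT;
}

/*
 * On set_ringparam/set_channels: return 0 when the newly required number of
 * MTTs still fits in the UMR length chosen at load time, -1 otherwise.
 */
int umr_check_request(uint32_t channels, uint32_t mtts_per_channel,
                      uint32_t umr_len)
{
    uint64_t required = (uint64_t)channels * mtts_per_channel;

    return required <= umr_len ? 0 : -1;
}
```

A request that would need more translation entries than the capped length is rejected up front instead of leaving the RQ stuck later.
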
Ido Schimmel 1c6c6d221e mlxsw: spectrum: Mirror certain packets to CPU
Instead of trapping certain packets to the CPU and then relying on it to
flood them we can instead make the device mirror them.

The following packet types are mirrored:

* DHCP: Broadcast packets that should be flooded by the device, but also
trapped in case CPU is running the DHCP server.

* IGMP query: Multicast packets that need to be forwarded to other
bridge ports, but also trapped so that receiving netdev will be marked
as a router port by the bridge driver.

* ARP request: Broadcast packets that should be forwarded to other
bridge ports, but also trapped in case requested IP is of the local
machine.

* ARP response: Unicast packets that should be forwarded by the bridge
but also trapped in case response is directed at us.

Set the trap action of such packets to mirror and mark them using
'offload_fwd_mark' to prevent the bridge driver from forwarding them
itself.

Note that OSPF packets are also marked despite their action being trap.
The reason for this is that the device traps such packets in the
pipeline after they were already flooded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:37 -07:00
Ido Schimmel 63a811417d mlxsw: spectrum: Allow different traps to have different actions
Up until now we only trapped packets to CPU, but we are going to allow
some packets to be mirrored (trap & forward) to CPU.

Extend the Rx listener with 'action' member.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:36 -07:00
Ido Schimmel 93393b339d mlxsw: spectrum: Simplify traps definition
Instead of copying & pasting the same struct initialization for every
Rx listener, just use a macro.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:36 -07:00
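
The three trap commits above (macro-based listener definitions, a per-listener 'action' member, and mirroring of selected packet types) can be pictured with the sketch below. Every identifier here is hypothetical: the struct layout, the `RX_LISTENER` macro and the trap ID values are assumptions for illustration and do not reproduce the mlxsw definitions.

```c
#include <stddef.h>

/* Hypothetical actions: trap only, or mirror (trap and forward). */
enum trap_action {
    TRAP_ACTION_TRAP_TO_CPU,
    TRAP_ACTION_MIRROR_TO_CPU,
};

/* Hypothetical trap IDs for a few of the packet types named above. */
enum trap_id {
    TRAP_ID_OSPF = 1,
    TRAP_ID_DHCP,
    TRAP_ID_ARP_REQUEST,
};

typedef void (*rx_handler_t)(const void *pkt, size_t len);

struct rx_listener {
    rx_handler_t func;
    enum trap_id trap_id;
    enum trap_action action;
};

static void rx_default(const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
}

/* One macro instead of a copy-pasted struct initializer per listener. */
#define RX_LISTENER(_id, _action)               \
    {                                           \
        .func = rx_default,                     \
        .trap_id = TRAP_ID_##_id,               \
        .action = TRAP_ACTION_##_action,        \
    }

static const struct rx_listener rx_listeners[] = {
    RX_LISTENER(OSPF, TRAP_TO_CPU),
    RX_LISTENER(DHCP, MIRROR_TO_CPU),
    RX_LISTENER(ARP_REQUEST, MIRROR_TO_CPU),
};

size_t rx_listener_count(void)
{
    return sizeof(rx_listeners) / sizeof(rx_listeners[0]);
}

const struct rx_listener *rx_listener_get(size_t i)
{
    return i < rx_listener_count() ? &rx_listeners[i] : NULL;
}
```

Adding a new trap then takes one line in the table, and changing its action means changing a single macro argument.
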
Doug Ledford 0c41284c83 Mellanox ConnectX-4/Connect-IB shared code (SW part)
* net/mlx5: Add sniffer namespaces
* net/mlx5: Introduce sniffer steering hardware capabilities
* net/mlx5: Configure IB devices according to LAG state
* net/mlx5: Vport LAG creation support
* net/mlx5: Add LAG flow steering namespace
* net/mlx5: LAG demux flow table support
* net/mlx5: LAG and SRIOV cannot be used together
* net/mlx5e: Avoid port remapping of mlx5e netdev TISes
* net/mlx5: Get RoCE netdev
* net/mlx5: Implement RoCE LAG feature
* net/mlx5: Add HW interfaces used by LAG
* net/mlx5: Separate query_port_proto_oper for IB and EN
* net/mlx5: Expose mlx5e_link_mode
* net/mlx5: Update struct mlx5_ifc_xrqc_bits
* net/mlx5: Modify RQ bitmask from mlx5 ifc

Merge tag 'shared-for-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into mlx5-shared
2016-08-25 10:01:23 -04:00
Ido Schimmel 0f7a4d8a9d mlxsw: spectrum: Don't set learning when creating vPorts
Before commit 99724c18fc ("mlxsw: spectrum: Introduce support for
router interfaces") we used to assign vFIDs to the created vPorts. Since
these vPorts were used for slow path traffic we had to disable learning
for them, as it doesn't make sense to have it enabled.

This is no longer the case and now vPorts are either used for router
interfaces (for which learning is disabled by the firmware) or bridge
ports (for which learning is explicitly enabled by the driver).

Therefore, we can remove the learning configuration upon vPort creation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:13 -07:00
Ido Schimmel 81f77bc006 mlxsw: spectrum: Remove unnecessary check in FDB processing
We now offload the learning configuration to the device and don't rely
on the driver to decide whether to learn the FDB record, so remove the
check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel 89b548f0cf mlxsw: spectrum: Offload learning to the switch ASIC
Up until now we simply stored the learning configuration of a bridge
port in the driver and decided whether to learn a new FDB record based
on this value.

However, this is sub-optimal in cases where learning is disabled on the
bridge port, as the device repeatedly generates learning notifications
for the same record.

Instead, offload the learning configuration to the device, thereby
preventing it from generating notifications when learning is disabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel 584d73df06 mlxsw: spectrum: Configure learning for VLAN-aware bridge port
We are going to prevent the device from generating learning
notifications for a port that was configured with learning disabled.

Since learning configuration is done per {Port, VID} we need to apply
the port's learning configuration for any VID that is added to the
bridge port's VLAN filter list.

When a VID is added to the VLAN filter list of a VLAN-aware bridge port,
configure the {Port, VID} learning status according to the port's
configuration. When the VID is removed, disable learning for the {Port,
VID}.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel 640be7b717 mlxsw: spectrum: Don't abort on first error when removing VLANs
When removing VLANs from the VLAN-aware bridge we shouldn't abort on the
first error, as we'll otherwise have resources that will never be freed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
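
The "keep going on error" pattern from commit 640be7b717 above might look like the sketch below. It is a generic userspace illustration: `vlans_del_range()`, the callback type and the demo callback are hypothetical, and returning the first error to the caller is an assumption, since the commit message does not say what the driver returns.

```c
#include <stdint.h>

/* Hypothetical per-VID removal callback: 0 on success, negative on error. */
typedef int (*vid_del_fn)(uint16_t vid, void *ctx);

/*
 * Remove every VID in [vid_begin, vid_end]. A failure for one VID does not
 * stop the loop, so resources of the remaining VIDs are still released.
 * The first error seen (if any) is reported once the loop is done.
 */
int vlans_del_range(uint16_t vid_begin, uint16_t vid_end,
                    vid_del_fn del_one, void *ctx)
{
    int first_err = 0;
    uint32_t vid; /* wider than uint16_t so vid_end == 65535 terminates */

    for (vid = vid_begin; vid <= vid_end; vid++) {
        int err = del_one((uint16_t)vid, ctx);

        if (err && !first_err)
            first_err = err;
    }
    return first_err;
}

/* Demo callback: counts attempts and fails for exactly one VID. */
struct demo_state {
    unsigned int attempts;
    uint16_t failing_vid;
};

int demo_del_one(uint16_t vid, void *ctx)
{
    struct demo_state *s = ctx;

    s->attempts++;
    return vid == s->failing_vid ? -5 : 0;
}
```

With the demo callback failing on one VID in a range of four, all four removals are still attempted and the single error is reported at the end.
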
Ido Schimmel f7a8f6cec3 mlxsw: spectrum: Make VLAN deletion function symmetric
Commit 05978481e7 ("mlxsw: spectrum: Create PVID vPort before
registering netdevice") removed __mlxsw_sp_port_vlans_del() from the
init sequence of the driver, which forced it to be non-symmetric with
regards to __mlxsw_sp_port_vlans_add().

Make both functions symmetric as the constraint no longer exists.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel 1803e0fb7e mlxsw: spectrum: Limit number of FDB records per learning session
Up until now a learning session ended whenever the number of queried
records was zero. This turned out to be problematic in situations where
a large number of MACs (48K) had to be processed by the switch driver,
as RTNL mutex is held during the learning session.

Instead, limit the number of FDB records that can be processed in a
session to 64. This means that every time the device is queried for
learning notifications (currently, every 100ms), up to 64 records will
be processed by the switch driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:11 -07:00
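
The session cap from commit 1803e0fb7e above can be modelled as a bounded loop. In this sketch only the limit of 64 comes from the commit message; the device model, the batch size of 8 records per query, and all function names are assumptions.

```c
#include <stddef.h>

#define FDB_SESSION_LIMIT 64 /* from the commit message */
#define FDB_QUERY_BATCH 8    /* hypothetical records returned per query */

/* Hypothetical device model: just a count of pending notifications. */
struct fdb_device_sketch {
    size_t pending;
};

/* Stand-in for querying the device: hands back up to 'max' records. */
static size_t fdb_query(struct fdb_device_sketch *dev, size_t max)
{
    size_t n = dev->pending < max ? dev->pending : max;

    dev->pending -= n;
    return n;
}

/*
 * One learning session: stop when the device has nothing left or when the
 * per-session limit is reached, whichever comes first. Returns the number
 * of records processed in this session.
 */
size_t fdb_learning_session(struct fdb_device_sketch *dev)
{
    size_t processed = 0;

    while (processed < FDB_SESSION_LIMIT) {
        size_t room = FDB_SESSION_LIMIT - processed;
        size_t want = room < FDB_QUERY_BATCH ? room : FDB_QUERY_BATCH;
        size_t n = fdb_query(dev, want);

        if (n == 0)
            break;
        processed += n;
    }
    return processed;
}
```

With 200 pending records, consecutive sessions would process 64, 64, 64 and then 8 records, so the lock held during a session is released at bounded intervals.
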
Yotam Gigi 51af96b534 mlxsw: router: Enable neighbors to be created on stacked devices
Make the function mlxsw_router_neigh_construct search for the rif
according to the neighbour dev rather than the dev that was passed to the
ndo, thus allowing neighbours to be created on stacked devices.

Fixes: 6cf3c971dc ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:39:04 -07:00
Ido Schimmel f888f58795 mlxsw: spectrum: Add missing flood to router port
In case we have a layer 3 interface on top of a bridge (VLAN / FID RIF),
then we should flood the following packet types to the router:

* Broadcast: If DIP is the broadcast address of the interface, then we
need to be able to get it to CPU by trapping it following route lookup.

* Reserved IP multicast (224.0.0.X): Some control packets (e.g. OSPF)
use this range and are trapped in the router block.

Fixes: 99f44bb352 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:39:03 -07:00
David S. Miller fff84d2a39 Mellanox ConnectX-4/Connect-IB shared code (SW part)
* net/mlx5: Add sniffer namespaces
* net/mlx5: Introduce sniffer steering hardware capabilities
* net/mlx5: Configure IB devices according to LAG state
* net/mlx5: Vport LAG creation support
* net/mlx5: Add LAG flow steering namespace
* net/mlx5: LAG demux flow table support
* net/mlx5: LAG and SRIOV cannot be used together
* net/mlx5e: Avoid port remapping of mlx5e netdev TISes
* net/mlx5: Get RoCE netdev
* net/mlx5: Implement RoCE LAG feature
* net/mlx5: Add HW interfaces used by LAG
* net/mlx5: Separate query_port_proto_oper for IB and EN
* net/mlx5: Expose mlx5e_link_mode
* net/mlx5: Update struct mlx5_ifc_xrqc_bits
* net/mlx5: Modify RQ bitmask from mlx5 ifc

Merge tag 'shared-for-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-24

This series contains some low level and API updates for mlx5 core
driver interface and mlx5_ifc.h, plus mlx5 LAG core driver support,
to be shared as base code for net-next and rdma mlx5 4.9 submissions.

From Alex and Artemy, Update mlx5_ifc for modify RQ and XRC bits.

From Noa, Expose mlx5 link modes so they can be used in RDMA tree for rdma tools.

From Aviv, LAG support needed for RDMA.
    - Add needed hardware structures, layouts and interface
    - mlx5 core driver LAG implementation
    - Introduce mlx5 core driver LAG API for mlx5_ib

From Maor, add two low level patches for mlx5 hardware sniffer QP
infrastructure bits and capabilities, plus added the namespace for sniffer
steering tables.  Needed for RDMA subtree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:35:35 -07:00
David S. Miller 01555e6449 Mellanox ConnectX-4/Connect-IB shared code (HW part)
* net/mlx5: Introduce alloc_encap and dealloc_encap commands
* net/mlx5: Update mlx5_ifc.h for vxlan encap/decap
* net/mlx5: Enable setting minimum inline header mode for VFs
* net/mlx5: Improve driver log messages
* net/mlx5: Unify and improve command interface
* {net,IB}/mlx5: Modify QP commands via mlx5 ifc
* {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
* {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
* {net,IB}/mlx5: CQ commands via mlx5 ifc
* net/mlx5: EQ commands via mlx5 ifc
* net/mlx5: Pages management commands via mlx5 ifc
* net/mlx5: MCG commands via mlx5 ifc
* net/mlx5: PD and UAR commands via mlx5 ifc
* net/mlx5: Access register and MAD IFC commands via mlx5 ifc
* net/mlx5: Init/Teardown hca commands via mlx5 ifc

Merge tag 'shared-for-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-20

This series contains several low level and API updates for mlx5 core
commands interface and mlx5_ifc.h to be shared as base code for net-next and
rdma mlx5 4.9 submissions.

From Saeed, ten patches that refactor old layouts of firmware commands which
were manually generated before we introduced the mlx5_ifc. Now all of the firmware
command inbox/outbox layouts use mlx5_ifc, and we remove the old
manually generated structures. In addition to those ten patches, we add two patches
that unify the mlx5 command execution interface and improve the driver log messages
in that area.

From Hadar and Ilya, added the needed hardware bits and infrastructure for
minimum inline headers setting and encap/decap commands and capabilities,
needed for E-Switch offloads.

This series applies on top of the latest net-next and rdma/master, and merges smoothly with
the latest "Mellanox 100G mlx5 fixes 2016-08-16" series already applied to the net branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 11:08:23 -07:00
Doug Ledford 124c13439b Mellanox ConnectX-4/Connect-IB shared code (HW part)
* net/mlx5: Introduce alloc_encap and dealloc_encap commands
* net/mlx5: Update mlx5_ifc.h for vxlan encap/decap
* net/mlx5: Enable setting minimum inline header mode for VFs
* net/mlx5: Improve driver log messages
* net/mlx5: Unify and improve command interface
* {net,IB}/mlx5: Modify QP commands via mlx5 ifc
* {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
* {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
* {net,IB}/mlx5: CQ commands via mlx5 ifc
* net/mlx5: EQ commands via mlx5 ifc
* net/mlx5: Pages management commands via mlx5 ifc
* net/mlx5: MCG commands via mlx5 ifc
* net/mlx5: PD and UAR commands via mlx5 ifc
* net/mlx5: Access register and MAD IFC commands via mlx5 ifc
* net/mlx5: Init/Teardown hca commands via mlx5 ifc

Merge tag 'shared-for-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into mlx5-shared
2016-08-23 11:52:02 -04:00
Markus Elfring 6f0b826da4 mlx5/core: Use memdup_user() rather than duplicating its implementation
* Reuse existing functionality from memdup_user() instead of keeping
  duplicate source code.

  This issue was detected by using the Coccinelle software.

* Return directly if this copy operation failed.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 17:05:38 -07:00
Jiri Pirko 8912862f06 mlxsw: spectrum_buffers: Fix pool value handling in mlxsw_sp_sb_tc_pool_bind_set
Pool index has to be converted by get_pool helper to work correctly for
egress pool. In mlxsw the egress pool index starts from 0.

Fixes: 0f433fa0ec ("mlxsw: spectrum_buffers: Implement shared buffer configuration")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 18:01:56 -07:00
Or Gerlitz f96750f8d6 net/mlx5: E-Switch, Avoid ACLs in the offloads mode
When we are in the switchdev/offloads mode, HW matching is done as
dictated by the offloaded rules and hence we don't need to enable
the ACLs mechanism used by the legacy mode.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Or Gerlitz 1a8ee6f25b net/mlx5: E-Switch, Set the send-to-vport rules in the correct table
While adding actual offloading support to the new switchdev mode, we didn't
change the setup of the send-to-vport rules to put them in the slow path
table. Fix that.

Fixes: 1033665e63 ('net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Or Gerlitz ef78618b9d net/mlx5: E-Switch, Return the correct devlink e-switch mode
Since mlx5 also has the NONE e-switch mode, we must translate from the mlx5
mode to the devlink mode on the devlink eswitch mode get call. Do that.

While here, remove the mlx5_ prefix from the static function helpers
that deal with the mode to comply with the rest of the code.

Fixes: c930a3ad74 ('net/mlx5e: Add devlink based SRIOV mode change')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Hadar Hen Zion dbe413e3bb net/mlx5e: Retrieve the switchdev id from the firmware only once
Avoid executing a firmware command each time the switchdev HW ID attr get
call is made. We do that by reading the ID (PF NIC MAC) only once at
load time and storing it in the representor structure.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
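
The read-once approach described in the commit above can be sketched roughly as follows. This is a small Python model rather than the driver code: `Representor`, `query_fw` and `switchdev_hw_id` are hypothetical names chosen for illustration, and the firmware query is assumed to be any callable that returns the ID.

```python
class Representor:
    """Hypothetical stand-in for the representor structure."""

    def __init__(self, query_fw):
        # Issue the (expensive) firmware query once, at load time.
        self.hw_id = query_fw()

    def switchdev_hw_id(self):
        # Later attr-get calls return the stored ID without touching firmware.
        return self.hw_id
```

The trade-off is that the stored value is never refreshed, which is acceptable only as long as the ID does not change while the driver is loaded.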
Hadar Hen Zion 1dbd0d373a net/mlx5e: Use correct flow dissector key on flower offloading
The wrong key is used when extracting the address type field set by
the flower offload code. We have to use the control key and not the
basic key, fix that.

Fixes: e3a2b7ed01 ('net/mlx5e: Support offload cls_flower with drop action')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Amir Vadai 6c3b4f9086 net/mlx5: Update last-use statistics for flow rules
Set the lastuse statistic when the number of packets has changed since the
last query. This was wrongly dropped when bulk counter reading was added.

Fixes: a351a1b03b ('net/mlx5: Introduce bulk reading of flow counters')
Signed-off-by: Amir Vadai <amirva@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Paul Blakey 2c0f8ce1b5 net/mlx5: Added missing check of msg length in verifying its signature
The set and verify signature routines calculate the signature for each of
the mailbox nodes, even for those that are unused (from cache). Add
a missing length check to set and verify only those which are used.

While here, also moved the setting of msg's nodes token to where we
already go over them. This saves a pass because checksum is disabled,
and the only useful thing remaining that set signature does is setting
the token.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Mohamad Haj Yahia 1061c90f52 net/mlx5: Fix pci error recovery flow
When a PCI error is detected we should save the PCI state prior to
disabling the device.

Also, when receiving the PCI slot reset call we need to verify that the
device is responsive.

Fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Tariq Toukan 506753b0b4 net/mlx5e: Optimization for MTU change
Avoid unnecessary interface down/up operations upon an MTU change
when it does not affect the rings configuration.

Fixes: 461017cb00 ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
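
The idea behind the MTU optimization above, skipping the interface down/up cycle when the rings are unaffected, might look roughly like this. It is a sketch under stated assumptions: `change_mtu`, the `dev` dictionary and the `ring_config` callable are all hypothetical, and the real driver derives its ring parameters differently.

```python
def change_mtu(dev, new_mtu, ring_config):
    """Apply new_mtu; cycle the interface only if the ring setup changes.

    dev is a dict with "mtu", "is_open" and a "resets" counter (assumed).
    ring_config maps an MTU to whatever ring parameters depend on it.
    """
    needs_reset = dev["is_open"] and ring_config(new_mtu) != ring_config(dev["mtu"])
    if needs_reset:
        dev["resets"] += 1  # stands in for interface down followed by up
    dev["mtu"] = new_mtu
    return needs_reset
```

Comparing the derived ring parameters, rather than the MTU values themselves, is what lets many MTU changes proceed without disrupting traffic.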
Saeed Mahameed 13f9bba7cd net/mlx5e: Set port MTU on netdev creation rather on open
The port MTU shouldn't be written to hardware on every single interface
open.
Set it only when needed: on change_mtu and on netdevice creation.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Maor Gottlieb 87d22483ce net/mlx5: Add sniffer namespaces
Add sniffer TX and RX namespaces to receive ingoing and outgoing
traffic.

Each outgoing/incoming packet is duplicated and steered to the sniffer
TX/RX namespace in addition to the regular flow.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:59 +03:00
Aviv Heller 917b41aab7 net/mlx5: Configure IB devices according to LAG state
When mlx5_ib is loaded, we would like each card's IB devices
to be added according to its LAG state (one IB device, instead of
two, is to be added if LAG is active).

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:58 +03:00
Aviv Heller 3bc34f3bcb net/mlx5: Vport LAG creation support
Add interfaces for issuing CREATE_VPORT_LAG and
DESTROY_VPORT_LAG commands.

Used for receiving PF1's eth traffic on PF0's
root ft.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller 3e75d4ebaa net/mlx5: Add LAG flow steering namespace
This namespace is used for LAG demux flowtable.

The idea is to position the LAG demux ft between
bypass and kernel flowtables, allowing raw-eth
traffic from both ports to be received by the PF0
IB device.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller aaff1bea16 net/mlx5: LAG demux flow table support
Add interfaces to allow the creation and destruction of a
LAG demux flow table.

It is a special flow table used during LAG for redirecting
non user-mode packets from PF0 to PF1 root ft, if a packet was
received on phys port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller edb31b1686 net/mlx5: LAG and SRIOV cannot be used together
Disallow using them together until support is added for RoCE LAG SRIOV.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller db60b80273 net/mlx5e: Avoid port remapping of mlx5e netdev TISes
TISes belonging to the mlx5e NIC should not be
subject to port remap.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:55 +03:00
Aviv Heller 6a32047a44 net/mlx5: Get RoCE netdev
Used by IB driver for determining the IB bond
device's netdev, when LAG is active.

Returns PF0's netdev if mode is not active-backup,
or the PF netdev of the active slave when mode is
active-backup.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00
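
The selection rule stated in the "Get RoCE netdev" commit above can be modelled in a few lines. This is only an illustration: `roce_netdev` and its parameters are hypothetical names, netdevs are represented as plain strings, and index 0 of `pf_netdevs` is assumed to be PF0.

```python
def roce_netdev(pf_netdevs, bond_mode, active_slave=None):
    """Pick the netdev backing the IB bond device while LAG is active.

    pf_netdevs: PF netdevs of the card, PF0 first (assumption).
    bond_mode: bonding mode name, e.g. "active-backup".
    active_slave: PF netdev of the currently active slave, if any.
    """
    if bond_mode == "active-backup":
        return active_slave
    return pf_netdevs[0]
```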
Aviv Heller 7907f23adc net/mlx5: Implement RoCE LAG feature
Available on dual port cards only, this feature keeps
track, using netdev LAG events, of the bonding
and link status of each port's PF netdev.

When both of the card's PF netdevs are enslaved to the
same bond/team master, and only them, LAG state
is active.

During LAG, only one IB device is present for both ports.

In addition to the above, this commit includes FW commands
used for managing the LAG, new facilities for adding and removing
a single device by interface, and port remap functionality according to
bond events.

Please note that this feature is currently used only for mimicking
Ethernet bonding for RoCE - netdevs functionality is not altered,
and their bonding continues to be managed solely by bond/team driver.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00
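
The activation condition in the RoCE LAG commit above (both PF netdevs enslaved to the same bond/team master, and only them) reduces to a set comparison. The sketch below is a model, not the driver logic; `lag_is_active` is a hypothetical name and netdevs are represented as strings.

```python
def lag_is_active(bond_slaves, pf_netdevs):
    """Return True when the master's slaves are exactly the card's two PF netdevs."""
    return len(pf_netdevs) == 2 and set(bond_slaves) == set(pf_netdevs)
```

A bond that also contains a netdev from another card, or only one of the two ports, leaves LAG inactive under this rule.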
Aviv Heller 84df61ebc6 net/mlx5: Add HW interfaces used by LAG
Exposed LAG commands enum and layouts:
- CREATE_LAG
  HW enters LAG mode:
  RoCE traffic from port two is received on PF0 core dev.
  Allows setting tx_affinity (tx port) for QPs and TISes.
  Allows port remapping of QPs and TISes, overriding their
  tx_affinity behavior.

- MODIFY_LAG
  Remap QPs and TISes to another port.

- QUERY_LAG
  Query whether LAG mode is active.

- DESTROY_LAG
  HW exits LAG mode, returning to non-LAG behavior.

- CREATE_VPORT_LAG
  Merge Ethernet flow steering, such that traffic received on port
  two jumps to PF0 root flow table.

  Available only in LAG mode.

- DESTROY_VPORT_LAG
  Ethernet flow steering returns to non-LAG behavior.

Caps added:
- lag_master
  Driver is in charge of managing the LAG.
  This is currently the only option.

- num_lag_ports
  LAG is supported only if this field's value is 2.

Other fields:
- QP/TIS tx port affinity
  During LAG, this field controls on which port a QP or TIS resides.

- TIS strict tx affinity
  When this field is set, the TIS will not be subject to port remap by
  CREATE_LAG/MODIFY_LAG.

- LAG demux flow table
  Flow table used for redirecting non user-space traffic back to
  PF1 root flow table, if the packet was received on port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:53 +03:00
Noa Osherovich d5beb7f2af net/mlx5: Separate query_port_proto_oper for IB and EN
Replace mlx5_query_port_proto_oper with separate functions per link
type. The functions take different arguments, so there is no point in
trying to unite them.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:52 +03:00
Noa Osherovich 8cca30a7f9 net/mlx5: Expose mlx5e_link_mode
The mlx5e_link_mode enumeration will also be used in mlx5_ib for RoCE.
This patch moves the enumeration to the mlx5 driver port header file.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:52 +03:00
Alex Vesker 83b502a12e net/mlx5: Modify RQ bitmask from mlx5 ifc
Use mlx5 ifc MODIFY_BITMASK_VSD in mlx5e_modify_rq_vsd and expose counter
set capability bit in hca caps structure.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:08 +03:00
Linus Torvalds 184ca82348 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

 3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0.  From
    Satish Baddipadige and Siva Reddy Kallam.

 4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

 5) Fix fib_trie logic when walking the trie during /proc/net/route
    output that can access a stale node pointer.  From David Forster.

 6) Several sctp_diag fixes from Phil Sutter.

 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

 8) Checksum fixup fixes in bpf from Daniel Borkmann.

 9) Memory leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net_sched: allow flushing tc police actions
  net_sched: unify the init logic for act_police
  net_sched: convert tcf_exts from list to pointer array
  net_sched: move tc offload macros to pkt_cls.h
  net_sched: fix a typo in tc_for_each_action()
  net_sched: remove an unnecessary list_del()
  net_sched: remove the leftover cleanup_a()
  mlxsw: spectrum: Allow packets to be trapped from any PG
  mlxsw: spectrum: Unmap 802.1Q FID before destroying it
  mlxsw: spectrum: Add missing rollbacks in error path
  mlxsw: reg: Fix missing op field fill-up
  mlxsw: spectrum: Trap loop-backed packets
  mlxsw: spectrum: Add missing packet traps
  mlxsw: spectrum: Mark port as active before registering it
  mlxsw: spectrum: Create PVID vPort before registering netdevice
  mlxsw: spectrum: Remove redundant errors from the code
  mlxsw: spectrum: Don't return upon error in removal path
  i40e: check for and deal with non-contiguous TCs
  ixgbe: Re-enable ability to toggle VLAN filtering
  ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
  ...
2016-08-17 17:26:58 -07:00
WANG Cong 22dc13c837 net_sched: convert tcf_exts from list to pointer array
As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.

The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.

Fixes: a85a970af2 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:27:51 -04:00
Ido Schimmel 9ffcc3725f mlxsw: spectrum: Allow packets to be trapped from any PG
When packets enter the device they are classified to a priority group
(PG) buffer based on their PCP value. After their egress port and
traffic class are determined they are moved to the switch's shared
buffer and await transmission, if:

(Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres &&
 Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres)
||
(Ingress{Port}.Usage < Min || Ingress{Port,PG}.Usage < Min ||
 Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min)

Packets scheduled for transmission through the CPU port (trapped to the
CPU) use traffic class 7, which has zero maximum and minimum quotas.
However, when such packets arrive from PG 0 they are admitted to the
shared buffer as PG 0 has a non-zero minimum quota.

Allow all packets to be trapped to the CPU - regardless of the PG they
were classified to - by assigning a 10KB minimum quota for CPU port and
TC7.

Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:28 -04:00
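
The admission condition quoted in the commit above can be written out as a small Python predicate. This is a sketch for reasoning about the rule, not the ASIC's implementation: the function name, the dictionary keys and the units are assumptions made for illustration.

```python
def admitted(usage, thres, minimum):
    """Return True if a packet is admitted to the shared buffer.

    Each argument maps the four quota scopes named in the commit message
    to a number: "ingress_port", "ingress_port_pg", "egress_port",
    "egress_port_tc".
    """
    scopes = ("ingress_port", "ingress_port_pg", "egress_port", "egress_port_tc")
    # First clause: usage is below the threshold in every scope.
    below_all_thresholds = all(usage[s] < thres[s] for s in scopes)
    # Second clause: usage is below the minimum quota in at least one scope.
    below_any_minimum = any(usage[s] < minimum[s] for s in scopes)
    return below_all_thresholds or below_any_minimum
```

With a zero threshold and zero minimum on the CPU port and TC 7, the first clause can never hold, so admission depends entirely on some other scope having a non-zero minimum. Giving the egress side its own non-zero minimum, as the fix does, makes the second clause hold regardless of the ingress PG.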
Ido Schimmel 8168287b5d mlxsw: spectrum: Unmap 802.1Q FID before destroying it
Before destroying the 802.1Q FID we should first remove the VID-to-FID
mapping. This makes mlxsw_sp_fid_destroy() symmetric with regards to
mlxsw_sp_fid_create().

Fixes: 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel 0583272d91 mlxsw: spectrum: Add missing rollbacks in error path
While going over the code I noticed we are missing two rollbacks in the
port's creation error path. Add them and adjust the place of one of them
in the port's removal sequence so that both are symmetric.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
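
The symmetry the commit above aims for, where every successful init step has a matching rollback that runs in reverse order on failure, can be illustrated with a generic pattern. The sketch is not taken from the driver; `create_port` and the `(init, fini)` pairs are hypothetical.

```python
def create_port(steps):
    """Run init steps in order; on failure, undo completed ones in reverse.

    steps is a list of (init, fini) callables. Returns the list of fini
    callables so that removal can later run them in reverse as well.
    """
    done = []
    for init, fini in steps:
        try:
            init()
        except Exception:
            # Roll back only what was actually initialised, newest first.
            for undo in reversed(done):
                undo()
            raise
        done.append(fini)
    return done
```

A missing rollback in this model would correspond to a step whose `fini` never runs when a later `init` fails, which is the class of bug the commit addresses.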
Jiri Pirko 0e7df1a290 mlxsw: reg: Fix missing op field fill-up
The RALUE pack function needs to set op; otherwise it is always 0, which
means add.

Fixes: d5a1c749d2 ("mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel a94a614fa2 mlxsw: spectrum: Trap loop-backed packets
One of the conditions to generate an ICMP Redirect Message is that "the
packet is being forwarded out the same physical interface that it was
received from" (RFC 1812).

Therefore, we need to be able to trap such packets and let the kernel
decide what to do with them.

For each RIF, enable the loop-back filter, which will raise the LBERROR
trap whenever the ingress RIF equals the egress RIF.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Reported-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Elad Raz c20b80187a mlxsw: spectrum: Add missing packet traps
Add the following traps:

1) MTU Error: Trap packets whose size is bigger than the egress RIF's
MTU. If DF bit isn't set, traffic will continue to be routed in slow
path.

2) TTL Error: Trap packets whose TTL expired. This allows traceroute to
work properly.

3) OSPF packets.

Fixes: 7b27ce7bb9 ("mlxsw: spectrum: Add traps needed for router implementation")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel 2f25844c23 mlxsw: spectrum: Mark port as active before registering it
Commit bbf2a4757b ("mlxsw: spectrum: Initialize ports at the end of
init sequence") moved ports initialization to the end of the init
sequence, which means ports are the first to be removed during fini.

Since the FDB delayed work is still active when ports are removed it's
possible for it to process FDB notifications of inactive ports,
resulting in a warning message.

Fix that by marking ports as inactive only after unregistering them. The
NETDEV_UNREGISTER event will invoke bridge's driver port removal
sequence that will cause the FDB (and FDB notifications) to be flushed.

Fixes: bbf2a4757b ("mlxsw: spectrum: Initialize ports at the end of init sequence")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel 05978481e7 mlxsw: spectrum: Create PVID vPort before registering netdevice
After registering a netdevice it's possible for user space applications
to configure an IP address on it. From the driver's perspective, this
means a router interface (RIF) should be created for the PVID vPort.

Therefore, we must create the PVID vPort before registering the
netdevice.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel fa66d7e3fe mlxsw: spectrum: Remove redundant errors from the code
Currently, when device configuration fails we emit errors to the kernel
log despite the fact we already get these from the EMAD transaction
layer, so remove them.

In addition to being unnecessary, removing these error messages will
allow us to reuse mlxsw_sp_port_add_vid() to create the PVID vPort
before registering the netdevice.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel 7a35583ec5 mlxsw: spectrum: Don't return upon error in removal path
When removing a VLAN filter from the device we shouldn't return upon the
first error we encounter, as otherwise we'll have resources that will
never be freed nor used.

Instead, we should keep trying to free as many resources as possible on
a best-effort basis.

Remove the error message as well, since we already get these from the
EMAD transaction code.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
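
The best-effort removal described in the commit above amounts to running every cleanup step even when an earlier one fails. A minimal sketch follows; `remove_vlan_filter` and the step callables are hypothetical, and returning the first error to the caller is a choice of this sketch that the commit message does not specify.

```python
def remove_vlan_filter(steps):
    """Run every cleanup step; remember the first error instead of stopping."""
    first_error = None
    for step in steps:
        try:
            step()
        except OSError as exc:
            # Keep going so later resources are still released.
            if first_error is None:
                first_error = exc
    return first_error
```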
Ilya Lesokhin 575ddf5888 net/mlx5: Introduce alloc_encap and dealloc_encap commands
Implement low-level commands to support vxlan encapsulation.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:46:01 +03:00
Hadar Hen Zion 9def7121be net/mlx5: Enable setting minimum inline header mode for VFs
Implement the low-level part of the PF side in setting minimum
inline header mode for VFs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:59 +03:00
Saeed Mahameed 2974ab6e8b net/mlx5: Improve driver log messages
Remove duplicate pci dev name printing in mlx5_core_err.
Use mlx5_core_{warn,info,err} where possible to have the pci info in the
driver log messages.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:59 +03:00
Saeed Mahameed c4f287c4a6 net/mlx5: Unify and improve command interface
Now that all commands use the mlx5 ifc interface, instead of making two
calls to execute a command we embed command status checking into
mlx5_cmd_exec to simplify the interface.

We also clean up redundant software structures (inbox/outbox) and
functions, and improve the command failure output.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:58 +03:00
Saeed Mahameed 1a412fb1ca {net,IB}/mlx5: Modify QP commands via mlx5 ifc
Prior to this patch we assumed that modify QP commands have the
same layout.

In ConnectX-4 for each QP transition there is a specific command
and their layout can vary.

e.g: 2err/2rst commands don't have QP context in their layout and before
this patch we posted the QP context in those commands.

Fortunately the FW only checks the suffix of the commands and executes
them, while ignoring all invalid data sent after the valid command
layout.

This patch removes mlx5_modify_qp_mbox_in and changes
mlx5_core_qp_modify to receive the required transition and QP context
with opt_param_mask if needed.  This way the caller is not required to
provide the command inbox layout and it will be generated automatically.

mlx5_core_qp_modify will generate the command inbox/outbox layouts
according to the requested transition and will fill the requested
parameters.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:58 +03:00
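
The point of the "Modify QP commands" commit above, that the inbox layout is generated from the requested transition so callers no longer build it themselves, can be modelled as below. This is an illustrative sketch only: `build_modify_qp_inbox`, the dictionary layout and the transition label used in the test are hypothetical. The only detail taken from the commit text is that the 2err/2rst commands carry no QP context.

```python
# Transitions whose command layout carries no QP context, per the commit text.
TRANSITIONS_WITHOUT_QPC = {"2err", "2rst"}


def build_modify_qp_inbox(transition, qpn, qpc=None, opt_param_mask=0):
    """Generate a command inbox (modelled as a dict) for one QP transition."""
    inbox = {"transition": transition, "qpn": qpn}
    if transition not in TRANSITIONS_WITHOUT_QPC:
        # Only transitions that take a QP context get these fields.
        inbox["opt_param_mask"] = opt_param_mask
        inbox["qpc"] = qpc
    return inbox
```

Centralising the layout decision in one function is what prevents callers from posting a QP context in commands that do not define one.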
Saeed Mahameed 09a7d9eca1 {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
Remove old representation of manually created QP/XRCD commands layout
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:57 +03:00
Vincent eb8fc32354 mlxsw: spectrum_router: Fix use after free
In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
variable to fix this.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14 21:32:05 -07:00
Saeed Mahameed ec22eb5310 {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
Remove old representation of manually created MKey/PSV commands layout,
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:18 +03:00
Saeed Mahameed 2782778663 {net,IB}/mlx5: CQ commands via mlx5 ifc
Remove old representation of manually created CQ commands layout,
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:15 +03:00
Saeed Mahameed 73b626c182 net/mlx5: EQ commands via mlx5 ifc
Remove old representation of manually created EQ commands layout,
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:12 +03:00
Saeed Mahameed a533ed5e17 net/mlx5: Pages management commands via mlx5 ifc
Remove old representation of manually created Pages management
commands layout, and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:08 +03:00
Saeed Mahameed 20bb566bda net/mlx5: MCG commands via mlx5 ifc
Remove old representation of manually created MCG commands layout
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:04 +03:00
Saeed Mahameed 732ef5ad8f net/mlx5: PD and UAR commands via mlx5 ifc
Remove old representation of manually created PD/UAR commands layouts
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:00 +03:00
Saeed Mahameed 20ed51c643 net/mlx5: Access register and MAD IFC commands via mlx5 ifc
Remove old representation of manually created ACCESS_REG/MAD_IFC
commands layout and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:38:57 +03:00
Saeed Mahameed 04ed5ad5db net/mlx5: Init/Teardown hca commands via mlx5 ifc
Remove old representation of manually created Init/Teardown hca
commands layout and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:38:38 +03:00
Ido Schimmel 4de34eb574 mlxsw: spectrum: Add missing DCB rollback in error path
We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but
I missed its rollback in the error path of port creation, so add it.

Fixes: f00817df2b ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:57:27 -07:00
Ido Schimmel 07d50cae06 mlxsw: spectrum: Do not override PAUSE settings
The PFCC register is used to configure both PAUSE and PFC frames.
Therefore, when PFC frames are disabled we must make sure we don't
mistakenly also disable PAUSE frames (which might be enabled).

Fix this by packing the PFCC register with the current PAUSE settings.

Note that this register is also accessed via ethtool ops, but there we
are guaranteed to have PFC disabled.

Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:57:27 -07:00
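
Because one register carries both PAUSE and PFC configuration, the fix above is essentially a read-modify-write that carries the current PAUSE settings into every PFC update. The sketch below models that idea with a dictionary; the field names and the 8-bit mask width are hypothetical and do not reflect the real PFCC layout.

```python
def pack_pfcc(pfc_mask, pause_rx, pause_tx):
    """Build a (hypothetical) PFCC register image as a dict."""
    return {"pfc_mask": pfc_mask & 0xFF, "pause_rx": pause_rx, "pause_tx": pause_tx}


def set_pfc(port, pfc_mask):
    """Update the PFC mask while preserving the port's current PAUSE settings."""
    current = port["pfcc"]
    port["pfcc"] = pack_pfcc(pfc_mask, current["pause_rx"], current["pause_tx"])
```

Packing the register with PAUSE fields left at zero, instead of the current values, is the kind of mistake that would silently disable PAUSE when PFC is turned off.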
Ido Schimmel b489a2000f mlxsw: spectrum: Do not assume PAUSE frames are disabled
When ieee_setpfc() gets called, PAUSE frames are not necessarily
disabled on the port.

Check if PAUSE frames are disabled or enabled and configure the port's
headroom buffer accordingly.

Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:57:27 -07:00
Linus Torvalds 0cda611386 Round one of 4.8 code

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull base rdma updates from Doug Ledford:
 "Round one of 4.8 code: while this is mostly normal, there is a new
  driver in here (the driver was hosted outside the kernel for several
  years and is actually a fairly mature and well coded driver).  It
  amounts to 13,000 of the 16,000 lines of added code in here.

  Summary:

   - Updates/fixes for iw_cxgb4 driver
   - Updates/fixes for mlx5 driver
   - Add flow steering and RSS API
   - Add hardware stats to mlx4 and mlx5 drivers
   - Add firmware version API for RDMA driver use
   - Add the rxe driver (this is a software RoCE driver that makes any
     Ethernet device a RoCE device)
   - Fixes for i40iw driver
   - Support for send only multicast joins in the cma layer
   - Other minor fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
  Soft RoCE driver
  IB/core: Support for CMA multicast join flags
  IB/sa: Add cached attribute containing SM information to SA port
  IB/uverbs: Fix race between uverbs_close and remove_one
  IB/mthca: Clean up error unwind flow in mthca_reset()
  IB/mthca: NULL arg to pci_dev_put is OK
  IB/hfi1: NULL arg to sc_return_credits is OK
  IB/mlx4: Add diagnostic hardware counters
  net/mlx4: Query performance and diagnostics counters
  net/mlx4: Add diagnostic counters capability bit
  Use smaller 512 byte messages for portmapper messages
  IB/ipoib: Report SG feature regardless of HW UD CSUM capability
  IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
  IB/hfi1: Disable by default
  IB/rdmavt: Disable by default
  IB/mlx5: Fix port counter ID association to QP offset
  IB/mlx5: Fix iteration overrun in GSI qps
  i40iw: Add NULL check for puda buffer
  i40iw: Change dup_ack_thresh to u8
  i40iw: Remove unnecessary check for moving CQ head
  ...
2016-08-04 20:10:31 -04:00
Doug Ledford 7f1d25b47d Merge branches 'misc' and 'rxe' into k.o/for-4.8-1 2016-08-04 11:13:47 -04:00
Mark Bloch bfaf31687c net/mlx4: Query performance and diagnostics counters
Add a function to query diagnostics counters from the firmware.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:34 -04:00
Mark Bloch c7c122ed67 net/mlx4: Add diagnostic counters capability bit
Add a bit that indicates if the firmware supports per port
diagnostic counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:33 -04:00
Linus Torvalds 731c7d3a20 Merge tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux
Merge drm updates from Dave Airlie:
 "This is the main drm pull request for 4.8.

  I'm down with a cold at the moment so hopefully this isn't in too bad
  a state, I finished pulling stuff last week mostly (nouveau fixes just
  went in today), so only this message should be influenced by illness.
  Apologies to anyone whose major feature I missed :-)

  Core:
        Lockless GEM BO freeing
        Non-blocking atomic work
        Documentation changes (rst/sphinx)
        Prep for new fencing changes
        Simple display helpers
        Master/auth changes
        Register/unregister rework
        Loads of trivial patches/fixes.

  New stuff:
        ARM Mali display driver (not the 3D chip)
        sii902x RGB->HDMI bridge

  Panel:
        Support for new panels
        Improved backlight support

  Bridge:
        Convert ADV7511 to bridge driver
        ADV7533 support
        TC358767 (DSI/DPI to eDP) encoder chip support

  i915:
        BXT support enabled by default
        GVT-g infrastructure
        GuC command submission and fixes
        BXT workarounds
        SKL/BKL workarounds
        Demidlayering device registration
        Thundering herd fixes
        Missing pci ids
        Atomic updates

  amdgpu/radeon:
        ATPX improvements for better dGPU power control on PX systems
        New power features for CZ/BR/ST
        Pipelined BO moves and evictions in TTM
        GPU scheduler improvements
        GPU reset improvements
        Overclocking on dGPUs with amdgpu
        Polaris powermanagement enabled

  nouveau:
        GK20A/GM20B volt and clock improvements.
        Initial support for GP100/GP104 GPUs, GP104 will not yet support
        acceleration due to NVIDIA having not released firmware for them as of yet.

  exynos:
        Exynos5433 SoC with IOMMU support.

  vc4:
        Shader validation for branching

  imx-drm:
        Atomic mode setting conversion
        Reworked DMFC FIFO allocation
        External bridge support

  analogix-dp:
        RK3399 eDP support
        Lots of fixes.

  rockchip:
        Lots of small fixes.

  msm:
        DT bindings cleanups
        Shrinker and madvise support
        ASoC HDMI codec support

  tegra:
        Host1x driver cleanups
        SOR reworking for DP support
        Runtime PM support

  omapdrm:
        PLL enhancements
        Header refactoring
        Gamma table support

  arcgpu:
        Simulator support

  virtio-gpu:
        Atomic modesetting fixes.

  rcar-du:
        Misc fixes.

  mediatek:
        MT8173 HDMI support

  sti:
        ASOC HDMI codec support
        Minor fixes

  fsl-dcu:
        Suspend/resume support
        Bridge support

  amdkfd:
        Minor fixes.

  etnaviv:
        Enable GPU clock gating

  hisilicon:
        Vblank and other fixes"

* tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux: (1575 commits)
  drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
  drm/nouveau/acpi: fix lockup with PCIe runtime PM
  drm/nouveau/acpi: check for function 0x1B before using it
  drm/nouveau/acpi: return supported DSM functions
  drm/nouveau/acpi: ensure matching ACPI handle and supported functions
  drm/nouveau/fbcon: fix font width not divisible by 8
  drm/amd/powerplay: remove enable_clock_power_gatings_tasks from initialize and resume events
  drm/amd/powerplay: move clockgating to after ungating power in pp for uvd/vce
  drm/amdgpu: add query device id and revision id into system info entry at CGS
  drm/amdgpu: add new definition in bif header
  drm/amd/powerplay: rename smum header guards
  drm/amdgpu: enable UVD context buffer for older HW
  drm/amdgpu: fix default UVD context size
  drm/amdgpu: fix incorrect type of info_id
  drm/amdgpu: make amdgpu_cgs_call_acpi_method as static
  drm/amdgpu: comment out unused defaults_staturn_pro static const structure to fix the build
  drm/amdgpu: enable UVD VM only on polaris
  drm/amdgpu: increase timeout of IB test
  drm/amdgpu: add destroy session when generate VCE destroy msg.
  drm/amd: fix deadlock of job_list_lock V2
  ...
2016-08-01 21:44:08 -04:00
Bhaktipriya Shridhar 0a91605cda net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
The workqueue health->wq was used as a per-device private health thread
to perform delayed work.

The workqueue has a single work item (&health->work) and hence doesn't
require ordering. It is involved in handling the health of the device and
is not used on a memory reclaim path. Hence, the single-threaded workqueue
has been replaced with system_wq.

The work item is flushed in mlx5_health_cleanup() to ensure that no tasks
are pending while disconnecting the driver.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26 15:18:56 -07:00
Dave Airlie 5e580523d9 Linux 4.7

Backmerge tag 'v4.7' into drm-next

Linux 4.7

As requested by Daniel Vetter as the conflicts were getting messy.
2016-07-26 17:26:29 +10:00
Alex Vesker 9b022a6e0f net/mlx4_core: Check device state before unregistering it
Verify that the device state is registered before unregistering it.
This check is required to prevent an OOPS on flows that re-register the
device when its previous state was unregistered.

Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:00:25 -07:00
Ido Schimmel 86cb13e4ec mlxsw: spectrum: Fix compilation error when CLS_ACT isn't set
When CONFIG_NET_CLS_ACT isn't set 'struct tcf_exts' has no member named
'actions' and we therefore must not access it. Otherwise compilation
fails.

Fix this by introducing a new macro similar to tc_no_actions(), which
always returns 'false' if CONFIG_NET_CLS_ACT isn't set.

Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 17:57:33 -07:00
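Note on the commit above (86cb13e4ec): the fix relies on a common pattern in which a helper macro collapses to a constant when the struct member it would read is compiled out. The sketch below is a userspace illustration only; `MODEL_CLS_ACT`, `struct exts_model` and `model_has_single_action()` are hypothetical stand-ins, not the kernel's names.

```c
#include <stdbool.h>

/* Hypothetical model of a struct whose member exists only when a
 * config option is set. */
struct exts_model {
#ifdef MODEL_CLS_ACT
	int nr_actions;
#endif
	int unused;
};

#ifdef MODEL_CLS_ACT
/* Option set: the member exists, so the macro may read it. */
#define model_has_single_action(e) ((e)->nr_actions == 1)
#else
/* Option not set: never touch the missing member, always yield false. */
#define model_has_single_action(e) ((void)(e), false)
#endif
```

Callers can then use the macro unconditionally, and the build succeeds in both configurations.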
Hadar Hen Zion cff92d7c7e net/mlx5e: Query minimum required header copy during xmit
Add support for querying the minimum inline mode from the firmware.
It is required for correct TX steering according to L3/L4 packet
headers.

Each send queue (SQ) has an inline mode that defines the minimum set of
headers that need to be copied into the SQ WQE.
The driver asks the firmware for the wqe_inline_mode device capability
value. If the device capability is defined as "vport context", the
driver must check the min inline mode reported by the vport context
before creating its SQs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 17:53:40 -07:00
Hadar Hen Zion ae76715d15 net/mlx5e: Check the minimum inline header mode before xmit
Each send queue (SQ) has an inline mode that defines the minimum required
inline headers in the SQ WQE.
Before sending each packet, check that the minimum required headers
are copied into the WQE.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 17:53:40 -07:00
Yotam Gigi 763b4b70af mlxsw: spectrum: Add support in matchall mirror TC offloading
This patch offloads port mirroring directives to hw using the matchall TC
with action mirror. It includes both the implementation of the
ndo_setup_tc function for the spectrum driver and the spectrum hardware
offload configuration code.

The hardware offload code consists of two new functions that add and
remove a mirror port pair. This is done using the MPAT, MPAR and SBIB
registers:
 - A new Switch-Port Analyzer (SPAN) entry is added using MPAT to the 'to'
   port.
 - The 'to' port is bound to the SPAN entry using MPAR register.
 - In case of egress SPAN, the 'to' port gets a new internal shared
   buffer using SBIB register.

In addition, a new database was added to the mlxsw_sp struct to store all
the SPAN entries and their bound ports list. The number of supported SPAN
entries is determined by resource query.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:12:00 -07:00
Yotam Gigi 230190548b mlxsw: reg: Add the Monitoring Port Analyzer register
The MPAR register is used to bind ports to a SPAN entry (which was
created using MPAT register) and thus mirror their traffic (ingress /
egress) to a different port.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Yotam Gigi 43a4685620 mlxsw: reg: Add Monitoring Port Analyzer Table register
The MPAT register is used to query and configure the Switch Port Analyzer
(SPAN) table. This register is used to configure a port as a mirror output
port, while after that a mirrored input port can be bound using MPAR
register.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Yotam Gigi 51ae8cc662 mlxsw: reg: Add Shared Buffer Internal Buffer register
The SBIB register configures per port buffer for internal use. This
register is used to configure an egress mirror buffer on the egress port
which does the mirroring.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Nogah Frankel ded821c8d3 mlxsw: pci: Add max span resources to resources query
Add max span resources to resources query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Nogah Frankel 57d316ba20 mlxsw: pci: Add resources query implementation.
Add resources query implementation. If the query is available, ask the HW
for its built-in resources instead of hardcoding them as constants.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:58 -07:00
David S. Miller de0ba9a0d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 00:53:32 -04:00
Brenden Blanco cb7386d37e net/mlx4_en: use READ_ONCE when freeing xdp_prog
For consistency, and in order to hint at the synchronous nature of the
xdp_prog field, use READ_ONCE in the destroy path of the ring. All
occurrences should now use either READ_ONCE or xchg.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 22:07:23 -07:00
Saeed Mahameed 882b0f2fba net/mlx5e: Fix del vxlan port command buffer memset
memset the command buffers rather than the pointers to them.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 15:29:50 -07:00
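Note on the commit above (882b0f2fba): clearing a pointer instead of the buffer it points to is easy to reproduce in plain C. The sketch below is a userspace illustration of one common form of this mistake, not the driver code; `struct cmd_buf` and both function names are hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in for a firmware command buffer. */
struct cmd_buf {
	unsigned char bytes[16];
};

/* Buggy pattern: zeroes the local pointer variable; the buffer itself
 * is left untouched. */
void clear_pointer_by_mistake(struct cmd_buf *buf)
{
	memset(&buf, 0, sizeof(buf));
}

/* Fixed pattern: zeroes the object the pointer refers to. */
void clear_buffer(struct cmd_buf *buf)
{
	memset(buf, 0, sizeof(*buf));
}
```

The two calls differ only in `&buf` versus `buf` and `sizeof(buf)` versus `sizeof(*buf)`, which is why the mistake can survive review.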
Ido Schimmel df4750e84e mlxsw: spectrum: Expose per-tc counters via ethtool
Expose the transmit queue length of each traffic class and the amount of
unicast packets discarded due to insufficient room in the shared buffer.

The first counter allows us to debug user priority to traffic class
mapping, whereas the drop counter is useful when determining shared buffer
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:53:56 -07:00
Ido Schimmel 7ed674bc3c mlxsw: spectrum: Expose per-priority counters via ethtool
Expose per-priority bytes / packets / PFC packets counters via ethtool.

These counters are very useful when debugging QoS functionality and
provide a better insight into the device's forwarding plane.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:53:56 -07:00
Wei Yongjun 44fafdaa75 net/mlx5: Use PTR_ERR_OR_ZERO() to simplify the code
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR.

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:46:00 -07:00
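Note on the commit above (44fafdaa75): the simplification can be illustrated with a small userspace model of the kernel's error-pointer helpers. This is a simplified sketch under the assumption that error codes occupy the top 4095 values of the address space; the lowercase function names are hypothetical and are not the kernel macros themselves.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the range reserved for error codes. */
int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value stored in an error pointer. */
long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Before: open-coded "if (IS_ERR(p)) return PTR_ERR(p); return 0;". */
int status_open_coded(const void *ptr)
{
	if (is_err(ptr))
		return (int)ptr_err(ptr);
	return 0;
}

/* After: the same logic folded into a single helper. */
int ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}
```

Both forms return the same value for every input; the helper only removes the repeated branch at each call site.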
Brenden Blanco 9ecc2d8617 net/mlx4_en: add xdp forwarding and data write support
A user will now be able to loop packets back out of the same port using
a bpf program attached to the xdp hook. Updates to the packet contents from
the bpf program are also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid locking entirely. This requires
the tx ring to be allocated 1:1 for each rx ring, as well as the tx
completion running in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages without
unmapping directly back to the original rx ring. In steady state tx/drop
workload, effectively 0 page allocs/frees will occur.

In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated. By default the original free
function is always initialized.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco 224e92e02a net/mlx4_en: break out tx_desc write into separate function
In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco d576acf0a2 net/mlx4_en: add page recycle to prepare rx ring for tx support
The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper      [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper      [kernel.vmlinux]       [k] intel_idle
  8.79%  swapper      [mlx4_en]              [k] mlx4_en_process_rx_cq
  7.54%  swapper      [kernel.vmlinux]       [k] poll_idle
  6.36%  swapper      [mlx4_core]            [k] mlx4_eq_int
  4.21%  swapper      [kernel.vmlinux]       [k] tasklet_action
  4.03%  swapper      [kernel.vmlinux]       [k] cpuidle_enter_state
  3.43%  swapper      [mlx4_en]              [k] mlx4_en_prepare_rx_desc
  2.18%  swapper      [kernel.vmlinux]       [k] native_irq_return_iret
  1.37%  swapper      [kernel.vmlinux]       [k] menu_select
  1.09%  swapper      [kernel.vmlinux]       [k] bpf_map_lookup_elem

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
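Note on the commit above (d576acf0a2): the per-ring page cache can be pictured as a small fixed-size stack of page pointers that is consulted before the page allocator. The sketch below is a userspace illustration only; the capacity, the struct layout and the function names are assumptions, and DMA mapping is not modelled.

```c
#include <stddef.h>

#define CACHE_SIZE 4	/* hypothetical capacity */

/* Hypothetical per-ring cache: a LIFO stack of recycled pages. */
struct page_cache {
	void *pages[CACHE_SIZE];
	size_t count;
};

/* Returns 1 if the page was cached, 0 if the cache is full
 * (the caller would then free the page normally). */
int cache_put(struct page_cache *c, void *page)
{
	if (c->count == CACHE_SIZE)
		return 0;
	c->pages[c->count++] = page;
	return 1;
}

/* Returns a cached page, or NULL if the cache is empty
 * (the caller would then fall back to the page allocator). */
void *cache_get(struct page_cache *c)
{
	return c->count ? c->pages[--c->count] : NULL;
}
```

In a steady drop-only workload every freed page goes back through `cache_put()` and is reused by `cache_get()`, which is consistent with the allocator disappearing from the perf report.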
Brenden Blanco 47a38e1550 net/mlx4_en: add support for fast rx drop bpf program
Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.

Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.

The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set at that granularity through any ndo.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Eugenia Emantayev ec25bc04ed net/mlx4_en: Add resilience in low memory systems
This patch fixes the loss of the Ethernet port on low-memory systems,
when the driver frees its resources and then fails to allocate new ones.
The issue can occur while changing the number of channels, the ring size
or the timestamp configuration.
This fix is necessary because vmap is no longer used in the code.
With vmap, the driver could allocate non-contiguous memory and make it
contiguous through the mapping. Now it can fail to allocate a large chunk
of contiguous memory and lose the port.
The code now tries to allocate the new resources first and frees the old
resources only upon success.

Fixes: 73898db043 ('net/mlx4: Avoid wrong virtual mappings')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:44:11 -07:00
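Note on the commit above (ec25bc04ed): the ordering it describes, allocate first and free the old resources only on success, can be sketched in userspace as follows. `struct rings` and `rings_resize()` are hypothetical names, and `malloc()` stands in for the driver's allocations.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the port's ring resources. */
struct rings {
	size_t size;
	void *mem;
};

/* Returns 0 on success. On failure the old resources are left in
 * place, so the port keeps working with its previous configuration. */
int rings_resize(struct rings *cur, size_t new_size)
{
	void *mem = malloc(new_size);	/* allocate the new resources first */

	if (!mem)
		return -1;		/* old rings are still usable */

	free(cur->mem);			/* release the old ones only now */
	cur->mem = mem;
	cur->size = new_size;
	return 0;
}
```

The cost of this ordering is that both the old and the new resources exist at the same time for a short period.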
Eugenia Emantayev 30f56e3ced net/mlx4_en: Move filters cleanup to a proper location
Filters cleanup should be done once before destroying net device,
since filters list is contained in the private data.

Fixes: 1eb8c695bd ('net/mlx4_en: Add accelerated RFS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:44:11 -07:00
Ido Schimmel 11719a58bd mlxsw: spectrum: Prevent invalid ingress buffer mapping
Packets entering the switch are mapped to a Switch Priority (SP)
according to their PCP value (untagged frames are mapped to SP 0).

The packets are classified to a priority group (PG) buffer in the port's
headroom according to their SP.

The switch maintains another mapping (SP to IEEE priority), which is
used to generate PFC frames for lossless PGs. This mapping is
initialized to IEEE = SP % 8.

Therefore, when mapping SP 'x' to PG 'y' we create a situation in which
an IEEE priority is mapped to two different PGs:

IEEE 'x' ---> SP 'x' ---> PG 'y'
IEEE 'x' ---> SP 'x + 8' ---> PG '0' (default)

Which is invalid, as a flow can use only one PG buffer.

Fix this by mapping both SP 'x' and 'x + 8' to the same PG buffer.

Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:51 -07:00
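Note on the commit above (11719a58bd): the invariant it restores is that SP 'x' and SP 'x + 8' always point at the same PG. The sketch below models this with a plain array; the table size of 16 switch priorities is inferred from the 'x + 8' wording, and the function names are hypothetical.

```c
#include <stdint.h>

#define NUM_SP		16	/* assumption: switch priorities 0..15 */
#define IEEE_PRIOS	8	/* IEEE priority = SP % 8 */

/* Map IEEE priority 'prio' to PG 'pg' by updating both switch
 * priorities that resolve to it. */
void map_prio_to_pg(uint8_t sp_to_pg[NUM_SP], uint8_t prio, uint8_t pg)
{
	sp_to_pg[prio] = pg;
	sp_to_pg[prio + IEEE_PRIOS] = pg;
}

/* Returns 1 if every IEEE priority resolves to exactly one PG. */
int mapping_is_valid(const uint8_t sp_to_pg[NUM_SP])
{
	for (int sp = 0; sp < IEEE_PRIOS; sp++)
		if (sp_to_pg[sp] != sp_to_pg[sp + IEEE_PRIOS])
			return 0;
	return 1;
}
```

Setting only `sp_to_pg[x]` reproduces the invalid state described in the commit message, where IEEE 'x' reaches PG 'y' through SP 'x' and PG '0' through SP 'x + 8'.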
Ido Schimmel 28f5275e4a mlxsw: spectrum: Prevent overwrite of DCB capability fields
The number of supported traffic classes that can have ETS and PFC
simultaneously enabled is not subject to user configuration, so make
sure we always initialize them to the correct values following a set
operation.

Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:51 -07:00
Ido Schimmel 7347180dca mlxsw: spectrum: Don't emit errors when PFC is disabled
We can't have PAUSE frames and PFC both enabled on the same port, but
the fact that ieee_setpfc() was called doesn't necessarily mean PFC is
enabled.

Only emit errors when PAUSE frames and PFC are enabled simultaneously.

Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:51 -07:00
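Note on the commit above (7347180dca): the corrected condition is a conjunction, not a check on whether the PFC setter was called. A minimal sketch, assuming PFC state is expressed as a per-priority enable bitmap; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conflict exists only if PAUSE is enabled AND at least one priority
 * has PFC enabled. An all-zero bitmap means PFC is disabled. */
bool pfc_conflicts_with_pause(bool pause_en, uint8_t pfc_en_bitmap)
{
	return pause_en && pfc_en_bitmap != 0;
}
```

With this check, calling the setter with an empty bitmap on a port that uses PAUSE frames no longer produces an error.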
Ido Schimmel c3f1576810 mlxsw: spectrum: Indicate support for autonegotiation
The device supports link autonegotiation, so let the user know about it
by indicating support via ethtool ops.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:50 -07:00
Ido Schimmel 6277d46b10 mlxsw: spectrum: Force link training according to admin state
When setting a new speed we need to disable and enable the port for the
changes to take effect. We currently only do that if the operational
state of the port is up. However, setting a new speed following link
training failure will require us to explicitly set the port down and then
up.

Instead, disable and enable the port based on its administrative state.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:50 -07:00
Christophe Jaillet a1e3e7372c mlxsw: spectrum_router: Return -ENOENT in case of error
'vr' should be a valid pointer here, so returning 'PTR_ERR(vr)' is wrong.
Return an explicit error code (-ENOENT) instead.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 22:14:54 -07:00
Or Gerlitz d957b4e383 net/mlx5e: Add TC offload support for the VF representors netdevice
The VF representors support only TC filter/action offloads
(not mqprio) and this is enabled for them by default.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz adb4c123f8 net/mlx5e: Add TC HW support for FDB (SRIOV e-switch) offloads
Enhance the TC offload code such that when the eswitch exists and its
mode is SRIOV offloads, we do TC actions parsing and setup targeted
for the eswitch. Next, we add the offloaded flow to the HW e-switch (fdb).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz 03a9d11e6e net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads
Add the setup code that parses the TC actions needed to support offloading drop
and mirred/redirect for SRIOV e-switch. We can redirect between two devices if
they belong to the same HW switch; compare the switchdev HW ID attribute to
enforce that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz 5c40348c69 net/mlx5e: Adjustments in the TC offload code towards reuse for SRIOV
Towards reusing the TC offloads code for an SRIOV use-case, change some of the
helper functions to have _nic in their names so it's clear what's NIC-specific
and what's general. Also group together the NIC related helpers so we can easily
branch per the use-case in a downstream patch.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz 3d80d1a2f5 net/mlx5: E-Switch, Add API to configure rules for the offloaded mode
This allows upper levels in the driver, e.g. the TC offload code, to add
e-switch offloaded steering rules. The caller provides the rule spec for
matching, action, source and destination vports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Or Gerlitz 1033665e63 net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode
In the offloads mode, some slow path rules are added by the driver (e.g.
send-to-vport), while offloaded rules are to be added from upper layers.

The slow path rules have lower priority and we don't want matching on
offloaded rules to suffer from extra steering hops related to the slow
path rules.

We use two priorities, one for offloaded rules (fast path), and one for
the control rules (slow path). To allow for that, we enable two priorities
for the FDB namespace in the FS core code.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Or Gerlitz 5513028787 net/mlx5e: Offload TC flow counters only when supported
Currently, the code that programs the flow actions into the firmware
doesn't check whether it was actually asked to offload the statistics.
Fix that.

Fixes: aad7e08d39 ('net/mlx5e: Hardware offloaded flower filter statistics support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Amir Vadai a351a1b03b net/mlx5: Introduce bulk reading of flow counters
This commit utilizes the ability of ConnectX-4 to bulk read flow counters.
A few bulk counter queries can be issued instead of thousands of
firmware commands per second to get the statistics of all flows set to HW,
such as those programmed when we offload tc filters.

Counters are stored sorted by hardware id, and queried in blocks (id +
number of counters).

Due to a hardware requirement, the start of a block and the number of
counters in a block must be aligned to four.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
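Note on the commit above (a351a1b03b): the alignment rule means a query covering an arbitrary range of counter ids has to be widened to a four-aligned block. The sketch below shows the arithmetic only; `struct bulk_range` and `bulk_range_for()` are hypothetical and do not reflect the driver's actual data structures.

```c
#include <stdint.h>

#define BULK_ALIGN 4u

/* Hypothetical description of one bulk query: base id + counter count. */
struct bulk_range {
	uint32_t base_id;
	uint32_t num;
};

/* Smallest four-aligned block covering ids [first_id, last_id].
 * Requires first_id <= last_id. */
struct bulk_range bulk_range_for(uint32_t first_id, uint32_t last_id)
{
	struct bulk_range r;
	uint32_t span;

	r.base_id = first_id & ~(BULK_ALIGN - 1);		/* round start down */
	span = last_id - r.base_id + 1;
	r.num = (span + BULK_ALIGN - 1) & ~(BULK_ALIGN - 1);	/* round count up */
	return r;
}
```

For example, ids 5 through 9 are covered by a block starting at id 4 with 8 counters; the extra entries are simply ignored by the caller.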
Amir Vadai 29cc667907 net/mlx5: Store counters in rbtree instead of list
In order to use bulk counters, we need to have counters sorted by id.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Mohamad Haj Yahia c3b7c5c950 net/mlx5e: start/stop all tx queues upon open/close netdev
Start all tx queues (including inactive ones) when opening the netdev.
Stop all tx queues (including inactive ones) when closing the netdev.

This is a workaround for tx timeout watchdog false alarms. The netdev
watchdog polls all the tx queues, which may include inactive queues, so
lowering the real number of tx queues (ethtool -L) generates false tx
timeout alarms.

Fixes: 3947ca1859 ('net/mlx5e: Implement ndo_tx_timeout callback')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:38:16 -07:00
Daniel Jurgens 2c1ccc9937 net/mlx5e: Fix TX Timeout to detect queues stuck on BQL
Change netif_tx_queue_stopped to netif_xmit_stopped.  This will show
when queues are stopped due to byte queue limits.

Fixes: 3947ca1859 ('net/mlx5e: Implement ndo_tx_timeout callback')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:38:16 -07:00
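Note on the commit above (2c1ccc9937): the difference between the two checks comes down to which stop bits are tested. The sketch below is a simplified userspace model; the bit names echo the kernel's queue state flags, but the definitions and the `model_` functions are illustrative assumptions, not the kernel code.

```c
/* Simplified model of the per-queue stop bits. */
enum {
	DRV_XOFF   = 1u << 0,	/* queue stopped by the driver */
	STACK_XOFF = 1u << 1,	/* queue stopped by the stack, e.g. BQL */
};

/* Models the old check: only sees a driver-initiated stop. */
int model_tx_queue_stopped(unsigned int state)
{
	return (state & DRV_XOFF) != 0;
}

/* Models the new check: sees a stop from either source. */
int model_xmit_stopped(unsigned int state)
{
	return (state & (DRV_XOFF | STACK_XOFF)) != 0;
}
```

A queue stuck on byte queue limits has only the stack bit set, so the old check reports it as running and the timeout handler skips it.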
Jiri Pirko b38a75d2d3 mlxsw: core: Trace EMAD messages
Trace EMAD messages going down to HW and up from HW. Devlink needs to be
registered before EMAD init so the trace function can be called
with valid devlink handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v1->v2:
- Use trace_devlink_hwmsg directly
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12 14:20:18 -07:00
David S. Miller 30d0844bdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/en.h
	drivers/net/ethernet/mellanox/mlx5/core/en_main.c
	drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 10:35:22 -07:00
Or Gerlitz eae033c1b8 net/mlx5: Avoid setting unused var when modifying vport node GUID
GCC complains about an unused-but-set variable; clean this up.

Fixes: 23898c763f ('net/mlx5: E-Switch, Modify node guid on vf set MAC')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 11:52:42 -07:00
Yotam Gigi 0b2361d9d9 mlxsw: Add the unresolved next-hops probes
The driver now sends ARP probes for all unresolved neighbours that are
currently a nexthop for some route on the system. The job runs
periodically, every 5 seconds.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:31 -07:00
Yotam Gigi b2157149b0 mlxsw: spectrum_router: Add the nexthop neigh activity update
For nexthop neighbours, we need to make the kernel think there is traffic
flowing to them, preventing them from going to the stale state. Otherwise
the kernel would mark the neigh stale, and eventually the neigh would be
removed from HW, and the nexthop with it. That would shrink the ECMP group
in HW.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko a7ff87acd9 mlxsw: spectrum_router: Implement next-hop routing
Implement next-hop routing offload including ECMP. To make it possible,
introduce next-hop group entity. This entity keeps track of resolved
neighbours and updates the HW adjacency table accordingly. Note that HW
next-hops are stored in this adjacency table, in the form of MAC addresses.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko a59f0b312a mlxsw: reg: Add Router Algorithmic LPM ECMP Update Register
The RALEU register is used to mass-update the remote action adjacency index
and ECMP size.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Yotam Gigi 089f981683 mlxsw: reg: Add Router Adjacency Table register
The RATR register is used to configure the Router Adjacency (next-hop)
Table.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko b090ef0686 mlxsw: Introduce simplistic KVD linear area manager
This is a very simple manager for the KVD linear area. Currently, the
allocator will either allocate a single entry from a pre-defined sub-area,
or, in case more than one entry is needed, it will allocate a 32-entry
chunk in another pre-defined sub-area.
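
Illustrative sketch only, not the driver code: the two-sub-area policy
described above can be modelled with one "used" flag per single entry and
one per 32-entry chunk. The sub-area sizes, the layout (chunks placed right
after the single entries) and all kvdl_model_* names are assumptions made
for this example; only the 32-entry chunk size comes from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Sizes and layout are illustrative assumptions, not driver values. */
#define SINGLE_BASE  0
#define SINGLE_COUNT 64            /* entries in the single-entry sub-area */
#define CHUNK_BASE   SINGLE_COUNT  /* chunk sub-area follows the singles */
#define CHUNK_SIZE   32            /* chunk size from the description */
#define CHUNK_COUNT  4             /* chunks in the chunk sub-area */

struct kvdl_model {
	bool single_used[SINGLE_COUNT];
	bool chunk_used[CHUNK_COUNT];
};

/* Returns the first index of the allocation, or -1 on failure. */
static int kvdl_model_alloc(struct kvdl_model *k, unsigned int entry_count)
{
	int i;

	if (entry_count == 0 || entry_count > CHUNK_SIZE)
		return -1;

	if (entry_count == 1) {
		/* A single entry comes from the single-entry sub-area. */
		for (i = 0; i < SINGLE_COUNT; i++) {
			if (!k->single_used[i]) {
				k->single_used[i] = true;
				return SINGLE_BASE + i;
			}
		}
		return -1;
	}

	/* Anything larger takes a whole chunk, even if it needs fewer entries. */
	for (i = 0; i < CHUNK_COUNT; i++) {
		if (!k->chunk_used[i]) {
			k->chunk_used[i] = true;
			return CHUNK_BASE + i * CHUNK_SIZE;
		}
	}
	return -1;
}

/* Releases the allocation that starts at the given index. */
static void kvdl_model_free(struct kvdl_model *k, int index)
{
	assert(index >= 0 && index < CHUNK_BASE + CHUNK_COUNT * CHUNK_SIZE);

	if (index < CHUNK_BASE)
		k->single_used[index - SINGLE_BASE] = false;
	else
		k->chunk_used[(index - CHUNK_BASE) / CHUNK_SIZE] = false;
}
```

The trade-off of such a scheme is internal waste: a request for two entries
consumes a full 32-entry chunk, in exchange for trivial bookkeeping.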

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko c602242761 mlxsw: spectrum: Define sizes of KVD areas
Override the defaults and define the area sizes ourselves.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko 489107bda1 mlxsw: Add KVD sizes configuration into profile
Up until now we only used hash-based tables in the device, but we are
going to use the linear table for remote routes adjacency lists.

Add the configuration fields that control the size of the linear table.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi a6bf9e933d mlxsw: spectrum_router: Offload neighbours based on NUD state change
Listen to any NEIGH_UPDATE events sent and program the device
accordingly. If NUD state is VALID and neighbour isn't yet offloaded,
then program it into the device's table. Otherwise, just edit its
parameters.

If NUD state machine transitioned neighbour out of VALID state and it's
present in the device's table, then remove it.

Note that the device is programmed in delayed work, as the netevent
notification chain is atomic and prevents us from going to sleep.
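
Illustrative sketch only, not the driver code: the decision described above
depends on two inputs, whether the neighbour's NUD state is VALID and
whether it is already present in the device's table. The enum and function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What the delayed work would do to the device's table (hypothetical). */
enum neigh_hw_action {
	NEIGH_HW_NONE,   /* not valid and not offloaded: nothing to do */
	NEIGH_HW_ADD,    /* valid, not yet offloaded: program the entry */
	NEIGH_HW_EDIT,   /* valid, already offloaded: update its parameters */
	NEIGH_HW_REMOVE, /* left VALID while offloaded: remove the entry */
};

static enum neigh_hw_action neigh_hw_decide(bool nud_valid, bool offloaded)
{
	if (nud_valid)
		return offloaded ? NEIGH_HW_EDIT : NEIGH_HW_ADD;
	return offloaded ? NEIGH_HW_REMOVE : NEIGH_HW_NONE;
}
```

Keeping the decision a pure function of the two flags makes it easy to run
from the delayed work, away from the atomic notifier context.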

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi c723c735fa mlxsw: spectrum_router: Periodically update the kernel's neigh table
As previously explained, the driver should periodically poll the device
for neighbours activity according to the configured DELAY_PROBE_TIME.
This will prevent active neighbours from staying in STALE state for long
periods of time.

During init, configure the polling interval according to the
DELAY_PROBE_TIME used in the default table. In addition, register a
netevent notification block, so that the interval is updated whenever
DELAY_PROBE_TIME changes.

Using the computed interval, schedule a delayed work, which will update
the kernel via neigh_event_send() for any neighbour that has been active
since the last delayed work.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi 7cf2c205d7 mlxsw: reg: Add Router Algorithmic LPM Unicast Host Table Dump register
The RAUHTD register allows dumping entries from the Router Unicast Host
Table.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi 4457b3df3f mlxsw: reg: Add Router Algorithmic LPM Unicast Host Table register
The RAUHT register is used to configure and query the Unicast Host Table
in devices that implement the Algorithmic LPM. In other words, it is
used to configure neighbour entries in the device.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:28 -07:00