Commit Graph

6143 Commits

Author SHA1 Message Date
Jason Gunthorpe 449a224c10 Merge branch 'rdma_mmap' into rdma.git for-next
Jason Gunthorpe says:

====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
 * BAR pages intended for read-only can be switched to writable via mprotect.
 * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
 * Disassociate causes SIGBUS when touching the pages.
 * CPU pages are being mapped through to the process via remap_pfn_range
   instead of the more appropriate vm_insert_page, causing weird behaviors
   during disassociation.

This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================

For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

* branch 'rdma_mmap':
  RDMA: Remove rdma_user_mmap_page
  RDMA/mlx5: Use get_zeroed_page() for clock_info
  RDMA/ucontext: Fix regression with disassociate
  RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
  RDMA/mlx5: Do not allow the user to write to the clock page

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:20:34 -03:00
Jason Gunthorpe 4eb6ab13b9 RDMA: Remove rdma_user_mmap_page
Upon further research drivers that want this should simply call the core
function vm_insert_page(). The VMA holds a reference on the page and it
will be automatically freed when the last reference drops. No need for
disassociate to sequence the cleanup.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:18:36 -03:00
Jason Gunthorpe ddcdc368b1 RDMA/mlx5: Use get_zeroed_page() for clock_info
get_zeroed_page() returns a virtual address for the page which is better
than allocating a struct page and doing a permanent kmap on it.

Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 13:40:50 -03:00
Jason Gunthorpe d5e560d3f7 RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.

Cc: stable@vger.kernel.org
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-24 13:06:40 -03:00
Jason Gunthorpe c660133c33 RDMA/mlx5: Do not allow the user to write to the clock page
The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.

Cc: stable@vger.kernel.org
Fixes: 5c99eaecb1 ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-24 13:06:24 -03:00
John Fleck 3c176c9d72 IB/hfi1: Remove reference to RHF.VCRCErr
The bit VCRCErr in the receive header flag is actually a
reserved field. Remove bit operations on this field.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:48:11 -03:00
Mike Marciniszyn a9c62e0078 IB/hfi1: Add selected Rcv counters
These counters are required for error analysis and debug.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:48:10 -03:00
Mike Marciniszyn d40f69c9b9 IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts
The reference count adjustments on reference count completion
are open coded throughout.

Add a routine to do all reference count adjustments and use.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Mike Marciniszyn 715ab1a862 IB/rdmavt: Fix ab/ba include issues
The currently include file ordering for rdmavt headers has an
ab/ba include issue the precludes using inlines from rdma_vt.h
in rdmavt_qp.h.

At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h.

Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h
and move qp related inlines to rdmavt_qp.h.

Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared
by rdmavt_cq.h and rdmavt_qp.h.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Mike Marciniszyn 62644c1d2b IB/hfi1: Make opfn.h self sufficient
The opfn.h include file build-ablility depends on the including file
having the correct includes.

Fix by making opfn.h self sufficient.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Kaike Wan ea752bc5e5 IB/{rdmavt, hfi1): Miscellaneous comment fixes
This patch fixes miscellaneous comment errors.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:48 -03:00
Josh Collier 07c5ba9124 IB/hfi1: Add debugfs to control expansion ROM write protect
Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple
handles to PCI resource0. In order to continue to support expansion ROM
updates while the driver is loaded, the driver must now provide an
interface to control the expansion ROM write protection.

This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom
user tool to disable the expansion ROM write protection by opening the
file and writing a '1'.  The write protection is released when writing a
'0' or automatically re-enabled when the file handle is closed.  The
current implementation will only allow one handle to be opened at a time
across all hfi1 devices.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:26:41 -03:00
Leon Romanovsky 5742582222 RDMA/hns: Remove asynchronic QP destroy
Verbs destroy callbacks are synchronous operations and can't be delayed.
The expectation is that after driver returned from destroy function, the
memory can be freed and user won't be able to access it again.

Ditch workqueue implementation used in HNS driver.

Fixes: d838c481e0 ("IB/hns: Fix the bug when destroy qp")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: oulijun <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 10:55:31 -03:00
Saeed Mahameed c3bdd5e651 Linux 5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM
 pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW
 H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t
 wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX
 jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ
 7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z
 4+dwGBk=
 =pyXR
 -----END PGP SIGNATURE-----

Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next

Linux 5.1-rc1

We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-22 15:25:39 -07:00
Mark Bloch 5fb58c9e2f RDMA/mlx5: Don't create IB representors when in multiport RoCE mode
Switchdev mode and mutiport RoCE mode aren't compatible at this point.
Don't create IB reps when a user switches to switchdev mode and the driver
operates in that mode.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch d3b5cc1cd9 RDMA/mlx5: Initialize roce port info before multiport master init
When working in mutliport RoCE mode it is possible to attach a slave
before the master. In that case the slave is waiting for a master to be
attached.  When the master is attached it goes over the list of waiting
slaves, finds a slave that is compatible and tries to bind it to itself.

The call stack is:
mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port()

In the bind function we will create a netdev notifier, but this is done
before we initialize the RoCE structure (this is done at a later stage by
the master in the ROCE stage).

Once events are delivered to that notifier we will use
mlx5_ib_get_native_port_mdev() to get the actual port and as the native
port is zero we will access an invalid index in the port structure.

Move the RoCE structure initialization to an earlier stage.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch 7f575103b0 RDMA/mlx5: Allow DEVX and raw creation flow on reps
Remove the limitations that were in place and provide support for DEVX and
raw flow creation on reps.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Maor Gottlieb 56e5acd405 RDMA/mlx5: Add query e-switch vport context to devx white list
Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed
only if HCA_CAP.eswitch_manager==1.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch 52438be441 RDMA/mlx5: Allow inserting a steering rule to the FDB
Allow this only via mlx5 raw create flow API, legacy verbs are not
supported. To accommodate that, we add a new attribute to matcher creation
to indicate the type of flow table to be used.
	MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE
With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer
needed, we keep it for compatibility but at most only a single attribute can
be passed of the two.

When inserting a flow rule to the FDB we require that a DEVX FT is
provided as a destination, no other configuration is allowed.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch 3b70508a6b RDMA/mlx5: Create flow table with max size supported
Instead of failing the request, just use the supported number of flow
entries.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch 13a4376568 RDMA/mlx5: Access the prio bypass inside the FDB flow table namespace
Now that we have a specific prio inside the FDB namespace allow retrieving
it from the RDMA side.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Chengguang Xu 2d95984977 infiniband/qib: Fix typo in comment
Fix typo 'faspath' -> 'pastpath'.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 14:08:15 -03:00
Colin Ian King ff5eefe6d3 RDMA/cxgb4: Fix spelling mistake "immedate" -> "immediate"
There is a spelling mistake in a module parameter description. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-18 03:17:42 -03:00
Guy Levi 7249c8ea22 IB/mlx5: Fix scatter to CQE in DCT QP creation
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.

The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.

A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.

Fixes: 5d6ff1babe ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-18 03:13:41 -03:00
David S. Miller 6b0a7f84ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflict resolution of af_smc.c from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17 11:26:25 -07:00
Colin Ian King a6d2a5a92e RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
Currently if alloc_skb fails to allocate the skb a null skb is passed to
t4_set_arp_err_handler and this ends up dereferencing the null skb.  Avoid
the NULL pointer dereference by checking for a NULL skb and returning
early.

Addresses-Coverity: ("Dereference null return")
Fixes: b38a0ad8ec ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-16 08:03:17 -03:00
Colin Ian King 1db86318c4 RDMA/mlx5: Check for error return in flow_rule rather than err
Currently when the call to create_flow_rule_vport_sq fails, the error
check is being performed on err rather than on the return pointer
flow_rule.  The return flow_rule maybe NULL (which is not considered an
error) or an error code, so check for the error on flow_rule.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: d5ed8ac34c ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-12 11:19:49 -03:00
Devesh Sharma 1c00d7bc96 RDMA/ocrdma: Remove use of idr use pci bdf instead
Removing the use of IDR variable just to name the function ids. Using the
PCI_FUNC(pdev->devfn) instead to create the device name, associated
resources and to print driver into at various places.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-12 10:59:02 -03:00
Kaike Wan d737b25b1a IB/hfi1: Do not flush send queue in the TID RDMA second leg
When a QP is put into error state, the send queue will be flushed.
This mechanism is implemented in both the first and the second leg
of the send engine. Since the second leg is only responsible for
data transactions in the KDETH space for the TID RDMA WRITE request,
it should not perform the flushing of the send queue.

This patch removes the flushing function of the second leg, but
still keeps the bailing out of the QP if it is put into error state.

Fixes: 70dcb2e3dc ("IB/hfi1: Add the TID second leg send packet builder")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:09:30 -03:00
Mark Bloch fb652d3299 RDMA/mlx5: Remove VF representor profile
Now that we have a single IB device with multiple ports we can remove the
VF representor profile.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:40 -03:00
Mark Bloch 26628e2d58 RDMA/mlx5: Move to single device multiport ports in switchdev mode
Move from IB device (representor) per virtual function to single IB device
with port per virtual function (port 1 represents the uplink). As number
of ports is a static property of an IB device, declare the IB device with
as many port as the possible according to the PCI bus.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch a989ea01cb RDMA/mlx5: Move SMI caps logic
We store the SMI information in the core device's struct, make sure we set
that information only once (and not per port), while here make the for
loop based on the actual size of the array.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch 35b0aa67b2 RDMA/mlx5: Refactor netdev affinity code
The design of representors is such that once an IB representor is created,
the netdev of representor already exists, we can use that fact to simplify
the netdev affinity code.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch d5ed8ac34c RDMA/mlx5: Move default representors SQ steering to rule to modify QP
Currently the steering for SQs created on representors is done on
creation, once we move to representors as ports of an IB device we need
the port argument which is given only at the modify QP stage, adjust the
code appropriately.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch 6a4d00be08 RDMA/mlx5: Move rep into port struct
In preparation of moving into a model of single IB device multiple ports
move rep to be part of the port structure. We mark a representor device by
setting is_rep, no functional change with this patch.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch 5d8f6a0e92 RDMA/mlx5: Use correct size for device resources
On allocation we use the array size and on destruction num_ports, use the
array size of destruction as well, in this context the array corresponds
to the native/actual ports on the NIC so no need to adjust this logic for
representors.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch da796ccb3e RDMA/mlx5: Move ports allocation to outside of INIT stage
In downstream patches we will need access to the ports before doing any
stages, in order to set net device per representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch 4a6dc8552a RDMA/mlx5: Free IB device on remove
Simplify the code and move the deallocation of the IB device into the
remove function.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch 95579e785a RDMA/mlx5: Move netdev info into the port struct
Netdev info is stored in a separate array and holds data relevant on a per
port basis, move it to be part of the port struct.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Jason Gunthorpe 5331fa0db7 Merge branch 'mlx5-next' into rdma.git for-next
From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies on the next series

* branch 'mlx5-next':
  net/mlx5: E-Switch, add a new prio to be used by the RDMA side
  net/mlx5: E-Switch, don't use hardcoded values for FDB prios
  net/mlx5: Fix false compilation warning
  net/mlx5: Expose MPEIN (Management PCIE INfo) register layout
  net/mlx5: Add rate limit print macros
  net/mlx5: Add explicit bar address field
  net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info
  net/mlx5: Use dev->priv.name instead of dev_name
  net/mlx5: Make mlx5_core messages independent from mdev->pdev
  net/mlx5: Break load_one into three stages
  net/mlx5: Function setup/teardown procedures
  net/mlx5: Move health and page alloc init to mdev_init
  net/mlx5: Split mdev init and pci init
  net/mlx5: Remove redundant init functions parameter
  net/mlx5: Remove spinlock support from mlx5_write64
  net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 14:59:27 -03:00
Steve Wise ab7efbe24b RDMA/cxgb4: Use ib_device_set_netdev()
cxgb4 has a simple non-dynamic use of get_netdev, so conversion is
straightforward.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-09 10:14:54 -03:00
Jason Gunthorpe 4b38da75e0 RDMA/drivers: Convert easy drivers to use ib_device_set_netdev()
Drivers that never change their ndev dynamically do not need to use
the get_netdev callback.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
2019-04-09 10:14:54 -03:00
David Ahern 1550c17193 ipv4: Prepare rtable for IPv6 gateway
To allow the gateway to be either an IPv4 or IPv6 address, remove
rt_uses_gateway from rtable and replace with rt_gw_family. If
rt_gw_family is set it implies rt_uses_gateway. Rename rt_gateway
to rt_gw4 to represent the IPv4 version.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08 15:22:40 -07:00
Yangyang Li 00fb67ec6b RDMA/hns: Bugfix for SCC hem free
The method of hem free for SCC context is different from qp context.

In the current version, if free SCC hem during the execution of qp free,
there may be smmu error as below:

 arm-smmu-v3 arm-smmu-v3.1.auto: event 0x10 received:
 arm-smmu-v3 arm-smmu-v3.1.auto:  0x00007d0000000010
 arm-smmu-v3 arm-smmu-v3.1.auto:  0x000012000000017c
 arm-smmu-v3 arm-smmu-v3.1.auto:  0x00000000000009e0
 arm-smmu-v3 arm-smmu-v3.1.auto:  0x0000000000000000

As SCC context is still used by hardware after qp free, we can solve this
problem by removing SCC hem free from hns_roce_qp_free.

Fixes: 6a157f7d1b ("RDMA/hns: Add SCC context allocation support for hip08")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:42:16 -03:00
Lijun Ou 4772e03d23 RDMA/hns: Fix bug that caused srq creation to fail
Due to the incorrect use of the seg and obj information, the position of
the mtt is calculated incorrectly, and the free space of the page is not
enough to store the entire mtt, resulting in access to the next page. This
patch fixes this problem.

 Unable to handle kernel paging request at virtual address ffff00006e3cd000
 ...
 Call trace:
  hns_roce_write_mtt+0x154/0x2f0 [hns_roce]
  hns_roce_buf_write_mtt+0xa8/0xd8 [hns_roce]
  hns_roce_create_srq+0x74c/0x808 [hns_roce]
  ib_create_srq+0x28/0xc8

Fixes: 0203b14c4f ("RDMA/hns: Unify the calculation for hem index in hip08")
Signed-off-by: chenglang <chenglang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:27:58 -03:00
chenglang 2b277dae06 RDMA/hns: Support to create 1M srq queue
In mhop 0 mode, 64*bt_num queues can be supported.
In mhop 1 mode, 32K*bt_num queues can be supported.
Config srqc_hop_num to 1 to support 1M SRQ queues.

Signed-off-by: chenglang <chenglang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:26:44 -03:00
Lijun Ou e1c9a0dc29 RDMA/hns: Dump detailed driver-specific CQ
This patch adds support of resource track for hip08 and take dumping cq
context state used for debugging as an example.  More resources track
supports for hns driver will be added in future.

The output should be as follows.
$ rdma res show cq dev hnseth0 -d
dev hnseth0 cqe 1023 users 2 poll-ctx WORKQUEUE pid 0 comm [ib_core] drv_state 2 drv_ceq
n 0 drv_cqn 0 drv_hopnum 1 drv_pi 0 drv_ci 0 drv_coalesce 0 drv_period 0 drv_cnt 0

Signed-off-by: Tao Tian <tiantao6@huawei.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: chenglang <chenglang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky 68e326dea1 RDMA: Handle SRQ allocations by IB/core
Convert SRQ allocation from drivers to be in the IB/core

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky d345691471 RDMA: Handle AH allocations by IB/core
Simplify drivers by ensuring lifetime of ib_ah object. The changes
in .create_ah() go hand in hand with relevant update in .destroy_ah().

We will use this opportunity and convert .destroy_ah() to don't fail, as
it was suggested a long time ago, because there is nothing to do in case
of failure during destroy.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Jason Gunthorpe e79c9c6062 IB/mlx5: Remove references to uboject->context
These should all go through udata now. Add mlx5_udata_to_mdev to convert
a udata into the struct mlx5_ib_dev as these call sites require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Shiraz Saleem d10bcf947a RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs
Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.

Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.

Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.

Move npages tracking to ib_umem_odp as ODP drivers still need it.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:24 -03:00
Kamal Heib ea7a5c706f RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove
Make sure to free the DSR on pvrdma_pci_remove() to avoid the memory leak.

Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:04:46 -03:00
Will Deacon 1b8546d7e2 i40iw: Redefine i40iw_mmiowb() to do nothing
mmiowb() is now implicit in spin_unlock(), so there's no reason to call
it from driver code. Redefine i40iw_mmiowb() to do nothing instead.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:09:15 +01:00
Will Deacon fb24ea52f7 drivers: Remove explicit invocations of mmiowb()
mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done

NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation. If you've ended up bisecting to this commit, you can
reintroduce the mmiowb() calls using wmb() instead, which should restore
the old behaviour on all architectures other than some esoteric ia64
systems.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:01:02 +01:00
Will Deacon 949b8c7276 drivers: Remove useless trailing comments from mmiowb() invocations
In preparation for using coccinelle to remove all mmiowb() instances
from drivers, remove all trailing comments since they won't be picked up
by spatch later on and will end up being preserved in the code.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:00:56 +01:00
Saeed Mahameed b6460c72c3 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This merge commit includes some misc shared code updates from mlx5-next branch needed
for net-next.

1) From Maxim, Remove un-used macros and spinlock from mlx5 code.

2) From Aya, Expose Management PCIE info register layout and add rate limit
print macros.

3) From Tariq, Compilation warning fix in fs_core.c

4) From Vu, Huy and Saeed, Improve mlx5 initialization flow:
The goal is to provide a better logical separation of mlx5 core
device initialization flow and will help to seamlessly support
creating different mlx5 device types such as PF, VF and SF
mlx5 sub-function virtual devices.

Mlx5_core driver needs to separate HCA resources from pci resources.
Its initialize/load/unload will be broken into stages:
1. Initialize common data structures
2. Setup function which initializes pci resources (for PF/VF)
   or some other specific resources for virtual device
3. Initialize software objects according to hardware capabilities
4. Load all mlx5_core components

It is also necessary to detach mlx5_core mdev name/message from pci
device mdev->pdev name/message for a clearer report/debug of
different mlx5 device types.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-05 14:10:16 -07:00
Potnuri Bharat Teja d2c33370ae RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state
On receiving a TERM from tje peer, Host moves the QP to TERMINATE state
and then moves the adapter out of RDMA mode. After issuing a TERM, peer
issues a CLOSE and at this point of time if the connectivity between peer
and host is lost for a significant amount of time, the QP remains in
TERMINATE state.

Therefore c4iw_modify_qp() needs to initiate a close on entering terminate
state.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-04 08:29:04 -03:00
Leon Romanovsky 0f51427bd0 RDMA/mlx5: Cleanup WQE page fault handler
Refactor the page fault handler to be more readable and extensible, this
cleanup was triggered by the error reported below. The code structure made
it unclear to the automatic tools to identify that such a flow is not
possible in real life because "requestor != NULL" means that "qp != NULL"
too.

    drivers/infiniband/hw/mlx5/odp.c:1254 mlx5_ib_mr_wqe_pfault_handler()
    error: we previously assumed 'qp' could be null (see line 1230)

Fixes: 08100fad5c ("IB/mlx5: Add ODP SRQ support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-04 08:28:59 -03:00
Jason Gunthorpe 1c726c4421 Merge HFI1 updates into k.o/for-next
Based on rdma.git for-rc for dependencies.

From Dennis Dalessandro:

====================

Here are some code improvement patches and fixes for less serious bugs to
TID RDMA than we sent for RC.

====================

* HFI1 updates:
  IB/hfi1: Implement CCA for TID RDMA protocol
  IB/hfi1: Remove WARN_ON when freeing expected receive groups
  IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE
  IB/hfi1: Add a function to read next expected psn from hardware flow
  IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:28:05 -03:00
Kaike Wan 747b931fbe IB/hfi1: Implement CCA for TID RDMA protocol
Currently, FECN handling is not implemented on TID RDMA expected receive
packets and therefore CCA can't be turned on when TID RDMA is
enabled. This patch adds the CCA support to TID RDMA protocol by:

- modifying FECN RSM rule to include kernel receive contexts

- For TID_RDMA READ RESP or TID RDMA ACK packet, a CNP will be sent out if
  the FECN bit is set. For other TID RDMA packets that generate at least
  one response packet, the BECN bit will be set in the first response
  packet

- Copying expected packet data to destination buffer when FECN bit is set
  in the TID RDMA READ RESP or TID RDMA WRITE DATA packet. In this case,
  the expected packet is received as an eager packet

- Handling the TID sequence error for subsequent normal expected packets.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:27:39 -03:00
Kaike Wan 8da0f0f26f IB/hfi1: Remove WARN_ON when freeing expected receive groups
When PSM user receive context is freed, the expected receive groups
allocated by the receive context will also been freed. However, if there
are still TID entries in use, the receive groups rcd->tid_full_list or
rcd->tid_used_list will not be empty, and thus triggering the WARN_ONs in
the function hfi1_free_ctxt_rcv_groups().  Even if the two lists may not
be empty, the hfi1 driver will free all TID entries and receive groups
associated with the receive context to prevent any resource leakage. Since
a clean user application exit is not controlled by the hfi1 driver, this
patch will remove the WARN_ONs in hfi1_free_ctxt_rcv_groups().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:27:30 -03:00
Kaike Wan b885d5be9c IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE
For expected packet receiving, the hfi1 hardware checks the KDETH PSN
automatically. However, when sequence error occurs, the hfi1 driver can
check the sequence instead until the hardware flow generation is reloaded.

TID RDMA READ and WRITE protocols implement similar software checking
mechanisms, but with different flags and different local variables to
store next expected PSN.

Unify the handling by using only one set of flag and local variable for
both TID RDMA READ and WRITE protocols.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:27:30 -03:00
Kaike Wan 6a40693a88 IB/hfi1: Add a function to read next expected psn from hardware flow
This patch adds a function to read next expected KDETH PSN from hardware
flow to simplify the code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:27:30 -03:00
Kaike Wan f6f3f53255 IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA
The reference of destination memory region is first obtained when TID RDMA
WRITE request is first received on the responder side. This reference is
released once all TID RDMA WRITE RESP packets are sent to the requester
side, even though not all TID RDMA WRITE DATA packets may have been
received. This early release will especially be undesired if the software
needs to access the destination memory before the last data packet is
received.

This patch delays the release of the MR until all TID RDMA DATA packets
have been received. A helper function to release the reference is also
created to simplify the code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:27:30 -03:00
Huy Nguyen aa8106f137 net/mlx5: Add explicit bar address field
Add bar_addr field to store bar-0 address to avoid calling
pci_resource_start with hard-coded bar-0 as parameter.
Also note that different mlx5 device types will have bar_addr
on different bars.

This patch does not change any functionality.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02 12:49:38 -07:00
Maxim Mikityanskiy bbf29f618e net/mlx5: Remove spinlock support from mlx5_write64
As there is no user of mlx5_write64 that passes a spinlock to
mlx5_write64, remove this functionality and simplify the function.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02 12:49:37 -07:00
Leon Romanovsky 6734b29735 RDMA/hns: Fix bad endianess of port_pd variable
port_pd is treated as le32 in declaration and read, fix assignment to be
in le32 too. This change fixes the following compilation warnings.

drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: warning: incorrect type
in assignment (different base types)
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: expected restricted __le32 [usertype] port_pd
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: got restricted __be32 [usertype]

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Lijun Ou <ouliun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-02 10:40:41 -03:00
Shamir Rabinovitch ff23dfa134 IB: Pass only ib_udata in function prototypes
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.

Make ib_udata the only argument psssed.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 15:00:47 -03:00
Shamir Rabinovitch bdeacabd1a IB: Remove 'uobject->context' dependency in object destroy APIs
Now that we have the udata passed to all the ib_xxx object destroy APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:59:35 -03:00
Shamir Rabinovitch c4367a2635 IB: Pass uverbs_attr_bundle down ib_x destroy path
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:57:35 -03:00
Shamir Rabinovitch a6a3797df2 IB: Pass uverbs_attr_bundle down uobject destroy path
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will
use this to eliminate the dependecy of the drivers in ib_x->uobject
pointers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:55:36 -03:00
Matthew Wilcox 059d48fbf6 qib: Convert qib_unit_table to XArray
Also remove qib_devs_list.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 13:31:35 -03:00
Matthew Wilcox 03b92789e5 hfi1: Convert hfi1_unit_table to XArray
Also remove hfi1_devs_list.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 13:27:35 -03:00
Matthew Wilcox 9fd15987ed qedr: Convert srqidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-29 14:54:51 -03:00
Matthew Wilcox b6014f9e5f qedr: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-29 14:53:57 -03:00
Matthew Wilcox 0ee3b915b1 hfi1: Convert vesw_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-29 14:51:50 -03:00
Matthew Wilcox 736b5a70db RDMA/hns: Convert qp_table_tree to XArray
Also fully initialise the qp before storing it in the XArray.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-29 14:51:50 -03:00
Matthew Wilcox 27e19f4510 RDMA/hns: Convert cq_table to XArray
Change the locking to not disable interrupts as the lookup in interrupt
context will not see a freed CQ, thanks to the synchronize_irq() call
before freeing the CQ.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-29 14:40:40 -03:00
Shiraz Saleem 41d34865b2 RDMA/mthca: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.

Use ib_umem_num_pages() to size this buffer.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:13:27 -03:00
Shiraz Saleem 5f818d676a RDMA/cxbg: Use correct sizing on buffers holding page DMA addresses
The PBL array that hold the page DMA address is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.

Use ib_umem_num_pages() to size this array.

Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:13:27 -03:00
Selvin Xavier 5aa8484080 RDMA/bnxt_re: Use correct sizing on buffers holding page DMA addresses
umem->nmap is used while allocating internal buffer for storing
page DMA addresses. This causes out of bounds array access while iterating
the umem DMA-mapped SGL with umem page combining as umem->nmap can be
less than number of system pages in umem.

Use ib_umem_num_pages() instead of umem->nmap to size the page array.
Add a new structure (bnxt_qplib_sg_info) to pass sglist, npages and nmap.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:13:27 -03:00
Bart Van Assche 196b4ce57d IB/qib: Remove a set-but-not-used variable
This patch avoids that a compiler warning is reported when building with
W=1.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 49c0e2414b ("IB/qib: Change SDMA progression mode depending on single- or multi-rail")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 11:03:49 -03:00
Bart Van Assche 920d10e458 IB/hfi1: Fix two format strings
Enable format string checking for hfi1_cdbg() and fix the resulting
compiler warnings.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 11:03:49 -03:00
Bart Van Assche 1f687edee2 IB/mlx5: Declare devx_async_cmd_event_fops static
Avoid that sparse complains about a missing declaration.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 6bf8f22aea ("IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 10:22:48 -03:00
David S. Miller 356d71e00d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-27 17:37:58 -07:00
Yuval Shaia 6a1096611c RDMA/vmw_pvrdma: Skip zeroing device attrs
Caller already clears props before calling query_device.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:30:43 -03:00
Artemy Kovalyov d623dfd283 IB/mlx5: Compare only index part of a memory window rkey
The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY
WINDOWS says that if the CI supports the Base Memory Management Extensions
defined in this specification, the R_Key format for a Type 2 Memory Window
must consist of:

* 24 bit index in the most significant bits of the R_Key, which is owned
  by the CI, and
* 8 bit key in the least significant bits of the R_Key, which is owned by
  the Consumer.

This means that the kernel should compare only the index part of a R_Key
to determine equality with another R_Key.

Fixes: db570d7dea ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:27:56 -03:00
Artemy Kovalyov 1e5887b700 IB/mlx5: WQE dump jumps over first 16 bytes
Move index increment after its is used or otherwise it will start the dump
of the WQE from second WQE BB.

Fixes: 34f4c9554d ("IB/mlx5: Use fragmented QP's buffer for in-kernel users")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:27:33 -03:00
Moni Shoua 1abe186ed8 IB/mlx5: Reset access mask when looping inside page fault handler
If page-fault handler spans multiple MRs then the access mask needs to
be reset before each MR handling or otherwise write access will be
granted to mapped pages instead of read-only.

Cc: <stable@vger.kernel.org> # 3.19
Fixes: 7bdf65d411 ("IB/mlx5: Handle page faults")
Reported-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:27:11 -03:00
Leon Romanovsky e95e52a178 RDMA/hns: Limit scope of hns_roce_cmq_send()
The forgotten static keyword causes to the following error to appear while
building HNS driver. Declare hns_roce_cmq_send() to be static function to
fix this warning.

drivers/infiniband/hw/hns/hns_roce_hw_v2.c:1089:5: warning: no previous
prototype for _hns_roce_cmq_send_ [-Wmissing-prototypes]
 int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,

Fixes: 6a04aed6af ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:25:50 -03:00
Kaike Wan d029434447 IB/hfi1: Fix the allocation of RSM table
The receive side mapping (RSM) on hfi1 hardware is a special
matching mechanism to direct an incoming packet to a given
hardware receive context. It has 4 instances of matching capabilities
(RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of
256 entries, each of which points to a receive context.

Currently, three instances of RSM have been used:
1. RSM0 by QOS;
2. RSM1 by PSM FECN;
3. RSM2 by VNIC.

Each RSM instance should reserve enough entries in RMT to function
properly. Since both PSM and VNIC could allocate any receive context
between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must
reserve enough RMT entries to cover the entire receive context index
range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only
the user receive contexts allocated for PSM
(dd->num_user_contexts). Consequently, the sizing of
dd->num_user_contexts in set_up_context_variables is incorrect.

Fixes: 2280740f01 ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 14:34:31 -03:00
Kaike Wan a8639a79e8 IB/hfi1: Eliminate opcode tests on mr deref
When an old ack_queue entry is used to store an incoming request, it may
need to clean up the old entry if it is still referencing the
MR. Originally only RDMA READ request needed to reference MR on the
responder side and therefore the opcode was tested when cleaning up the
old entry. The introduction of tid rdma specific operations in the
ack_queue makes the specific opcode tests wrong.  Multiple opcodes (RDMA
READ, TID RDMA READ, and TID RDMA WRITE) may need MR ref cleanup.

Remove the opcode specific tests associated with the ack_queue.

Fixes: f48ad614c1 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 14:34:31 -03:00
Kaike Wan 93b289b9af IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state
When a QP is put into error state, it may be waiting for send engine
resources. In this case, the QP will be removed from the send engine's
waiting list, but its IOWAIT pending bits are not cleared. This will
normally not have any major impact as the QP is being destroyed.  However,
the QP still needs to wind down its operations, such as draining the send
queue by scheduling the send engine. Clearing the pending bits will avoid
any potential complications. In addition, if the QP will eventually hang,
clearing the pending bits can help debugging by presenting a consistent
picture if the user dumps the qp_stats.

This patch clears a QP's IOWAIT_PENDING_IB and IO_PENDING_TID bits in
priv->s_iowait.flags in this case.

Fixes: 5da0fc9dbf ("IB/hfi1: Prepare resource waits for dual leg")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 14:34:31 -03:00
Kaike Wan 662d664666 IB/hfi1: Failed to drain send queue when QP is put into error state
When a QP is put into error state, all pending requests in the send work
queue should be drained. The following sequence of events could lead to a
failure, causing a request to hang:

(1) The QP builds a packet and tries to send through SDMA engine.
    However, PIO engine is still busy. Consequently, this packet is put on
    the QP's tx list and the QP is put on the PIO waiting list. The field
    qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;

(2) The QP is put into error state by the user application and
    notify_error_qp() is called, which removes the QP from the PIO waiting
    list and the packet from the QP's tx list. In addition, qp->s_flags is
    cleared of RVT_S_ANY_WAIT_IO bits, which does not include
    HFI1_S_WAIT_PIO_DRAIN bit;

(3) The hfi1_schdule_send() function is called to drain the QP's send
    queue. Subsequently, hfi1_do_send() is called. Since the flag bit
    HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails.  As
    a result, hfi1_do_send() bails out without draining any request from
    the send queue;

(4) The PIO engine completes the sending and tries to wake up any QP on
    its waiting list. But the QP has been removed from the PIO waiting
    list and therefore is kept in sleep forever.

The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.

Fixes: 2e2ba09e48 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
Cc: <stable@vger.kernel.org> # 4.19.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 14:34:31 -03:00
Kangjie Lu e2a438bd71 RDMA/i40iw: Handle workqueue allocation failure
alloc_ordered_workqueue may fail and return NULL.  The fix captures the
failure and handles it properly to avoid potential NULL pointer
dereferences.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 10:19:07 -03:00
Enrico Weigelt, metux IT consult 1a2e158327 drivers: infiniband: Fix whitespace in kconfig
Adjust the kconfig whitespace in bnxt_re/iser to match the kernel
standard.

Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 12:49:33 -03:00
Colin Ian King a6a9274a7c RDMA/nes: remove redundant check on udata
The non-null check on udata is redundant as this check was performed just
a few statements earlier and the check is always true as udata must be
non-null at this point. Remove redundant the check on udata and the
redundant else part that can never be executed.

Detected by CoverityScan, CID#1477317 ("Logically dead code")

Fixes: 8994445054 ("IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 12:07:45 -03:00
Matthew Wilcox f1430536e0 mlx4: Convert pv_id_table to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 10:09:54 -03:00
Matthew Wilcox b02a29eb88 mlx5: Convert mlx5_srq_table to XArray
Remove the custom spinlock as the XArray handles its own locking.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 09:39:42 -03:00
Mike Marciniszyn 270a9833b2 IB/hfi1: Add running average for adaptive pio
The adaptive PIO implementation only considers the current packet size
when deciding between SDMA and pio for a packet.

This causes credit return forces if small and large packets are
interleaved.

Add a running average to avoid costly credit forces so that a large
sequence of small packets is required to go below the threshold that
chooses pio.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 09:33:21 -03:00
Erez Alfasi 19b1a294b0 RDMA: Use __packed annotation instead of __attribute__ ((packed))
"__attribute__" set of macros has been standardized, have became more
potentially portable and consistent code back in v2.6.21 by commit
82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut macros").
Moreover, nowadays checkpatch.pl warns about using __attribute__((packed))
instead of __packed.

This patch converts all the "__attribute__ ((packed))" annotations to
"__packed" within the RDMA subsystem.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 21:14:12 -03:00
Lijun Ou d0a935563b RDMA/hns: Delete unused variable in hns_roce_v2_modify_qp function
The src_mac array is not used in hns_roce_v2_modify_qp function.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 21:11:30 -03:00
Lijun Ou 82342e493b RDMA/hns: Bugfix for sending with invalidate
According to IB protocol, the send with invalidate operation will not
invalidate mr that was created through a register mr or reregister mr.

Fixes: e93df01085 ("RDMA/hns: Support local invalidate for hip08 in kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:59:56 -03:00
Lijun Ou 07c2339a91 RDMA/hns: Hide error print information with roce vf device
The driver should not print the error information when the hip08 driver
not support virtual function.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:59:55 -03:00
Lijun Ou 5b01b243b0 RDMA/hns: Only assgin some fields if the relatived attr_mask is set
According to IB protocol, some fields of qp context are filled with
optional when the relatived attr_mask are set. The relatived attr_mask
include IB_QP_TIMEOUT, IB_QP_RETRY_CNT, IB_QP_RNR_RETRY and
IB_QP_MIN_RNR_TIMER.  Besides, we move some assignments of the fields of
qp context into the outside of the specific qp state jump function.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:59:55 -03:00
Lijun Ou 834fa8cf6f RDMA/hns: Update the range of raq_psn field of qp context
According to hip08 UM(User Manual), the raq_psn field size is [23:0].

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:59:55 -03:00
Lijun Ou 601f3e6d06 RDMA/hns: Only assign the fields of the rq psn if IB_QP_RQ_PSN is set
Only when the IB_QP_RQ_PSN flags of attr_mask is set is it valid to assign
the relatived fields of rq'psn into the qp context when modified qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:59:55 -03:00
Lijun Ou f04cc17878 RDMA/hns: Only assign the relatived fields of psn if IB_QP_SQ_PSN is set
Only when the IB_QP_SQ_PSN flags of attr_mask is set is it valid to assign
the relatived fields of psn into the qp context when modified qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:59:55 -03:00
Matthew Wilcox 401b44804c cxgb4: Convert stid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:56:36 -03:00
Matthew Wilcox 9f5a9632e4 cxgb4: Convert atid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 20:56:36 -03:00
Matthew Wilcox f254ba6ae5 cxgb4: Convert hwtid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:42:12 -03:00
Matthew Wilcox 7a268a9397 cxgb4: Convert mmidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:40:37 -03:00
Matthew Wilcox 2f43129127 cxgb4: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:39:18 -03:00
Matthew Wilcox 52e124c27e cxgb4: Convert cqidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:38:18 -03:00
Matthew Wilcox e64a7c02f1 cxgb3: Convert mmidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:36:29 -03:00
Matthew Wilcox 27114876ce cxgb3: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:35:08 -03:00
Matthew Wilcox a2f409713e cxgb3: Convert cqidr to XArray
It would make sense to convert this to an allocating XArray and remove the
kfifo that is currently used to allocate the CQID, but that work is better
done by someone who has the hardware to test with.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 15:34:22 -03:00
Paolo Abeni a350eccee5 net: remove 'fallback' argument from dev->ndo_select_queue()
After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.

We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also clean the code
a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads		vanilla 	patched
		(kpps)		(kpps)
1		2334		2428
2		4166		4278
4		7895		8100

 v1 -> v2:
 - rebased after helper's name change

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:18:55 -07:00
Feng Tang ec4fe4bcc5 i40iw: Avoid panic when handling the inetdev event
There is a panic reported that on a system with x722 ethernet, when doing
the operations like:

	# ip link add br0 type bridge
	# ip link set eno1 master br0
	# systemctl restart systemd-networkd

The system will panic "BUG: unable to handle kernel null pointer
dereference at 0000000000000034", with call chain:

	i40iw_inetaddr_event
	notifier_call_chain
	blocking_notifier_call_chain
	notifier_call_chain
	__inet_del_ifa
	inet_rtm_deladdr
	rtnetlink_rcv_msg
	netlink_rcv_skb
	rtnetlink_rcv
	netlink_unicast
	netlink_sendmsg
	sock_sendmsg
	__sys_sendto

It is caused by "local_ipaddr = ntohl(in->ifa_list->ifa_address)", while
the in->ifa_list is NULL.

So add a check for the "in->ifa_list == NULL" case, and skip the ARP
operation accordingly.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:40 -03:00
Aya Levin cd27287562 IB/mlx5: Fix mapping of link-mode to IB width and speed
Add mapping of link mode: CAUI4 100Gbps CR4/KR4 with 4 lines and 25Gbps.
Fix mapping of link mode: GAUI2 50Gbps CR2/KR2 to be 2 lines with 25Gbps.

Fixes: 08e8676f16 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:39 -03:00
Yishai Hadas c5ae1954c4 IB/mlx5: Use mlx5 core to create/destroy a DEVX DCT
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling DRAIN DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.

In that case the DRAIN DCT command will be called and only once that it
will be completed the DESTROY DCT command will be called.  Otherwise, the
DESTROY DCT may fail and a hardware leak may occur.

As of that change the DRAIN DCT command should not be exposed any more
from DEVX, it's managed internally by the driver to work as expected by
the device specification.

Fixes: 7efce3691d ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:39 -03:00
Jack Morgenstein 587443e777 IB/mlx4: Fix race condition between catas error reset and aliasguid flows
Code review revealed a race condition which could allow the catas error
flow to interrupt the alias guid query post mechanism at random points.
Thiis is fixed by doing cancel_delayed_work_sync() instead of
cancel_delayed_work() during the alias guid mechanism destroy flow.

Fixes: a0c64a17ab ("mlx4: Add alias_guid mechanism")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:39 -03:00
Linus Torvalds a50243b1dd 5.1 Merge Window Pull Request
This has been a slightly more active cycle than normal with ongoing core
 changes and quite a lot of collected driver updates.
 
 - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe
 
 - A new data transfer mode for HFI1 giving higher performance
 
 - Significant functional and bug fix update to the mlx5 On-Demand-Paging MR
   feature
 
 - A chip hang reset recovery system for hns
 
 - Change mm->pinned_vm to an atomic64
 
 - Update bnxt_re to support a new 57500 chip
 
 - A sane netlink 'rdma link add' method for creating rxe devices and fixing
   the various unregistration race conditions in rxe's unregister flow
 
 - Allow lookup up objects by an ID over netlink
 
 - Various reworking of the core to driver interface:
   * Drivers should not assume umem SGLs are in PAGE_SIZE chunks
   * ucontext is accessed via udata not other means
   * Start to make the core code responsible for object memory
     allocation
   * Drivers should convert struct device to struct ib_device
     via a helper
   * Drivers have more tools to avoid use after unregister problems
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlyAJYYACgkQOG33FX4g
 mxrWwQ/+OyAx4Moru7Aix0C6GWxTJp/wKgw21CS3reZxgLai6x81xNYG/s2wCNjo
 IccObVd7mvzyqPdxOeyHBsJBbQDqWvoD6O2duH8cqGMgBRgh3CSdUep2zLvPpSAx
 2W1SvWYCLDnCuarboFrCA8c4AN3eCZiqD7z9lHyFQGjy3nTUWzk1uBaOP46uaiMv
 w89N8EMdXJ/iY6ONzihvE05NEYbMA8fuvosKLLNdghRiHIjbMQU8SneY23pvyPDd
 ZziPu9NcO3Hw9OVbkwtJp47U3KCBgvKHmnixyZKkikjiD+HVoABw2IMwcYwyBZwP
 Bic/ddONJUvAxMHpKRnQaW7znAiHARk21nDG28UAI7FWXH/wMXgicMp6LRcNKqKF
 vqXdxHTKJb0QUR4xrYI+eA8ihstss7UUpgSgByuANJ0X729xHiJtlEvPb1DPo1Dz
 9CB4OHOVRl5O8sA5Jc6PSusZiKEpvWoyWbdmw0IiwDF5pe922VLl5Nv88ta+sJ38
 v2Ll5AgYcluk7F3599Uh9D7gwp5hxW2Ph3bNYyg2j3HP4/dKsL9XvIJPXqEthgCr
 3KQS9rOZfI/7URieT+H+Mlf+OWZhXsZilJG7No0fYgIVjgJ00h3SF1/299YIq6Qp
 9W7ZXBfVSwLYA2AEVSvGFeZPUxgBwHrSZ62wya4uFeB1jyoodPk=
 =p12E
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a slightly more active cycle than normal with ongoing
  core changes and quite a lot of collected driver updates.

   - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

   - A new data transfer mode for HFI1 giving higher performance

   - Significant functional and bug fix update to the mlx5
     On-Demand-Paging MR feature

   - A chip hang reset recovery system for hns

   - Change mm->pinned_vm to an atomic64

   - Update bnxt_re to support a new 57500 chip

   - A sane netlink 'rdma link add' method for creating rxe devices and
     fixing the various unregistration race conditions in rxe's
     unregister flow

   - Allow lookup up objects by an ID over netlink

   - Various reworking of the core to driver interface:
       - drivers should not assume umem SGLs are in PAGE_SIZE chunks
       - ucontext is accessed via udata not other means
       - start to make the core code responsible for object memory
         allocation
       - drivers should convert struct device to struct ib_device via a
         helper
       - drivers have more tools to avoid use after unregister problems"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
  net/mlx5: ODP support for XRC transport is not enabled by default in FW
  IB/hfi1: Close race condition on user context disable and close
  RDMA/umem: Revert broken 'off by one' fix
  RDMA/umem: minor bug fix in error handling path
  RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
  cxgb4: kfree mhp after the debug print
  IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
  IB/rdmavt: Fix loopback send with invalidate ordering
  IB/iser: Fix dma_nents type definition
  IB/mlx5: Set correct write permissions for implicit ODP MR
  bnxt_re: Clean cq for kernel consumers only
  RDMA/uverbs: Don't do double free of allocated PD
  RDMA: Handle ucontext allocations by IB/core
  RDMA/core: Fix a WARN() message
  bnxt_re: fix the regression due to changes in alloc_pbl
  IB/mlx4: Increase the timeout for CM cache
  IB/core: Abort page fault handler silently during owning process exit
  IB/mlx5: Validate correct PD before prefetch MR
  IB/mlx5: Protect against prefetch of invalid MR
  RDMA/uverbs: Store PR pointer before it is overwritten
  ...
2019-03-09 15:53:03 -08:00
Michael J. Ruhl bc5add0976 IB/hfi1: Close race condition on user context disable and close
When disabling and removing a receive context, it is possible for an
asynchronous event (i.e IRQ) to occur.  Because of this, there is a race
between cleaning up the context, and the context being used by the
asynchronous event.

cpu 0  (context cleanup)
    rc->ref_count-- (ref_count == 0)
    hfi1_rcd_free()
cpu 1  (IRQ (with rcd index))
	rcd_get_by_index()
	lock
	ref_count+++     <-- reference count race (WARNING)
	return rcd
	unlock
cpu 0
    hfi1_free_ctxtdata() <-- incorrect free location
    lock
    remove rcd from array
    unlock
    free rcd

This race will cause the following WARNING trace:

WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
Call Trace:
  dump_stack+0x19/0x1b
  __warn+0xd8/0x100
  warn_slowpath_null+0x1d/0x20
  hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
  is_rcv_urgent_int+0x24/0x90 [hfi1]
  general_interrupt+0x1b6/0x210 [hfi1]
  __handle_irq_event_percpu+0x44/0x1c0
  handle_irq_event_percpu+0x32/0x80
  handle_irq_event+0x3c/0x60
  handle_edge_irq+0x7f/0x150
  handle_irq+0xe4/0x1a0
  do_IRQ+0x4d/0xf0
  common_interrupt+0x162/0x162

The race can also lead to a use after free which could be similar to:

general protection fault: 0000 1 SMP
CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
Call Trace:
  ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
  hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
  hfi1_aio_write+0xba/0x110 [hfi1]
  do_sync_readv_writev+0x7b/0xd0
  do_readv_writev+0xce/0x260
  ? handle_mm_fault+0x39d/0x9b0
  ? pick_next_task_fair+0x5f/0x1b0
  ? sched_clock_cpu+0x85/0xc0
  ? __schedule+0x13a/0x890
  vfs_writev+0x35/0x60
  SyS_writev+0x7f/0x110
  system_call_fastpath+0x22/0x27

Use the appropriate kref API to verify access.

Reorder context cleanup to ensure context removal before cleanup occurs
correctly.

Cc: stable@vger.kernel.org # v4.14.0+
Fixes: f683c80ca6 ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-06 14:47:09 -04:00
Anshuman Khandual 98fa15f34c mm: replace all open encodings for NUMA_NO_NODE
Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

All these places for replacement were found by running the following
grep patterns on the entire kernel code.  Please let me know if this
might have missed some instances.  This might also have replaced some
false positives.  I will appreciate suggestions, inputs and review.

1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"

This patch (of 2):

At present there are multiple places where invalid node number is
encoded as -1.  Even though implicitly understood it is always better to
have macros in there.  Replace these open encodings for an invalid node
number with the global macro NUMA_NO_NODE.  This helps remove NUMA
related assumptions like 'invalid node' from various places redirecting
them to a common definition.

Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:14 -08:00
YueHaibing 4e69cf1fe2 RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
The the below commit, hns_roce_v2_modify_qp is called inside spinlock
while using GFP_KERNEL. Change it to GFP_ATOMIC.

Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04 16:41:30 -04:00
Shaobo He 952a3cc9c0 cxgb4: kfree mhp after the debug print
In function `c4iw_dealloc_mw`, variable mhp's value is printed after
freed, it is clearer to have the print before the kfree.

Otherwise racing threads could allocate another mhp with the same pointer
value and create confusing tracing.

Signed-off-by: Shaobo He <shaobo@cs.utah.edu>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04 16:41:30 -04:00
Moni Shoua 7095ec3ca0 IB/mlx5: Set correct write permissions for implicit ODP MR
The write access of an implicit MR is inherited to all of its children.
Therefore we must set the correct write access to the parent MR.

Pass full access_flags when creating umem to let it calculate write access
correctly.

Fixes: da6a496a34 ("IB/mlx5: Ranges in implicit ODP MR inherit its write access")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04 10:57:19 -04:00
Devesh Sharma 0fca467e81 bnxt_re: Clean cq for kernel consumers only
Kernel space provider driver should clean the CQs belonging to kernel
space consumers only. The current implementation is doing reverse of it.

Fixing the same by avoiding the call to __clean_cq on a kernel qp during
destroy.

Fixes: c50866e285 ("bnxt_re: fix the regression due to changes in alloc_pbl")
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04 10:51:14 -04:00
Jakub Kicinski f4b6bcc700 net: devlink: turn devlink into a built-in
Being able to build devlink as a module causes growing pains.
First all drivers had to add a meta dependency to make sure
they are not built in when devlink is built as a module.  Now
we are struggling to invoke ethtool compat code reliably.

Make devlink code built-in, users can still not build it at
all but the dynamically loadable module option is removed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26 08:49:05 -08:00
David S. Miller 70f3522614 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.

The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.

However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24 12:06:19 -08:00
Leon Romanovsky a2a074ef39 RDMA: Handle ucontext allocations by IB/core
Following the PD conversion patch, do the same for ucontext allocations.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22 14:11:37 -07:00
Devesh Sharma c50866e285 bnxt_re: fix the regression due to changes in alloc_pbl
While adding the use of for_each_sg_dma_page iterator for Brodcom's rdma
driver, there was a regression added in the __alloc_pbl path. The change
left bnxt_re in DOA state in for-next branch.

Fixing the regression to avoid the host crash when a user space object is
created. Restricting the unconditional access to hwq.pg_arr when hwq is
initialized for user space objects.

Fixes: 161ebe2498 ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL")
Reported-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22 11:17:22 -07:00
Håkon Bugge 2612d723aa IB/mlx4: Increase the timeout for CM cache
Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping
in a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when
an entry has to evicted form the cache. But life is not perfect,
remote peers may die or be rebooted. Hence, it's a timeout to wipe out
a cache entry, when the PF driver assumes the remote peer has gone.

During workloads where a high number of QPs are destroyed concurrently,
excessive amount of CM DREQ retries has been observed

The problem can be demonstrated in a bare-metal environment, where two
nodes have instantiated 8 VFs each. This using dual ported HCAs, so we
have 16 vPorts per physical server.

64 processes are associated with each vPort and creates and destroys
one QP for each of the remote 64 processes. That is, 1024 QPs per
vPort, all in all 16K QPs. The QPs are created/destroyed using the
CM.

When tearing down these 16K QPs, excessive CM DREQ retries (and
duplicates) are observed. With some cat/paste/awk wizardry on the
infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the
nodes:

cm_rx_duplicates:
      dreq  2102
cm_rx_msgs:
      drep  1989
      dreq  6195
       rep  3968
       req  4224
       rtu  4224
cm_tx_msgs:
      drep  4093
      dreq 27568
       rep  4224
       req  3968
       rtu  3968
cm_tx_retries:
      dreq 23469

Note that the active/passive side is equally distributed between the
two nodes.

Enabling pr_debug in cm.c gives tons of:

[171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
1,sl_cm_id: 0xd393089f} is NULL!

By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
tear-down phase of the application is reduced from approximately 90 to
50 seconds. Retries/duplicates are also significantly reduced:

cm_rx_duplicates:
      dreq  2460
[]
cm_tx_retries:
      dreq  3010
       req    47

Increasing the timeout further didn't help, as these duplicates and
retries stems from a too short CMA timeout, which was 20 (~4 seconds)
on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
numbers fell down to about 10 for both of them.

Adjustment of the CMA timeout is not part of this commit.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22 09:35:03 -07:00
Moni Shoua 81dd4c4be3 IB/mlx5: Validate correct PD before prefetch MR
When prefetching odp mr it is required to verify that pd of the mr is
identical to the pd for which the advise_mr request arrived with.

This check was missing from synchronous flow and is added now.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 16:32:45 -07:00
Moni Shoua a6bc3875f1 IB/mlx5: Protect against prefetch of invalid MR
When deferring a prefetch request we need to protect against MR or PD
being destroyed while the request is still enqueued.

The first step is to validate that PD owns the lkey that describes the MR
and that the MR that the lkey refers to is owned by that PD.

The second step is to dequeue all requests when MR is destroyed.

Since PD can't be destroyed while it owns MRs it is guaranteed that when a
worker wakes up the request it refers to is still valid.

Now, it is possible to refrain from taking a reference on the device since
it is assured to be present as pd.

While that, replace the dedicated ordered workqueue with the system
unbound workqueue to reuse an existing resource and improve
performance. This will also fix a bug of queueing to the wrong workqueue.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 16:32:45 -07:00
Gustavo A. R. Silva 7264235ee7 IB/hfi1: Add missing break in switch statement
Fix the following warning by adding a missing break:

drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’:
drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   switch (prev->wr.opcode) {
   ^~~~~~
drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here
  case IB_WR_RDMA_READ:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: c6c231175c ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kaike Wan <Kaike.wan@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 14:08:12 -07:00
Jason Gunthorpe 815f748037 Merge branch 'mlx5-next' into rdma.git for-next
From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

To resolve conflicts with net-next and pick up the first patch.

* branch 'mlx5-next':
  net/mlx5: Factor out HCA capabilities functions
  IB/mlx5: Add support for 50Gbps per lane link modes
  net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
  net/mlx5: Add new fields to Port Type and Speed register
  net/mlx5: Refactor queries to speed fields in Port Type and Speed register
  net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode
  net/mlx5: Relocate vport macros to the vport header file
  net/mlx5: E-Switch, Normalize the name of uplink vport number
  net/mlx5: Provide an alternative VF upper bound for ECPF
  net/mlx5: Add host params change event
  net/mlx5: Add query host params command
  net/mlx5: Update enable HCA dependency
  net/mlx5: Introduce Mellanox SmartNIC and modify page management logic
  IB/mlx5: Use unified register/load function for uplink and VF vports
  net/mlx5: Use consistent vport num argument type
  net/mlx5: Use void pointer as the type in address_of macro
  net/mlx5: Align ODP capability function with netdev coding style
  mlx5: use RCU lock in mlx5_eq_cq_get()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 12:40:18 -07:00
Davidlohr Bueso ec95e0fa21 drivers/IB,qib: Fix pinned/locked limit check in qib_get_user_pages()
The current check does not take into account the previous value of
pinned_vm; thus it is quite bogus as is. Fix this by checking the
new value after the (optimistic) atomic inc.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-20 14:42:41 -07:00
Wei Yongjun 3b8f8b95d9 iw_cxgb4: Make function read_tcb() static
Fixes the following sparse warning:

drivers/infiniband/hw/cxgb4/cm.c:658:6: warning:
 symbol 'read_tcb' was not declared. Should it be static?

Fixes: 11a27e2121 ("iw_cxgb4: complete the cached SRQ buffers")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:19 -07:00
Yangyang Li 6ac16e4039 RDMA/hns: Bugfix for set hem of SCC
The method of set hem for scc context is different from other contexts. It
should notify the hardware with the detailed idx in bt0 for scc, while for
other contexts, it only need to notify the bt step and the hardware will
calculate the idx.

Here fixes the following error when unloading the hip08 driver:

[  123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  123.579023] {1}[Hardware Error]: event severity: recoverable
[  123.584670] {1}[Hardware Error]:  Error 0, type: recoverable
[  123.590317] {1}[Hardware Error]:   section_type: PCIe error
[  123.595877] {1}[Hardware Error]:   version: 4.0
[  123.600395] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
[  123.606562] {1}[Hardware Error]:   device_id: 0000:7d:00.0
[  123.612034] {1}[Hardware Error]:   slot: 0
[  123.616120] {1}[Hardware Error]:   secondary_bus: 0x00
[  123.621245] {1}[Hardware Error]:   vendor_id: 0x19e5, device_id: 0xa222
[  123.627847] {1}[Hardware Error]:   class_code: 000002
[  123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000
[  123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
[  123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000
[  123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!!
[  123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified
[  123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error
[  123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error
[  123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5
[  123.683147] hns3 0000:7d:00.0: AER: Device recovery successful
[  123.688978] hns3 0000:7d:00.0: PF Reset requested
[  123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[  123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5!

Fixes: 6a157f7d1b ("RDMA/hns: Add SCC context allocation support for hip08")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:19 -07:00
Lijun Ou 3e394f9413 RDMA/hns: Modify qp&cq&pd specification according to UM
Accroding to hip08's limitation, qp&cq specification is 1M, mtpt
specification 1M in kernel space.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:19 -07:00
Parvi Kaustubhi 5bb3c1e9d4 IB/usnic: Fix deadlock
There is a dead lock in usnic ib_register and netdev_notify path.

	usnic_ib_discover_pf()
	| mutex_lock(&usnic_ib_ibdev_list_lock);
	 | usnic_ib_device_add();
	  | ib_register_device()
	   | usnic_ib_query_port()
	    | mutex_lock(&us_ibdev->usdev_lock);
	     | ib_get_eth_speed()
	      | rtnl_lock()

order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock

	rtnl_lock()
	 | usnic_ib_netdevice_event()
	  | mutex_lock(&usnic_ib_ibdev_list_lock);

order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock

Solution is to use the core's lock-free ib_device_get_by_netdev() scheme
to lookup ib_dev while handling netdev & inet events.

Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:19 -07:00
Leon Romanovsky cfe876d8e6 RDMA/cxgb4: Remove kref accounting for sync operation
Ucontext allocation and release aren't async events and don't need kref
accounting. The common layer of RDMA subsystem ensures that dealloc
ucontext will be called after all other objects are released.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 21:39:15 -07:00
Leon Romanovsky be56b07b4f RDMA/nes: Remove useless usecnt variable and redundant memset
The internal design of RDMA/core ensures that there dealloc ucontext will
be called only if alloc_ucontext succeeded, hence there is no need to
manage internal variable to mark validity of ucontext.

As part of this change, remove redundant memeset too.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 21:39:15 -07:00
Bodong Wang f8e8fa0262 net/mlx5: E-Switch, Centralize repersentor reg/unreg to eswitch driver
Eswitch has two users: IB and ETH. They both register repersentors
when mlx5 interface is added, and unregister the repersentors when
mlx5 interface is removed. Ideally, each driver should only deal with
the entities which are unique to itself. However, current IB and ETH
drivers have to perform the following eswitch operations:

1. When registering, specify how many vports to register. This number
   is the same for both drivers which is the total available vport
   numbers.
2. When unregistering, specify the number of registered vports to do
   unregister. Also, unload the repersentors which are already loaded.

It's unnecessary for eswitch driver to hands out the control of above
operations to individual driver users, as they're not unique to each
driver. Instead, such operations should be centralized to eswitch
driver. This consolidates eswitch control flow, and simplified IB and
ETH driver.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Saeed Mahameed 259fae5a2c Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next shared branched into net-next,

From Bodong Wang:
1) Introduction of ECPF (Embedded CPU Physical Function), and low level
bits for mlx5 SmartNic capabilities support.
2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core

From Aya Levin,
3) Add support for 50Gbps per lane link modes in the Port Type and Speed
register (PTYS)
4) Refactor low level query functions for PTYS register
5) Add support for 50Gbps per lane link modes to mlx5_ib

Note: due to a change in API in mlx5/core and a later patch from net-next,
a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT
with MLX5_VPORT_UPLINK which exists only in upstream net-next.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 16:45:31 -08:00
Kaike Wan e50838c27f IB/hfi1: Fix a build warning for TID RDMA READ
The following build warning was produced for the TID RDMA READ
patch ("IB/hfi1: Enable TID RDMA READ protocol"):

drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe':
drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   hfi1_setup_tid_rdma_wqe(qp, wqe);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/qp.c:329:2: note: here
  case IB_QPT_UC:
  ^~~~

This patch will fix the issue by adding the "fall through" comment.

Fixes: f1ab4efa6d ("IB/hfi1: Enable TID RDMA READ protocol")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 15:55:50 -07:00
Shamir Rabinovitch 8994445054 IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs
Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 15:38:38 -07:00
Raju Rangoju f09ef134a7 iw_cxgb4: cq/qp mask depends on bar2 pages in a host page
Adjust the cq/qp mask based on the number of bar2 pages in a host page.

For user-mode rdma, the granularity of the BAR2 memory mapped to a user
rdma process during queue allocation must be based on the host page
size. The lld attributes udb_density and ucq_density are used to figure
out how many sge contexts are in a bar2 page. So the rdev->qpmask and
rdev->cqmask in iw_cxgb4 need to now be adjusted based on how many sge
bar2 pages are in a host page.

Otherwise the device fails to work on non 4k page size systems.

Fixes: 2391b0030e ("cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 09:39:39 -07:00