Commit Graph

7321 Commits

Author SHA1 Message Date
Parav Pandit 7ec3df174f RDMA/mlx5: Use PCI device for dma mappings
DMA operation of the IB device is done using ib_device->dma_device.

Instead of accessing parent of the IB device, use the PCI dma device which
is setup to ib_device->dma_device during IB device registration.

Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Leon Romanovsky d4b2d19dc5 RDMA/mlx5: Silence the overflow warning while building offset mask
Coverity reports "Potentially overflowing expression ..." warning, which
is correct thing to complain from the compiler point of view, but this is
not possible in the current code. Still, this is a small error as there
are some future situations that might need to use a 32 bit offset. Use ULL
so the calculation works up to 63.

Fixes: b045db62f6 ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ")
Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Jason Gunthorpe d0b7721c5e RDMA/mlx5: Check for ERR_PTR from uverbs_zalloc()
The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR
not NULL like other allocators:

drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR'

Fixes: 878f7b31c3 ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx")
Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:04 -04:00
Weihang Li 66d86e529d RDMA/hns: Add UD support for HIP09
HIP09 supports service type of Unreliable Datagram, add necessary process
to enable this feature.

Link: https://lore.kernel.org/r/1605526408-6936-7-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li 534c9bdb02 RDMA/hns: Simplify process of filling UD SQ WQE
There are some codes can be simplified or encapsulated in set_ud_wqe() to
make them easier to be understand.

Link: https://lore.kernel.org/r/1605526408-6936-6-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li 148f904c6f RDMA/hns: Remove the portn field in UD SQ WQE
This field in UD WQE in not used by hardware.

Fixes: 7bdee4158b ("RDMA/hns: Fill sq wqe context of ud type in hip08")
Link: https://lore.kernel.org/r/1605526408-6936-5-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li 3631dadfb1 RDMA/hns: Avoid setting loopback indicator when smac is same as dmac
The loopback flag will be set to 1 by the hardware when the source mac
address is same as the destination mac address. So the driver don't need
to compare them.

Fixes: d6a3627e31 ("RDMA/hns: Optimize wqe buffer set flow for post send")
Link: https://lore.kernel.org/r/1605526408-6936-4-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li fba429fcf9 RDMA/hns: Fix missing fields in address vector
Traffic class and hop limit in address vector is not assigned from GRH,
but it will be filled into UD SQ WQE. So the hardware will get a wrong
value.

Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Link: https://lore.kernel.org/r/1605526408-6936-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:47 -04:00
Weihang Li 7406c0036f RDMA/hns: Only record vlan info for HIP08
Information about vlan is stored in GMV(GID/MAC/VLAN) table for HIP09, so
there is no need to copy it to address vector.

Link: https://lore.kernel.org/r/1605526408-6936-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:47 -04:00
Avihai Horon 8138a4c21b RDMA/mlx4: Enable querying AH for XRC QP types
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.

Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.

Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.

Hence, we enable it now for INI and TGT QPs as well.

Link: https://lore.kernel.org/r/20201115121425.139833-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 11:58:19 -04:00
Avihai Horon f957d4d09a RDMA/mlx5: Enable querying AH for XRC QP types
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.

Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.

Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.

Hence, we enable it now for INI and TGT QPs as well.

Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 11:58:19 -04:00
Yixian Liu 17475e104d RDMA/hns: Bugfix for memory window mtpt configuration
When a memory window is bound to a memory region, the local write access
should be set for its mtpt table.

Fixes: c7c2819140 ("RDMA/hns: Add MW support for hip08")
Link: https://lore.kernel.org/r/1606386372-21094-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 10:57:32 -04:00
Wenpeng Liang ab6f7248cc RDMA/hns: Fix retry_cnt and rnr_cnt when querying QP
The maximum number of retransmission should be returned when querying QP,
not the value of retransmission counter.

Fixes: 99fcf82521 ("RDMA/hns: Fix the wrong value of rnr_retry when querying qp")
Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1606382977-21431-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 10:57:32 -04:00
Wenpeng Liang ebed7b7ca4 RDMA/hns: Fix wrong field of SRQ number the device supports
The SRQ capacity is got from the firmware, whose field should be ended at
bit 19.

Fixes: ba6bb7e974 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
Link: https://lore.kernel.org/r/1606382812-23636-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 10:57:32 -04:00
Dennis Dalessandro 3d2a9d6425 IB/hfi1: Ensure correct mm is used at all times
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.

To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.

Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.

Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.

Cc: <stable@vger.kernel.org>
Fixes: e0cf75deab ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a30 ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.intel.com
Suggested-by: Jann Horn <jannh@google.com>
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-25 20:30:46 -04:00
Shiraz Saleem 2ed381439e RDMA/i40iw: Address an mmap handler exploit in i40iw
i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap
vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range
without any validation. This is vulnerable to an mmap exploit as described
in: https://lore.kernel.org/r/20201119093523.7588-1-zhudi21@huawei.com

The push feature is disabled in the driver currently and therefore no push
mmaps are issued from user-space. The feature does not work as expected in
the x722 product.

Remove the push module parameter and all VMA attribute manipulations for
this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user
mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound
to a single page.

Cc: <stable@kernel.org>
Fixes: d374984179 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20201125005616.1800-2-shiraz.saleem@intel.com
Reported-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-25 10:38:11 -04:00
Xi Wang 6f6e2dcbb8 RDMA/hns: Refactor the hns_roce_buf allocation flow
Add a group of flags to control the 'struct hns_roce_buf' allocation
flow, this is used to support the caller running in atomic context.

Link: https://lore.kernel.org/r/1605347916-15964-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 19:36:54 -04:00
Christophe JAILLET df0e4de29c IB/qib: Use dma_set_mask_and_coherent to simplify code
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.

Link: https://lore.kernel.org/r/20201121095127.1335228-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:37:20 -04:00
Rikard Falkeborn 8210163022 RDMA/i40iw: Constify ops structs
The ops structs are never modified. Make them const to allow the compiler
to put them in read-only memory.

Link: https://lore.kernel.org/r/20201121002529.89148-1-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:30:17 -04:00
Xiongfeng Wang 6830ff853a IB/mthca: fix return value of error branch in mthca_init_cq()
We return 'err' in the error branch, but this variable may be set as zero
by the above code. Fix it by setting 'err' as a negative value before we
goto the error label.

Fixes: 74c2174e7b ("IB uverbs: add mthca user CQ support")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/1605837422-42724-1-git-send-email-wangxiongfeng2@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:22:34 -04:00
Kamal Heib 6d8285e604 RDMA/cxgb4: Validate the number of CQEs
Before create CQ, make sure that the requested number of CQEs is in the
supported range.

Fixes: cfdda9d764 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Link: https://lore.kernel.org/r/20201108132007.67537-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:19:46 -04:00
Gustavo A. R. Silva 808b2c925d IB/mlx5: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding the new pseudo-keyword fallthrough; instead of
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/2b0c87362bc86f6adfe56a5a6685837b71022bbf.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:11 -04:00
Gustavo A. R. Silva c6191f83be IB/qedr: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://lore.kernel.org/r/8d7cf00ec3a4b27a895534e02077c2c9ed8a5f8e.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:11 -04:00
Gustavo A. R. Silva 667d457fa8 IB/mlx4: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://lore.kernel.org/r/0153716933e01608d46155941c447d011c59c1e4.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:10 -04:00
Gustavo A. R. Silva 4846bf44e1 IB/hfi1: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/13cc2fe2cf8a71a778dbb3d996b07f5e5d04fd40.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:10 -04:00
Jakub Kicinski 56495a2442 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 19:08:46 -08:00
Jason Gunthorpe bf3b7b7ba9 Merge branch 'for-rc' into rdma.git
From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

The rc RDMA branch is needed due to dependencies on the next patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17 15:20:26 -04:00
Jason Gunthorpe 8a7904a672 RDMA/mlx5: Lower setting the umem's PAS for SRQ
Some of the SRQ types are created using a WQ, and the WQ requires a
different parameter set to mlx5_umem_find_best_quantized_pgoff() as it has
a 5 bit page_offset.

Add the umem to the mlx5_srq_attr and defer computing the PAS data until
the code has figured out what kind of mailbox to use. Compute the PAS
directly from the umem for each of the four unique mailbox types.

This also avoids allocating memory to store the user PAS, instead it is
written directly to the mailbox as in most other cases.

Fixes: 01949d0109 ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0")
Link: https://lore.kernel.org/r/20201115114311.136250-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe 878f7b31c3 RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx
Since devx uses the new rdma_for_each_block() to fill the PAS it can also
use ib_umem_find_best_pgsz().

However, the umem constructionin devx is complicated, the umem must still
respect all the HW limits such as page_offset_quantized and the IOVA
alignment.

Since we don't know what the user intends to use the umem for we have to
limit it to PAGE_SIZE.

There are users trying to mix umem's with mkeys so this makes them work
reliably, at least for an identity IOVA, by ensuring the IOVA matches the
selected page size.

Last user of mlx5_ib_get_buf_offset() so it can also be removed.

Fixes: aeae94579c ("IB/mlx5: Add DEVX support for memory registration")
Link: https://lore.kernel.org/r/20201115114311.136250-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe c08fbdc577 RDMA/mlx5: mlx5_umem_find_best_quantized_pgoff() for CQ
This fixes a bug where the page_offset was not being considered when
building a CQ. The HW specification says it 'must be zero', so use
a variant of mlx5_umem_find_best_quantized_pgoff() with a 0 pgoff_bitmask
to force this result.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20201115114311.136250-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe a59b7b05ef RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for QP
Delete custom logic in the QP in favor of more general variant.

Link: https://lore.kernel.org/r/20201115114311.136250-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe 7579dcdf73 RDMA/mlx5: Directly compute the PAS list for raw QP RQ's
The RQ WQ created when making a raw ethernet QP copies the PAS list from
a dummy QPC command created earlier in the flow. The WQC and QPC PAS lists
are not fully compatible as the page_offset is a different size.

Create the RQ WQ's PAS list directly and do not try to copy it from
another command structure.

Like the prior patch, this also means that badly aligned buffers were not
correctly rejected.

Link: https://lore.kernel.org/r/20201115114311.136250-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe ad480ea5d6 RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for WQ
This fixes a subtle bug, the WQ mailbox has only 5 bits to describe the
page_offset, while mlx5_ib_get_buf_offset() is hard wired to only work
with 6 bit page_offsets.

Thus it did not properly reject badly aligned buffers.

Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Link: https://lore.kernel.org/r/20201115114311.136250-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe b045db62f6 RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ
SRQ uses a quantized and scaled page_offset, which is another variation of
ib_umem_find_best_pgsz(). Add mlx5_umem_find_best_quantized_pgoff() to
perform this calculation for each mailbox. A macro shows how the
calculation is directly connected to the mailbox format.

This new routine replaces the limited mlx5_ib_cont_pages() and
mlx5_ib_get_buf_offset() pairing which would reject valid configurations
rather than adjust the page_size to make it work.

In turn this is much more aggressive about choosing large page sizes for
these objects and when THP is enabled it will now often find a single page
solution.

Link: https://lore.kernel.org/r/20201115114311.136250-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Gal Pressman 8c030d780a RDMA/efa: Remove .create_ah callback assignment
Drivers now expose two callbacks for address handle creation, one for
uverbs and one for kverbs. EFA only supports uverbs so the .create_ah
assignment can be removed. Fix the core code caller to check the proper
function pointer.

Link: https://lore.kernel.org/r/20201115103404.48829-3-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:50:30 -04:00
Lang Cheng 31e2daa17e RDMA/hns: Add new PCI device ID matching for HIP09
The 200G device has a new device ID 0xA228, add it to the PCI table.

Link: https://lore.kernel.org/r/1605187184-26079-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:46:59 -04:00
Zhang Changzhong dabbd6abcd IB/hfi1: Fix error return code in hfi1_init_dd()
Fix to return a negative error code from the error handling case instead
of 0, as done elsewhere in this function.

Fixes: 4730f4a6c6 ("IB/hfi1: Activate the dummy netdev")
Link: https://lore.kernel.org/r/1605249747-17942-1-git-send-email-zhangchangzhong@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-13 12:25:21 -04:00
Heiner Kallweit aa0616a9bd IB/hfi1: switch to core handling of rx/tx byte/packet counters
Use netdev->tstats instead of a member of hfi1_ipoib_dev_priv for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 14:58:13 -08:00
Weihang Li 7af80c02c7 RDMA/hns: Fix double free of the pointer to TSQ/TPQ
A return statement is omitted after getting HEM table, then the newly
allocated pointer will be freed directly, which will cause a calltrace
when the driver was removed.

Fixes: d6d91e4621 ("RDMA/hns: Add support for configuring GMV table")
Link: https://lore.kernel.org/r/1605180582-46504-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 13:33:44 -04:00
Qinglang Miao d035c3f6cd RDMA/pvrdma: Fix missing kfree() in pvrdma_register_device()
Fix missing kfree in pvrdma_register_device() when failure from
ib_device_set_netdev().

Fixes: 4b38da75e0 ("RDMA/drivers: Convert easy drivers to use ib_device_set_netdev()")
Link: https://lore.kernel.org/r/20201111032202.17925-1-miaoqinglang@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 13:05:46 -04:00
Arnd Bergmann fbb7dc5db6 RDMa/mthca: Work around -Wenum-conversion warning
gcc points out a suspicious mixing of enum types in a function that
converts from MTHCA_OPCODE_* values to IB_WC_* values:

drivers/infiniband/hw/mthca/mthca_cq.c: In function 'mthca_poll_one':
drivers/infiniband/hw/mthca/mthca_cq.c:607:21: warning: implicit conversion from 'enum <anonymous>' to 'enum ib_wc_opcode' [-Wenum-conversion]
  607 |    entry->opcode    = MTHCA_OPCODE_INVALID;

Nothing seems to ever check for MTHCA_OPCODE_INVALID again, no idea if
this is meaningful, but it seems harmless as it deals with an invalid
input.

Remove MTHCA_OPCODE_INVALID and set the ib_wc_opcode to 0xFF, which is
still bogus, but at least doesn't make compiler warnings.

Fixes: 2a4443a699 ("[PATCH] IB/mthca: fill in opcode field for send completions")
Link: https://lore.kernel.org/r/20201026211311.3887003-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:47:21 -04:00
Leon Romanovsky c5633a72a1 RDMA/core: Make FD destroy callback void
All FD object destroy implementations return 0, so declare this callback
void.

Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Leon Romanovsky efa968ee20 RDMA/core: Postpone uobject cleanup on failure till FD close
Remove the ib_is_destroyable_retryable() concept.

The idea here was to allow the drivers to forcibly clean the HW object
even if they otherwise didn't want to (eg because of usecnt). This was an
attempt to clean up in a world where drivers were not allowed to fail HW
object destruction.

Now that we are going back to allowing HW objects to fail destroy this
doesn't make sense. Instead if a uobject's HW object can't be destroyed it
is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to
clean it. Multiple passes over the uobject list allow hidden dependencies
to be resolved. If that fails the HW driver is broken, throw a WARN_ON and
leak the HW object memory.

All the other tricky failure paths (eg on creation error unwind) have
already been updated to this new model.

Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Adit Ranadive 00469c97ef RDMA/vmw_pvrdma: Fix the active_speed and phys_state value
The pvrdma_port_attr structure is ABI toward the hypervisor, changing it
breaks the ability to report the speed properly. Revert the change to u16.

Fixes: 376ceb31ff ("RDMA: Fix link active_speed size")
Link: https://lore.kernel.org/r/20201102225437.26557-1-aditr@vmware.com
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 20:00:10 -04:00
Meir Lichtinger f946e45f59 IB/mlx5: Add support for NDR link speed
The IBTA specification has new speed - NDR. That speed supports signaling
rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX
device to IB speed and width. Added translation of new 100Gb, 200Gb and
400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively.

Link: https://lore.kernel.org/r/20201026133738.1340432-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:48:57 -04:00
Jason Gunthorpe d5c7916fe4 RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc's
Now that all the PAS arrays or UMR XLT's for mkcs are filled using
rdma_for_each_block() we can use the common ib_umem_find_best_pgsz()
algorithm.

Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe f1eaac37da RDMA/mlx5: Split mlx5_ib_update_xlt() into ODP and non-ODP cases
Mixing these together is just a mess, make a dedicated version,
mlx5_ib_update_mr_pas(), which directly loads the whole MTT for a non-ODP
MR.

The split out version can trivially use a simple loop with
rdma_for_each_block() which allows using the core code to compute the MR
pages and avoids seeking in the SGL list after each chunk as the
__mlx5_ib_populate_pas() call required.

Significantly speeds loading large MTTs.

Link: https://lore.kernel.org/r/20201026132314.1336717-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe 8010d74b99 RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()
The memory allocation is quite complicated, and makes this function hard
to understand. Refactor things so that a function call sets up the WR, SG,
DMA mapping and buffer, further splitting that into buffer and DMA/wr.

This also slightly changes the buffer allocation logic to try an order 0
page allocation (with OOM warnings on) before going to the emergency page.

Link: https://lore.kernel.org/r/20201026132314.1336717-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:55 -04:00
Jason Gunthorpe f22c30aa6d RDMA/mlx5: Move xlt_emergency_page_mutex into mr.c
This is the only user, so remove the wrappers.

Link: https://lore.kernel.org/r/20201026132314.1336717-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe aab8d3966d RDMA/mlx5: Change mlx5_ib_populate_pas() to use rdma_for_each_block()
This routine converts the umem SGL into a list of fixed pages for DMA,
which is exactly what rdma_umem_for_each_dma_block() is for, use the
common code directly.

Link: https://lore.kernel.org/r/20201026132314.1336717-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe f8fb311063 RDMA/mlx5: Remove npages from mlx5_ib_cont_pages()
Most callers don't need this, and the few that do can get it as
ib_umem_num_pages(umem).

Link: https://lore.kernel.org/r/20201026131936.1335664-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe 7db0eea916 RDMA/mlx5: Remove ncont from mlx5_ib_cont_pages()
This is the same as ib_umem_num_dma_blocks(umem, 1UL << page_shift), have
the callers compute it directly.

Link: https://lore.kernel.org/r/20201026131936.1335664-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe 95741ee3f0 RDMA/mlx5: Remove order from mlx5_ib_cont_pages()
Only alloc_mr_from_cache() needs order and can trivially compute it, so
lift it to the one call site and remove the NULL arguments.

Link: https://lore.kernel.org/r/20201026131936.1335664-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe f0093fb1a7 RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr
For the user MR path, instead of calling this after getting the umem, call
it as part of creating the struct mlx5_ib_mr and distill its output to a
single page_shift stored inside the mr.

This avoids passing around the tuple of its output. Based on the umem and
page_shift, the output arguments can be computed using:

  count == ib_umem_num_pages(mr->umem)
  shift == mr->page_shift
  ncont == ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift)
  order == order_base_2(ncont)

And since mr->page_shift == umem_odp->page_shift then ncont ==
ib_umem_num_dma_blocks() == ib_umem_odp_num_pages() for ODP umems.

Link: https://lore.kernel.org/r/20201026131936.1335664-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe 1c3d247eee RDMA/mlx5: Remove mlx5_ib_mr->npages
This is the same value as ib_umem_num_pages(mr->umem), use that instead.

Link: https://lore.kernel.org/r/20201026131936.1335664-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe fc3325701a RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
reg_pages should always contain mr->npage since when the mr is finally
de-reg'd it is always subtracted out.

If there were any error exits then mlx5_ib_rereg_user_mr() would leave the
reg_pages adjusted and this will cause it to be double subtracted
eventually.

The manipulation of reg_pages is inherently connected to the umem, so lift
it out of set_mr_fields() and only adjust it around creating/destroying a
umem.

reg_pages is only used for diagnostics in sysfs.

Fixes: 7d0cc6edcc ("IB/mlx5: Add MR cache for large UMR regions")
Link: https://lore.kernel.org/r/20201026131936.1335664-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Jason Gunthorpe b4d031cdae RDMA/mlx5: Remove mlx5_ib_mr->order
The is only ever set to non-zero if the MR is from the cache, and if it is
cached then the order is in cached_ent->order.

Make it clearer that use_umr_mtt_update() only returns true for cached MRs
and remove the redundant data.

Link: https://lore.kernel.org/r/20201026131936.1335664-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Joe Perches e28bf1f03b RDMA: Convert various random sprintf sysfs _show uses to sysfs_emit
Manual changes for sysfs_emit as cocci scripts can't easily convert them.

Link: https://lore.kernel.org/r/ecde7791467cddb570c6f6d2c908ffbab9145cac.1602122880.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30 21:03:52 -03:00
Joe Perches 45808361d4 RDMA: Manual changes for sysfs_emit and neatening
Make changes to use sysfs_emit in the RDMA code as cocci scripts can not
be written to handle _all_ the possible variants of various sprintf family
uses in sysfs show functions.

While there, make the code more legible and update its style to be more
like the typical kernel styles.

Miscellanea:

o Use intermediate pointers for dereferences
o Add and use string lookup functions
o return early when any intermediate call fails so normal return is
  at the bottom of the function
o mlx4/mcg.c:sysfs_show_group: use scnprintf to format intermediate strings

Link: https://lore.kernel.org/r/f5c9e4c9d8dafca1b7b70bd597ee7f8f219c31c8.1602122880.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30 21:03:52 -03:00
Weihang Li 32053e584e RDMA/hns: Add support for filling GMV table
Add a interface to fill GMV(SGID/SMAC/VLAN) table for HIP09, all of above
source address information is stored as an entry in GMV table. The users
just need to provide the index to the hardware when POST SEND.

Link: https://lore.kernel.org/r/1603508836-33054-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:40:42 -03:00
Weihang Li d6d91e4621 RDMA/hns: Add support for configuring GMV table
HIP09 supports to store SGID/SMAC/VLAN together in a table named GMV. The
driver needs to allocate memory for it and tell the information about this
region to hardware.

Link: https://lore.kernel.org/r/1603508836-33054-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:40:42 -03:00
Lang Cheng aba457ca89 RDMA/hns: Support owner mode doorbell
The doorbell needs to store PI information into QPC, so the RoCEE should
wait for the results of storing, that is, it needs two bus operations to
complete a doorbell. When ROCEE is in SDI mode, multiple doorbells may be
interlocked because the RoCEE can only handle bus operations serially. So a
flag to mark if HIP09 is working in SDI mode is added. When the SDI flag is
set, the ROCEE will ignore the PI information of the doorbell, continue to
fetch wqe and verify its validity by it's owner_bit.

Link: https://lore.kernel.org/r/1603195493-22741-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:13:41 -03:00
Alok Prasad a2267f8a52 RDMA/qedr: Fix memory leak in iWARP CM
Fixes memory leak in iWARP CM

Fixes: e411e0587e ("RDMA/qedr: Add iWARP connection management functions")
Link: https://lore.kernel.org/r/20201021115008.28138-1-palok@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 09:45:25 -03:00
Selvin Xavier b898d5c50c RDMA/bnxt_re: Fix entry size during SRQ create
Only static WQE is supported for SRQ. So always use the max supported SGEs
while calculating SRQ entry size.

Fixes: 2bb3c32c5c ("RDMA/bnxt_re: Change wr posting logic to accommodate variable wqes")
Link: https://lore.kernel.org/r/1602569752-12745-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-27 14:12:26 -03:00
Joe Perches 1c7fd72687 RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:53:21 -03:00
Jason Gunthorpe 676a80adba RDMA: Remove AH from uverbs_cmd_mask
Drivers that need a uverbs AH should instead set the create_user_ah() op
similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never
implemented so are just deleted.

Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe 628c02bf38 RDMA: Remove uverbs cmds from drivers that don't use them
Allowing userspace to invoke these commands is probably going to crash
these drivers as they are not tested and not expecting to use them on a
user object.

For example pvrdma touches cq->ring_state which is not initialized for
user QPs.

These commands are effected:

- IB_USER_VERBS_CMD_REQ_NOTIFY_CQ is ibv_cmd_req_notify_cq() in
  rdma-core, only hfi1, ipath and rxe calls it.

- IB_USER_VERBS_CMD_POLL_CQ is ibv_cmd_poll_cq() in rdma-core, only
  ipath and hfi1 calls it.

- IB_USER_VERBS_CMD_POST_SEND/RECV is ibv_cmd_post_send/recv() in
  rdma-core, only ipath and hfi1 call them.

- IB_USER_VERBS_CMD_POST_SRQ_RECV is ibv_cmd_post_srq_recv() in
  rdma-core, only ipath and hfi1 calls it.

- IB_USER_VERBS_CMD_PEEK_CQ isn't even implemented anywhere

- IB_USER_VERBS_CMD_CREATE/DESTROY_AH is ibv_cmd_create/destroy_ah() in
  rdma-core, only bnxt_re, efa, hfi1, ipath, mlx5, orcrdma, and rxe call
  it.

Link: https://lore.kernel.org/r/10-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe 1f11a7610e RDMA: Check create_flags during create_qp
Each driver should check that the QP attrs create_flags is supported.
Unfortuantely when create_flags was added to the QP attrs the drivers were
not updated. uverbs_ex_cmd_mask was used to block it - even though kernel
drivers use these flags too.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code
to be EOPNOTSUPP.

Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe 1c407cb5d7 RDMA: Check flags during create_cq
Each driver should check that the CQ attrs is supported. Unfortuantely
when flags was added to the CQ attrs the drivers were not updated,
uverbs_ex_cmd_mask was used to block it. This was missed when create CQ
was converted to ioctl, so non-zero flags could have been passed into
drivers.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask.

Fixes: 41b2a71fc8 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file")
Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe 26e990badd RDMA: Check attr_mask during modify_qp
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.

Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe 652caba5b5 RDMA: Check srq_type during create_srq
uverbs was blocking srq_types the driver doesn't support based on the
CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types
during create_srq and move CREATE_XSRQ to the core code.

Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe 44ce37bc8b RDMA: Move more uverbs_cmd_mask settings to the core
These functions all depend on the driver providing a specific op:

- REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op
- ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this
  without providing the op
- OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides
  xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow.
- OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd()
- CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq()
- QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but
  sometimes supplies a NULL op.
- RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op
- ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an
  (now deleted) implementation but no userspace

All drivers were checked that no drivers provide the op without also
setting uverbs_cmd_mask so this should have no functional change.

Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe c074bb1e30 RDMA: Remove elements in uverbs_cmd_mask that all drivers set
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the
core code. Only the op reg_user_mr wasn't already being required from the
drivers.

Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:57 -03:00
Jason Gunthorpe b8e3130dd9 RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions
Since a while now the uverbs layer checks if the driver implements a
function before allowing the ucmd to proceed. This largely obsoletes the
cmd_mask stuff, but there is some tricky bits in drivers preventing it
from being removed.

Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the
core code. These are triggered soley based on the related ops function
pointer.

query_device_ex is not triggered based on an op, but all drivers already
implement something compatible with the extension, so enable it globally
too.

Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:56 -03:00
Jason Gunthorpe a5c29a262e RDMA/cxgb4: Remove MW support
This driver never enabled IB_USER_VERBS_CMD_ALLOC_MW so memory windows
were not usable from userspace. The kernel side was removed long ago. Drop
this dead code.

Fixes: feb7c1e38b ("IB: remove in-kernel support for memory windows")
Link: https://lore.kernel.org/r/1-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:44 -03:00
Kamal Heib 53839b51a7 RDMA/bnxt_re: Set queue pair state when being queried
The API for ib_query_qp requires the driver to set cur_qp_state on return,
add the missing set.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/20201021114952.38876-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:22:15 -03:00
Parav Pandit fbdd0049d9 RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence.  A below call graph
of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem);<- recurrsive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.

Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:18:19 -03:00
Linus Torvalds a1e16bc7d5 RDMA 5.10 pull request
The typical set of driver updates across the subsystem:
 
  - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
    usnic, qib, qedr, cxgb4, hns, bnxt_re
 
  - Various rtrs fixes and updates
 
  - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
    wasn't working right
 
  - Use tracepoints instead of pr_debug in the CM code
 
  - Scrub the locking in ucma and cma to close more syzkaller bugs
 
  - Use tasklet_setup in the subsystem
 
  - Revert the idea that 'destroy' operations are not allowed to fail at
    the driver level. This proved unworkable from a HW perspective.
 
  - Revise how the umem API works so drivers make fewer mistakes using it
 
  - XRC support for qedr
 
  - Convert uverbs objects RWQ and MW to new the allocation scheme
 
  - Large queue entry sizes for hns
 
  - Use hmm_range_fault() for mlx5 On Demand Paging
 
  - uverbs APIs to inspect the GID table instead of sysfs
 
  - Move some of the RDMA code for building large page SGLs into
    lib/scatterlist
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
 mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
 l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
 5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
 46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
 oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
 YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
 /b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
 p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
 ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
 fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
 16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
 =LKpl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Jason Gunthorpe e0477b34d9 RDMA: Explicitly pass in the dma_device to ib_register_device
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.

Other than setting the masks in rvt all drivers were doing this already
anyhow.

mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
  dma_set_max_seg_size (1G)

__mlx4_init_one()
  dma_set_max_seg_size (1G)

mlx5_pci_init()
  set_dma_caps()
    dma_set_max_seg_size (2G)

Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.

[1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
[2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/

Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:53:46 -03:00
Heiner Kallweit 3b51788a2d IB/hfi1: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/6cad1a04-f021-d94b-45fd-7cc7cf07367d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13 17:33:48 -07:00
Håkon Bugge bf6a47644e IB/mlx4: Convert rej_tmout radix-tree to XArray
Was missed during the initial review of the below patch

Fixes: 227a0e142e ("IB/mlx4: Add support for REJ due to timeout")
Link: https://lore.kernel.org/r/1602253482-6718-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-09 12:34:49 -03:00
Colin Ian King 73c5265913 RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
An incorrect sizeof is being used, u64 * is not correct, it should be just
u64 for a table of umem_pgs number of u64 items in the pbl_tbl.  Use the
idiom sizeof(*pbl_tbl) to get the object type without the need to
explicitly use u64.

Link: https://lore.kernel.org/r/20201006114700.537916-1-colin.king@canonical.com
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-06 16:56:36 -03:00
Jason Gunthorpe 6ef999f500 RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
This driver is taking the SGL out of the umem and passing it through a
struct bnxt_qplib_sg_info. Instead of passing the SGL pass the umem and
then use rdma_umem_for_each_dma_block() directly.

Move the calls of ib_umem_num_dma_blocks() closer to their actual point of
use, npages is only set for non-umem pbl flows.

Link: https://lore.kernel.org/r/0-v1-b37437a73f35+49c-bnxt_re_dma_block_jgg@nvidia.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-06 16:45:53 -03:00
Avihai Horon 1c15b4f2a4 RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Alok Prasad f45271acdf RDMA/qedr: Endianness warnings cleanup
Making a change to fix following sparse warnings reported by kbuild bot.

 CHECK   drivers/infiniband/hw/qedr/verbs.c

drivers/infiniband/hw/qedr/verbs.c:3872:59: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/qedr/verbs.c:3872:59:    expected restricted __le32 [usertype] sge_prod
drivers/infiniband/hw/qedr/verbs.c:3872:59:    got unsigned int [usertype] sge_prod
drivers/infiniband/hw/qedr/verbs.c:3875:59: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/qedr/verbs.c:3875:59:    expected restricted __le32 [usertype] wqe_prod
drivers/infiniband/hw/qedr/verbs.c:3875:59:    got unsigned int [usertype] wqe_prod

Link: https://lore.kernel.org/r/20201001100959.19940-1-palok@marvell.com
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: acca72e2b0 ("RDMA/qedr: SRQ's bug fixes")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 20:50:37 -03:00
Yishai Hadas a03bfc37d5 RDMA/mlx5: Sync device with CPU pages upon ODP MR registration
Sync device with CPU pages upon ODP MR registration. mlx5 already has to
zero the HW's version of the PAS list, may as well deliver a PAS list that
matches the current CPU page tables configuration.

Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:44 -03:00
Yishai Hadas 677cf51f71 RDMA/mlx5: Extend advice MR to support non faulting mode
Extend advice MR to support non faulting mode, this can improve
performance by increasing the populated page tables in the device.

Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas 8bfafde086 IB/core: Enable ODP sync without faulting
Enable ODP sync without faulting, this improves performance by reducing
the number of page faults in the system.

The gain from this option is that the device page table can be aligned
with the presented pages in the CPU page table without causing page
faults.

As of that, the overhead on data path from hardware point of view to
trigger a fault which end-up by calling the driver to bring the pages
will be dropped.

Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas 36f30e486d IB/core: Improve ODP to use hmm_range_fault()
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve
performance in a few aspects:

This includes:
- Dropping the need to allocate and free memory to hold its output

- No need any more to use put_page() to unpin the pages

- The logic to detect contiguous pages is done based on the returned
  order, no need to run per page and evaluate.

In addition, moving to use hmm_range_fault() enables to reduce page faults
in the system with it's snapshot mode, this will be introduced in next
patches from this series.

As part of this, cleanup some flows and use the required data structures
to work with hmm_range_fault().

Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:39:54 -03:00
Lang Cheng cf4c0fb00d RDMA/hns: Remove unused variables and definitions
Some code was removed but the variables were still there, and some
parameters have been changed to be queried from firmware. So the
definitions of them are no longer needed.

Fixes: 2a3d923f87 ("RDMA/hns: Replace magic numbers with #defines")
Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Fixes: 8254746978 ("IB/hns: Implement the add_gid/del_gid and optimize the GIDs management")
Fixes: 21b97f5387 ("RDMA/hns: Fixup qp release bug")
Link: https://lore.kernel.org/r/1601371934-40003-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 14:01:20 -03:00
Leon Romanovsky d4f40a1fb9 RDMA/i40iw: Remove intermediate pointer that points to the same struct
There is no real need to have an intermediate pointer for the same struct,
remove it, and use struct directly.

Link: https://lore.kernel.org/r/20200926102450.2966017-11-leon@kernel.org
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:07 -03:00
Leon Romanovsky 21c2fe94ab RDMA/mthca: Combine special QP struct with mthca QP
As preparation for the removal of QP allocation logic, we need to ensure
that ib_core allocates the right amount of memory before a call to the
driver create_qp(). It requires from driver to have the same structs for
all types of QPs.

Link: https://lore.kernel.org/r/20200926102450.2966017-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:07 -03:00
Leon Romanovsky b925c555a1 RDMA/drivers: Remove udata check from special QP
GSI QP can't be created from the user space, hence the udata check is
always false (udata == NULL). Remove that check and simplify the flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky 8fd3cd2ae5 RDMA/mlx4: Prepare QP allocation to remove from the driver
Since all mlx4 QP have same storage type, move the QP allocation to be in
one place. This change is preparation to removal of such allocation from
the driver.

Link: https://lore.kernel.org/r/20200926102450.2966017-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky 915ec7ed91 RDMA/mlx4: Embed GSI QP into general mlx4_ib QP
Refactor the storage struct of mlx4 GSI QP to be embedded in mlx4_ib
QP. This allows to remove internal memory allocation of QP struct which is
hidden inside the mlx4_ib_create_qp() flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-6-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky eebe580feb RDMA/mlx5: Delete not needed GSI QP signal QP type
GSI QP doesn't need signal QP type because it is initialized statically to
zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too.  This
means that the GSI QP signal QP type can be removed.

Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky 2dc4d6725b RDMA/mlx5: Change GSI QP to have same creation flow like other QPs
There is no reason to have separate create flow for the GSI QP, while
general create_qp routine has all needed checks and ability to allocate
and free the proper struct mlx5_ib_qp.

Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky f8225e3488 RDMA/mlx5: Reuse existing fields in parent QP storage object
Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields.  This change
returns the memory footprint of mlx5_ib QP to be as it was before
embedding GSI QP.

Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky 0d9aef8603 RDMA/mlx5: Embed GSI QP into general mlx5_ib QP
The GSI QPs have different create flow from the regular QPs, but it is not
really needed. Update the code to use mlx5_ib_qp as a storage class for
all outside of GSI calls.

Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Liu Shixin b942fc0319 RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()
sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25 09:17:42 -03:00