linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	84e39eeb08	Second round of merge items for 4.8 - hfi1 driver updates - Fix for max SGEs allowed via RDMA R/W API -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXoqUzAAoJELgmozMOVy/dNKAP/1/Rzn/k97eda1qFqzWpqsPl lMaxDiZZnRIAFJEqEF9Iwo1JLiFIzjpDJnqHB++CKuXZQT0NY6sHW0yrcyUwzsx7 5gui92ldkVg4vY7PTco171vyzG+79KKRZ1dFS14z7oC8XAg48zQ7yJmfb1op3dEw mgxyoLaaMwMF5aLwPoWG4+aPkBMtKUGB/ARb4ehq6M2p71c43lb18GaarJuWLdAz 1HxakXL/uzttyvGDyJGKDrT6ktXXSyvdCTRO60OrrPFJ67P2xRYXce85TLRr8srp Q5RNjyR5fP8uN0qtrQz+hl09mtBeBQHKomyFIOVwkB2r53OKqsR5g5roz3BlpA1X 7PF/MO0pKy4t8XQnLfohEwtNWgszupvxkyAAISI8MwzLOPra/V8smQ9CpTltx1UB hTu3tpAMy1auAjh8TWzzzII1ZoRZz6YCTziWnTaC3bqAljufjt1mnvjrtNmQ1sNi MCLeA3yr8HjlKWdwYr+gVfhSR1wEoOxwHZdLsvBsxmC32hFLlh6rbg2x8wceqTlR 4T8l0AERV1YPjsoSe3/pWVImKUA97qppIfeFcCZiBCBHBPlhpw3ebVt6B1mLVUCV hTMuZeFVcV75D+qr0kR5ZuVn4jgEn9zB1VH3tCV9LJnhBfySZFcP4yhATqiELaHG RVoVAiTBxq5RgNVOH4Zo =cQcp -----END PGP SIGNATURE----- Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...	2016-08-04 20:26:31 -04:00
Linus Torvalds	0cda611386	Round one of 4.8 code - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXo1vCAAoJELgmozMOVy/d0HcQAJqMi7siD9cSaMViYbu812pq 3kNkHZbLNB/947uShDPhhFAWFXU0nRxEnTNSvYxRo+nxnDE/9hEEXpx8OzzKLNU+ GXyDeHsEEriSFcaSne5Tak/QuiFm3PJv73ttXQROCtHG7KxLG9ieVbfusz42Xwiu 5R21qfp6PZEOC+j7L/fTZh/kEN3cfaDYrGnCgmU3z0ka9xG5Qe2/+uWGNkuioRA5 phFUR4MS+1n/VrnxPHrLXTrqv3sw8YfCfRImaXSBrxFVMqhno+cDDtEJQCRnmNrq 7KcJO2KqDMl/QqsjxdwqojNpUTh2t7SeOeQuzUsfXl15yyyetq2Zu7ZurkCGjNtQ NtTt6hv5eXq3mNuBmOPKYDDgakSYyYjS0zueoi8wFFqIeSYxRJv4wx4xoeJ/Bsz8 2LplpaPMQaTM65FhzYXGhYNBKaRkqjL9ihbIl1OcLNvfXAqLElfONM17/Yc/hgVw xfDtvNFrZcl7/exIpBBNOnxwbs4h78vvXsXoBiVoN7V/hBnMzDhkiBHNxNCfZXA0 REGs/cnyy6cpiJOnVCWs77NqL75oK/qb1mEwe1M+A2kaxe/tLixUdYXo/zclDPm8 3DLTL9lCgJIBIEiZT4q/alxLK+yUKD+SHtQT3lmF2Bfsmv/I38Uy55SXAiFO4yOq kwy96TvYtT43SkyNmmBf =oZOO -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...	2016-08-04 20:10:31 -04:00
Doug Ledford	7f1d25b47d	Merge branches 'misc' and 'rxe' into k.o/for-4.8-1	2016-08-04 11:13:47 -04:00
Markus Elfring	380bae5bf2	IB/mthca: Clean up error unwind flow in mthca_reset() The kfree() function was called in a few cases by the mthca_reset() function during error handling even if the passed variables "bridge_header" and "hca_header" contained a null pointer. Adjust jump targets according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:35 -04:00
Markus Elfring	3491ab63b4	IB/mthca: NULL arg to pci_dev_put is OK The pci_dev_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:35 -04:00
Markus Elfring	f7ca535ba0	IB/hfi1: NULL arg to sc_return_credits is OK The sc_return_credits() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:35 -04:00
Mark Bloch	3f85f2aaab	IB/mlx4: Add diagnostic hardware counters Expose IB diagnostic hardware counters. The counters count IB events and are applicable for IB and RoCE. The counters can be divided into two groups, per device and per port. Device counters are always exposed. Port counters are exposed only if the firmware supports per port counters. rq_num_dup and sq_num_to are only exposed if we have firmware support for them, if we do, we expose them per device and per port. rq_num_udsdprd and num_cqovf are device only counters. rq - denotes responder. sq - denotes requester. \|-----------------------\|---------------------------------------\| \| Name \| Description \| \|-----------------------\|---------------------------------------\| \|rq_num_lle \| Number of local length errors \| \|-----------------------\|---------------------------------------\| \|sq_num_lle \| number of local length errors \| \|-----------------------\|---------------------------------------\| \|rq_num_lqpoe \| Number of local QP operation errors \| \|-----------------------\|---------------------------------------\| \|sq_num_lqpoe \| Number of local QP operation errors \| \|-----------------------\|---------------------------------------\| \|rq_num_lpe \| Number of local protection errors \| \|-----------------------\|---------------------------------------\| \|sq_num_lpe \| Number of local protection errors \| \|-----------------------\|---------------------------------------\| \|rq_num_wrfe \| Number of CQEs with error \| \|-----------------------\|---------------------------------------\| \|sq_num_wrfe \| Number of CQEs with error \| \|-----------------------\|---------------------------------------\| \|sq_num_mwbe \| Number of Memory Window bind errors \| \|-----------------------\|---------------------------------------\| \|sq_num_bre \| Number of bad response errors \| \|-----------------------\|---------------------------------------\| \|sq_num_rire \| Number of Remote Invalid request \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|rq_num_rire \| Number of Remote Invalid request \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|sq_num_rae \| Number of remote access errors \| \|-----------------------\|---------------------------------------\| \|rq_num_rae \| Number of remote access errors \| \|-----------------------\|---------------------------------------\| \|sq_num_roe \| Number of remote operation errors \| \|-----------------------\|---------------------------------------\| \|sq_num_tree \| Number of transport retries exceeded \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|sq_num_rree \| Number of RNR NAK retries exceeded \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|rq_num_rnr \| Number of RNR NAKs sent \| \|-----------------------\|---------------------------------------\| \|sq_num_rnr \| Number of RNR NAKs received \| \|-----------------------\|---------------------------------------\| \|rq_num_oos \| Number of Out of Sequence requests \| \| \| received \| \|-----------------------\|---------------------------------------\| \|sq_num_oos \| Number of Out of Sequence NAKs \| \| \| received \| \|-----------------------\|---------------------------------------\| \|rq_num_udsdprd \| Number of UD packets silently \| \| \| discarded on the Receive Queue due to \| \| \| lack of receive descriptor \| \|-----------------------\|---------------------------------------\| \|rq_num_dup \| Number of duplicate requests received \| \|-----------------------\|---------------------------------------\| \|sq_num_to \| Number of time out received \| \|-----------------------\|---------------------------------------\| \|num_cqovf \| Number of CQ overflows \| \|-----------------------\|---------------------------------------\| Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:34 -04:00
Roland Dreier	0c87b67209	IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct We allocate a small tracking structure as part of mlx4_ib_resize_cq(). However, we don't need to use GFP_ATOMIC -- immediately after the allocation, we call mlx4_cq_resize(), which allocates a command mailbox with GFP_KERNEL and then sleeps on a firmware command, so we better not be in an atomic context. This actually has a real impact, because when this GFP_ATOMIC allocation fails (and GFP_ATOMIC does fail in practice) then a userspace consumer resizing a CQ will get a spurious failure that we can easily avoid. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:32 -04:00
Bart Van Assche	a154a8cd08	IB/hfi1: Disable by default There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig. Fixes: `f48ad614c1` ("IB/hfi1: Move driver out of staging") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jubin John <jubin.john@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:32 -04:00
Doug Ledford	6a89d89d85	Merge branch 'i40iw' into k.o/for-4.8	2016-08-03 21:00:16 -04:00
Doug Ledford	3e5e8e8a9a	Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8	2016-08-03 20:58:45 -04:00
Dean Luick	0636e9ab83	IB/hfi1: Add cache evict LRU list The original code used a LRU list to evict nodes which were least recently used. For correctness the evict code was moved under the handler->lock, now add back the LRU list. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	2677a7680e	IB/hfi1: Fix memory leak during unexpected shutdown During an unexpected shutdown, references to tid_rb_node were NULL'ed out without properly being released. Fix this by calling clear_tid_node in the mmu notifier remove callback rather than after these callbacks are called. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	082b353291	IB/hfi1: Remove unneeded mm argument in remove function The reworked mmu_rb interface allows the unused mm argument to be removed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	b85ced9151	IB/hfi1: Consistently call ops->remove outside spinlock The ops->remove() callback was called by hfi1_mmu_unregister() with a NULL mm argument while holding a spinlock. In the case of sdma_rb_remove() this caused it to pass current->mm to hfi1_release_user_pages() This had 2 problems. First this would attempt to acquire the mmap_sem under a spin lock. Second the use of current->mm is not always guaranteed to be the proper mm when the fd is being closed. Rather than depend on this implicit behavior we move all calls to ops->remove outside of the spinlock. This also allows the correct mm to be used in the remove callback without fear of deadlock. Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but usually does, we must delay all remove callbacks until out of the notifier, when the callbacks can take the mmap_sem if they need to. Code comments were added to clarify what the expectations are for the users of the mmu rb tree. Suggested-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	b7df192f74	IB/hfi1: Use evict mmu rb operation Use the new cache evict operation in the SDMA code. This allows the cache to properly coordinate evicts and removes, preventing any race. With this change, the separate list, lock, and race flag are not needed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	1034599805	IB/hfi1: Add evict operation to the mmu rb handler Allow users to clear nodes from the rb tree based on their evict callback. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	622c202c4a	IB/hfi1: Fix TID caching actions Per file descriptor TID caching actions depend on a global that can change midway through the lifetime of that file descriptor. Make the use of caching consistent for the life of the file descriptor by using the presence of the cache handler to decide when to use the cache functions. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	e0b09ac55d	IB/hfi1: Make the cache handler own its rb tree root The objects which use cache handling should reference their own handler object not the internal data structure it uses to track the nodes. Have the "users" of the mmu notifier code pass opaque objects which can then be properly used in the mmu callbacks depending on the owners needs. This patch has the additional benefit that operations no longer require a look up in a list to find the handlers. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	3faa3d9a30	IB/hfi1: Make use of mm consistent The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is opened, and unregisters it when the device is closed. The driver incorrectly assumes that the close will always happen from the same context as the open. In particular, closes due to SIGKILL or OOM killer activity may happen from a different context. In these cases, the wrong mm is passed to mmu_notifier_unregister(), which causes improper reference counting for the victim mm, and eventual memory corruption. Preserve the mm for all open file descriptors and use this mm rather than current->mm for memory operations for the lifetime of that fd. Note: this patch leaves 1 use of current->mm in place. This use is removed in a follow on patch because other functional changes were required prior to that use being removed. If registration fails, there is no reason to keep the handler object around. Free the handler object rather than add it to the list to prevent any mmu_notifier operations, including unregister, when registration fails. Suggested-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	7b3256e331	IB/hfi1: Fix user SDMA racy user request claim The user SDMA in-use claim bit is in the structure that gets zeroed out once the claim is made. Move the request in-use flag into its own bit array and use that for atomic claims. This cleans up the claim code and removes any race possibility. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	9da7e9a711	IB/hfi1: Fix error condition that needs to clean up If input validation fails, properly free the request before returning. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	a383f8ec55	IB/hfi1: Release node on insert failure If unable to insert node into the RB tree cache, node will be freed before returning from the function. Null out iovec's pointer to node so iovec does not try to free it later. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	9ff73c8715	IB/hfi1: Validate SDMA user iovector count Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	4fa0d22c9a	IB/hfi1: Validate SDMA user request index Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	bdf7752e07	IB/hfi1: Use the same capability state for all shared contexts Save the current capability state at user context creation time. Report this saved value for all shared contexts. Also get rid of unnecessary hfi1_get_base_kinfo function. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	53445bb32d	IB/hfi1: Prevent null pointer dereference If a context has not been assigned or assignment failed, pq may be NULL. Move the unregister within the protection of the null check. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	a7cd2dc5d4	IB/hfi1: Rename TID mmu_rb_* functions Clarify the names of the TID mmu functions. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	20a42d0833	IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() Checking if the rb tree is empty is redundant with the while loop which is emptying the rb tree. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	ea3a0ee52d	IB/hfi1: Restructure hfi1_file_open Rearrange the file open call in prep for new changes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	ff4ce9bde9	IB/hfi1: Make iovec loop index easy to understand Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	639297b4f0	IB/hfi1: Use "false" not 0 For bool parameters "false" should be used Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	5ed3b15b05	IB/hfi1: Remove unused sub-context parameter subctxt is not used, just remove it. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	3c1091aa94	IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove __mmu_rb_remove was called in only 1 place which was a very simple call site. Combine this function into its caller. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	c0946642e5	IB/hfi1: Always expect ops functions Remove, insert, and invalidate are always provided. No need to test. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	862548dace	IB/hfi1: Add parameter names to callback declarations This makes it more clear what these functions are operating on. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	ac335e7e80	IB/hfi1: Add parameter names to function declarations Parameter names to function declarations make it more clear what those parameters do. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	fc87879ae2	IB/hfi1: Remove unused function hfi1_mmu_rb_search Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	8e1f52df97	IB/hfi1: Remove unused uctxt->subpid and uctxt->pid These are no longer needed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	72720ddfc6	IB/hfi1: Fix minor format error Brackets should be on the next line of a function Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	fc0b76c016	IB/hfi1: Expand reported serial number Expand the serial number space by using more bits from the GUID. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	c492980269	IB/hfi1: Allow for non-double word multiple message sizes for user SDMA The driver pads non-double word multiple message sizes but it doesn't account for this padding when the packet length is calculated. Also, the data length is miscalculated for message sizes less than 4 bytes due to the bit representation in LRH. And there's a check for non-double word multiple message sizes that prevents these messages from being sent. This patch fixes length miscalculations and enables the functionality to send non-double word multiple message sizes. Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	042b0159aa	IB/hfi1: Handle kzalloc failure in init_pervl_scs Checking the return value of the memory allocation call in init_pervl_scs() was missed. Recently the kmalloc() was changed to kzalloc() which identified the problem. While fixing this issue 2 other bugs were noticed. First, the array being allocated is accessed in the nomem path which can be reached before it is allocated. Second, kernel_send_context was not released on error. Fix both of these by creating a more common memory unwind label structure. Fixes: `35f6befc84` ("staging/rdma/hfi1: Add qp to send context mapping for PIO") Reported-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dasaratharaman Chandramouli	527dbf12e0	IB/qib, IB/hfi1: Fix grh creation in ud loopback Instead of copying the actual GRH of type struct ib_grh, existing code copies the struct ib_global_route into the sge. This patch fixes that and constructs the actual GRH from ib_global_route and copies the GRH into the sge. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	b736a469f9	IB/hfi1: Use hdr2sc function to calculate 5-bit SC The interface is used to compute the 5-bit SC field from the LRH and the RHF bits. Modify code to use the interface instead. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	89c057cae4	IB/hfi1: Cleanup UD packet handler. Cleanup hfi1_ud_rcv to not have to look at the packet header fields multiple times. The fields are looked up once and used throughout the function. Also fix sc computation when validating MAD packets. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Don Hiatt	d4d602e9a3	IB/hfi1: Rename hfi1_pio_header to hfi1_sdma_header. hfi1_pio_header should really be called hfi1_sdma_header as it is only used for sdma transmits. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	a9b6b3bc29	IB/hfi1: Rename struct ahg_ib_header to struct hfi1_ahg_info struct ahg_ib_header has no header specific information. Rename it to struct hfi1_ahg_info Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	bd24ef5eca	IB/hfi1: Remove unused elements from struct ahg_ib_header sde and hfi1_ib_header are not used anymore. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Easwar Hariharan	b5e7101954	IB/hfi1: Reset QSFP on every run through channel tuning Active QSFP cables were reset only every alternate iteration of the channel tuning algorithm instead of every iteration due to incorrect reset of the flag that controlled QSFP reset, resulting in using stale QSFP status in the channel tuning algorithm. Fixes: `8ebd4cf185` ("Add active and optical cable support") Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Easwar Hariharan	5fbd98dd20	IB/hfi1: Ignore QSFP interrupts until power stabilizes Some QSFP cables assert the interrupt line as a side effect of module plug-in and power up. This causes the SerDes and QSFP tuning algorithm to begin cable initialization by reading the QSFP memory map over I2C, which fails. This patch ignores any interrupt line assertion until the module has completed power up and voltage rails have stabilized, which can take a maximum of 500 ms per the SFF-8679 specification. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Easwar Hariharan	3ca5f4c068	IB/hfi1: Disable external device configuration requests QSFP CDR enablement is now controlled by determining power class and the configuration file. We disable the DC 8051 from requesting enablement or disabling of TX and RX CDRs by removing the code that allowed the DC 8051 to request changes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	d9b13c2030	IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled Hanging has been observed while writing a file over NFSoRDMA. Dmesg on the server contains messages like these: [ 931.992501] svcrdma: Error -22 posting RDMA_READ [ 952.076879] svcrdma: Error -22 posting RDMA_READ [ 982.154127] svcrdma: Error -22 posting RDMA_READ [ 1012.235884] svcrdma: Error -22 posting RDMA_READ [ 1042.319194] svcrdma: Error -22 posting RDMA_READ Here is why: With the base memory management extension enabled, FRMR is used instead of FMR. The xprtrdma server issues each RDMA read request as the following bundle: (1)IB_WR_REG_MR, signaled; (2)IB_WR_RDMA_READ, signaled; (3)IB_WR_LOCAL_INV, signaled & fencing. These requests are signaled. In order to generate completion, the fast register work request is processed by the hfi1 send engine after being posted to the work queue, and the corresponding lkey is not valid until the request is processed. However, the rdmavt driver validates lkey when the RDMA read request is posted and thus it fails immediately with error -EINVAL (-22). This patch changes the work flow of local operations (fast register and local invalidate) so that fast register work requests are always processed immediately to ensure that the corresponding lkey is valid when subsequent work requests are posted. Local invalidate requests are processed immediately if fencing is not required and no previous local invalidate request is pending. To allow completion generation for signaled local operations that have been processed before posting to the work queue, an internal send flag RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag and only generates completion for such requests. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Grzegorz Heldt	23002d5b08	IB/hfi1: Fix trace message units Trace shows incorrect amount of allocated memory. Fix trace to display memory in KB. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Grzegorz Heldt <grzegorz.heldt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Tadeusz Struk	b14db1f0aa	IB/hfi1: Add sysfs entry to override SDMA interrupt affinity Add sysfs entry to allow user to override affinity for SDMA engine interrupts. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dean Luick	c3f8de0b33	IB/hfi1: Add static PCIe Gen3 CTLE tuning Enhance the PCIe Gen3 recipe to support static CTLE tuning, and add a switch to choose between static and dynamic approaches. Make discrete chips default to static CTLE tuning. Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	8adf71fa14	IB/hfi1: Fix "suspicious rcu_dereference_check() usage" warnings This fixes the following warnings with PROVE_LOCKING and PROVE_RCU enabled in the kernel: case (1): [ INFO: suspicious RCU usage. ] drivers/infiniband/hw/hfi1/init.c:532 suspicious rcu_dereference_check() usage! case (2): [ INFO: suspicious RCU usage. ] drivers/infiniband/hw/hfi1/hfi.h:1624 suspicious rcu_dereference_check() usage! Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dean Luick	b3bf270bed	IB/hfi1: Read all firmware versions Read the version of the SBus, PCIe SerDes, and Fabric Serdes firmwares at driver load time. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dean Luick	6854c6925d	IB/hfi1: Explain state complete frame details When link up fails in LNI, the local and peer state complete frames are reported as numbers. Explain what the values mean so the operator can better diagnose the problem. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Harish Chegondi	8784ac0243	IB/hfi1: Modify the default number of kernel receive conexts Currently, the default number of kernel receive contexts is set to the number of NUMA nodes on the system plus one for control context. However, the systems that have a single socket and/or have NUMA disabled in the BIOS will have only one receive context by default. This patch would ensure that by default there will be at least two kernel receive contexts plus one for control context regardless of the number of NUMA nodes on the system. The user can override the default number of kernel receive contexts with the krcvqs module parameter. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	c72cfe3e38	IB/hfi1: Add support for extended memory management Advertise and add the capability of handing all aspects of IBTA extended memory management support in post send. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	0db3dfa03c	IB/hfi1: Work request processing for fast register mr and invalidate In order to support extended memory management support, add send side processing of work requests of type IB_WR_REG_MR, IB_WR_LOCAL_INV, and IB_WR_SEND_WITH_INV. The first two are local operations and are supported for both RC and UC. Send with invalidate is only supported for RC because the corresponding IB opcodes are not defined for UC. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	a2df0c8332	IB/hfi1: Handle send with invalidate opcode in the RC recv path As part of enabling extended memory management support, add the processing of the RC send with invalidate. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Mitko Haralanov	5fd2b562ed	IB/hfi1: Pull FECN/BECN processing to a common place There were multiple places where FECN/BECN processing was being done for the different types of QPs. All of that code was very similar, which meant that it could be pulled into a single function used by the different QP types. To retain the performance in the fastpath, the common code starts with an inline function, which only calls the slow path if the packet has any of the [FB]ECN bits set. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Tymoteusz Kielan	1b23f02cf4	IB/hfi1: Fix to fully initialize send context area While handling buffer control MAD, partially initialized dd->kernel_send_context area may cause potential dereference of uninitialized pointers. Fix by using kzalloc_node() instead of kmalloc_node(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jakub Pawlak	3210314ad3	IB/hfi1: Fix integrity errors counter value calculation PMA should not sum TX and RX replay counts when reporting local link integrity errors. Fixed by removing C_DC_TX_REPLAY counter from calculation of the link integrity errors counter value. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Mike Marciniszyn	9ec4faa391	IB/qib: Add qib post send table Add initial table for table driven post_send support. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:44 -04:00
Mike Marciniszyn	1ac57c50e9	IB/hfi1: Add hfi1 post send tables Add initial table for table driven post_send support. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:44 -04:00
Jakub Pawlak	71e68e3db8	IB/hfi1: Correct receive packet handler assignment Prevent processing receive packet in case when opcode is accepted by QP but handler for this type of packet is not defined. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:43 -04:00
Jianxin Xiong	14833b8c52	IB/hfi1: Improve SDMA engine assignment for user SDMA Currently each user context is assigned a single SDMA engine based on the VL, context id, and subcontext id. That means for MPI applications, each rank can only use one SDMA engine for all messages. This may create unwanted backup for independent messages going to different destinations upon congestion at one destination. This patch adds the packet "dlid" to the formula of SDMA engine selection for user SDMA requests. A simple hash table is used to maintain even distribution among the available SDMA engines regardless how the "dlid" values are distributed. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:43 -04:00
Dean Luick	e014991d07	IB/hfi1: Remove TWSI references Remove the TWSI code. The driver now uses the kernel's built-in i2c bit bus module. Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:42 -04:00
Dean Luick	dba715f0c8	IB/hfi1: Use built-in i2c bit-shift bus adapter Use built-in i2c bit-shift bus adapter to control the i2c busses on the chip. Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:42 -04:00
Sebastian Sanchez	b094a36f90	IB/hfi1: Refine user process affinity algorithm When performing process affinity recommendations for MPI ranks, the current algorithm doesn't take into account multiple HFI units. Also, real cores and HT cores are not distinguished from one another. Therefore, all HT cores are recommended to be assigned first within the local NUMA node before recommending the assignments of cores in other NUMA nodes. It's ideal to assign all real cores across all NUMA nodes first, then all HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU cores in other NUMA nodes could be running interrupt handlers, and this is not taken into account. To balance the CPU workload for user processes, the following recommendation algorithm is used: For each user process that is opening a context on HFI Y: a) If all cores are assigned to user processes, start assignments all over from the first core b) Assign real cores first, then HT cores (First set of HT cores on all physical cores, then second set of HT cores, and, so on) in the following order: 1. Same NUMA node as HFI Y and not running an IRQ handler 2. Same NUMA node as HFI Y and running an IRQ handler 3. Different NUMA node to HFI Y and not running an IRQ handler 4. Different NUMA node to HFI Y and running an IRQ handler c) Mark core as assigned in the global affinity structure. As user processes are done, remove core assignments from global affinity structure. This implementation allows an arbitrary number of HT cores and provides support for multiple HFIs. This is being included in the kernel rather than user space due to the fact that user space has no way of knowing the CPU recommendations for contexts running as part of other jobs. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:33 -04:00
Sebastian Sanchez	d63730192f	IB/hfi1: Reserve and collapse CPU cores for contexts Kernel receive queues oversubscribe CPU cores on multi-HFI systems. To prevent this, the kernel receive queues are separated onto different cores, and the SDMA engine interrupts are constrained to a lesser number of cores. hfi1s_on_numa_node*krcvqs is the number of CPU cores that are reserved for kernel receive queues for all HFIs. Each HFI initializes its kernel receive queues to one of the reserved CPU cores. If there ends up being 0 CPU cores leftover for SDMA engines, use the same CPU cores as receive contexts. In addition, general and control contexts are assigned to their own CPU core, however, both types of contexts tend to have low traffic. To save CPU cores, collapse general and control contexts to one CPU core for all HFI units. This change prevents SDMA engine interrupts from wrapping around general contexts. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:07 -04:00
Dennis Dalessandro	4197344ba5	IB/hfi1: Add global structure for affinity assignments When HFI units get initialized, they each use their own mask copy for affinity assignments. On a multi-HFI system, affinity assignments overbook CPU cores as each HFI doesn't have knowledge of affinity assignments for other HFI units. Therefore, some CPU cores are never used for interrupt handlers in systems with high number of CPU cores per NUMA node. For multi-HFI systems, SDMA engine interrupt assignments start all over from the first CPU in the local NUMA node after the first HFI initialization. This change allows assignments to continue where the last HFI unit left off. Add global structure for affinity assignments for multiple HFIs to share affinity mask. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:45:14 -04:00
Alex Vesker	321a9e3ebc	IB/mlx5: Fix port counter ID association to QP offset The q-counter-id is given in modify-QP command associates the QP with the counter. The offset to which the counter ID was set is incorrect, causing IB port counters not to count on QP. Fixes: `0837e86a7a` ('IB/mlx5: Add per port counters') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:40:49 -04:00
Slava Shwartsman	b0ffeb537f	IB/mlx5: Fix iteration overrun in GSI qps Number of outstanding_pi may overflow and as a result may indicate that there are no elements in the queue. The effect of doing this is that the MAD layer will get stuck waiting for completions. The MAD layer will think that the QP is full - because it didn't receive these completions. This fix changes it so the outstanding_pi number is increased with 32-bit wraparound and is not limited to max_send_wr so that the difference between outstanding_pi and outstanding_ci will really indicate the number of outstanding completions. Cc: Stable <stable@vger.kernel.org> Fixes: `ea6dc20362` ('IB/mlx5: Reorder GSI completions') Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:32:51 -04:00
Mustafa Ismail	da5c138185	i40iw: Add NULL check for puda buffer i40iw_puda_get_listbuf may return NULL if the list is empty. Add NULL check prior to accessing the pointer. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	8c1ea86d44	i40iw: Change dup_ack_thresh to u8 Change dup_ack_thressh to u8 since it is a 3 bit field. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	c5d057d32b	i40iw: Remove unnecessary check for moving CQ head In i40iw_cq_poll_completion, we always move the tail. So there is no reason to check for overflow everytime we move the head. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	b494c3e677	i40iw: Simplify code to set fragments in SQ WQE Replace a subtract and multiply with an add; while populating fragments in SQ wqe. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	b54143bea9	i40iw: Remove unnecessary parameter to i40iw_cq_poll_completion Post_cq parameter passed to i40iw_cq_poll_completion is always true; so remove. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	5ec11ed23e	i40iw: Do not access pointer after free Child_listen_node pointer is used in a debug print after kfree. Move the print before kfree. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	342c387b0d	i40iw: Correct and use size parameter to i40iw_reg_phys_mr Fix size parameter passed to i40iw_reg_phys_mr and use it to register memory. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Shiraz Saleem	fe5d6e625d	i40iw: Fix return codes Fix incorrect usage of ENOSYS and other return codes. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Steve Wise	ad61a4c7a9	iw_cxgb4: don't block in destroy_qp awaiting the last deref Blocking in c4iw_destroy_qp() causes a deadlock when apps destroy a qp or disconnect a cm_id from their cm event handler function. There is no need to block here anyway, so just replace the refcnt atomic with a kref object and free the memory on the last put. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:17 -04:00
Steve Wise	1b1a889dbb	iw_cxgb4: explicitly move the qp to ERROR state during flush This forces the connection to abort if the application failed to disconnect before flushing. This is aligned with how the common flush services work. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:17 -04:00
Steve Wise	12eb5137ed	iw_cxgb4: stop MPA_REPLY timer when disconnecting There exists a race where the application can setup a connection and then disconnect it before iw_cxgb4 processes the fw4_ack message. For passive side connections, the fw4_ack message is used to know when to stop the ep timer for MPA_REPLY messages. If the application disconnects before the fw4_ack is handled then c4iw_ep_disconnect() needs to clean up the timer state and stop the timer before restarting it for the disconnect timer. Failure to do this results in a "timer already started" message and a premature stopping of the disconnect timer. Fixes: `e4b76a2` ("RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:17 -04:00
Amitoj Kaur Chawla	3b8fb4b86c	RDMA/cxgb3: Use AF_INET for sin_family field Elsewhere the sin_family field holds a value with a name of the form AF_..., so it seems reasonable to do so here as well. Also the values of PF_INET and AF_INET are the same. The Coccinelle semantic patch that makes this change is as follows: // <smpl> @@ struct sockaddr_in sip; @@ ( sip.sin_family == - PF_INET + AF_INET \| sip.sin_family != - PF_INET + AF_INET \| sip.sin_family = - PF_INET + AF_INET ) // </smpl> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:04:42 -04:00
Hariprasad S	56b2eca3fe	RDMA/iw_cxgb4: Use kfree_skb instead of kfree The commit `0f8ab0b6e9` ("RDMA/iw_cxgb4: Low resource fixes for Memory registration") from Jun 10, 2016, leads to the following static checker warning: drivers/infiniband/hw/cxgb4/mem.c:612 c4iw_alloc_mw() error: use kfree_skb() here instead of kfree(mhp->dereg_skb) Also fixes skb leak in c4iw_dealloc_mw Fixes: `0f8ab0b6e9` ("RDMA/iw_cxgb4: Low resource fixes for Memory registration") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:04:42 -04:00
Wei Yongjun	619615005e	IB/mlx5: Fix duplicate const warning Fixes the following sparse warning: drivers/infiniband/hw/mlx5/main.c:2574:25: warning: duplicate const Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:00:33 -04:00
Jakub Pawlak	2b71904674	IB/hfi1: Add counter to track unsupported packets drop Add sw counter to track dropped unsupported packets. Report unsupported packets drop as the RcvError. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Jakub Pawlak	583eb8b8a1	IB/hfi1: Add VL XmitDiscards counters to the opapmaquery Add per VL XmitDiscards counters to the opapmaquery status and error response. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Mike Marciniszyn	ad4210823b	IB/hfi1: Fix trace sparse errors Fix sparse errors by making sure the fast assign destinations are host cpu typed. For the void __iomem *, just make the field match source data. Fix a bug where the hw_free trace printed the pointer vs. the dereferenced value. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Sebastian Sanchez	462b6b2170	IB/hfi1: Separate tracepoints into specific headers The ftrace infrastructure used to evaluate the TRACE_SYSTEM macro on every DEFINE_EVENT() macro. Now the TRACE_SYSTEM macro only gets evaluated when trace/define_trace.h is included, so the group event information is lost. This was introduced in commit `acd388fd3a` ("tracing: Give system name a pointer") Therefore, each system tracepoint must be on its own file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Tadeusz Struk	21a4c95d3f	IB/hfi1: Fix typo Fix a copy and paste typo in comment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Ira Weiny	0904f32796	IB/hfi1: Remove unnecessary done label in hfi1_write_iter Simple code clean up of hfi1_write_iter. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Ira Weiny	f8181697fd	IB/hfi1: Clean up port state structure definition The definition of port state changed mid development and the old structure was kept accidentally. Remove this dead code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
David S. Miller	de0ba9a0d8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-24 00:53:32 -04:00
Brenden Blanco	224e92e02a	net/mlx4_en: break out tx_desc write into separate function In preparation for writing the tx descriptor from multiple functions, create a helper for both normal and blueflame access. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-19 21:46:33 -07:00
Shiraz Saleem	8e0e7aedad	i40iw: Enable remote access rights for stag allocation Fix to enable remote access rights when allocating stag. Fixes: `b7aee855d3` ("RDMA/i40iw: Add base memory management extensions") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:34 -04:00
Nicolas Iooss	b0548cff99	i40iw: do not print unitialized variables in error message i40iw_create_cqp() printed the contents of variables maj_err and min_err in an error message before they could be initialized (by calling dev->cqp_ops->cqp_create). Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:34 -04:00
Tadeusz Struk	98f179a5ea	IB/hfi1: Fix sleep inside atomic issue in init_asic_data The critical section should protect only the list traversal and dd->asic_data modification, not the memory allocation. The fix pulls the allocation out of the critical section. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:24 -04:00
Mike Marciniszyn	896ce45da2	IB/hfi1: Correct issues with sc5 computation There are several computatations of the sc in the ud receive routine. Besides the code duplication, all are wrong when the sc is greater than 15. In that case the code incorrectly or's a 1 into the computed sc instead of 1 shifted left by 4. Fix precomputed sc5 by using an already implemented routine hdr2sc() and deleting flawed duplicated code. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:24 -04:00
Maor Gottlieb	c5bb17302e	net/mlx5: Refactor mlx5_add_flow_rule Reduce the set of arguments passed to mlx5_add_flow_rule by introducing flow_spec structure. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 00:06:02 -07:00
Doug Ledford	fb92d8fb1b	Merge branches 'cxgb4-4.8', 'mlx5-4.8' and 'fw-version' into k.o/for-4.8	2016-06-23 12:29:26 -04:00
Doug Ledford	9903fd1374	Merge branches '4.7-rc-misc', 'hfi1-fixes', 'i40iw-rc-fixes' and 'mellanox-rc-fixes' into k.o/for-4.7-rc	2016-06-23 12:22:33 -04:00
Ira Weiny	939b6ca873	IB/hfi1: Add device FW version string Export the firmware version through the core. Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	15453e857a	IB/usnic: Support device FW version string And remove sysfs file in favor of the common core. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	bd395005d2	IB/ocrdma: Support device FW version string And remove sysfs in favor of the core support. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	96357454eb	IB/nes: Support device FW version string And remove the sysfs in favor of the core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	51ed03978e	IB/mthca: Supprot device FW version string And remove the sysfs entry in favor of the core support. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	c73428230d	IB/mlx5: Support device FW version string And remove sysfs entry in favor of the common code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	e9db59fcd2	IB/mlx4: Support device FW version string And remove the sysfs in favor of common core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	f65c52ca23	IB/i40iw: Support device FW version string And remove sysfs support in favor of the core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	ce1922435d	IB/cxgb4: Support device FW version string And remove sysfs fw_ver in favor of the core. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	e180369424	IB/cxgb3: Support device FW version string Also remove fw_ver sysfs to be replaced by the common core one. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Mark Bloch	0ad17a8f7f	IB/mlx5: Add port protocol stats Expose new counters using the get protocol stats callback. We expose the following counters: \|------------------------------------------------------------------------\| \| Name \| IB \| EN \| Description \| \|------------------------------------------------------------------------\| \|rx_write_requests \| + \| - \| Number of received WRITE requests for \| \| \| \| \| the associated QP. \| \|------------------------------------------------------------------------\| \|rx_read_requests \| + \| - \| Number of received READ requests for \| \| \| \| \| the associated QP. \| \|------------------------------------------------------------------------\| \|rx_atomic_requests \| + \| - \| Number of received ATOMIC requests for \| \| \| \| \| the associated QP. \| \|------------------------------------------------------------------------\| \|out_of_buffer \| + \| + \| Number of drops occurred due to lack \| \| \| \| \| of WQE for the associated QPs/RQs. \| \|------------------------------------------------------------------------\| \|out_of_sequence \| + \| - \| Number of errors in the packet \| \| \| \| \| transport sequence number \| \|------------------------------------------------------------------------\| \|duplicate_request \| + \| + \| Number of received duplicated packets. \| \| \| \| \| A request that previously executed is \| \| \| \| \| named duplicated. \| \|------------------------------------------------------------------------\| \|rnr_nak_retry_err \| + \| + \| Number of received RNR NAC packets. \| \| \| \| \| The QP retry limit did not exceed. \| \|------------------------------------------------------------------------\| \|packet_seq_err \| + \| + \| Number of received NAK - sequence error\| \| \| \| \| packets. The QP retry limit did not \| \| \| \| \| exceed. \| \|------------------------------------------------------------------------\| \|implied_nak_err \| + \| + \| Number of times the requester detected \| \| \| \| \| an ACK with a PSN larger than expected \| \| \| \| \| PSN for RDMA READ or ATOMIC response \| \| \| \| \| The QP retry limit did not exceed. \| \|------------------------------------------------------------------------\| \|local_ack_timeout_err\| + \| - \| Number of NO ACK responses from \| \| \| \| \| responder within timer interval. \| \| \| \| \| The QP retry limit did not exceed. \| \|------------------------------------------------------------------------\| Counters are available if all of them are supported. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:40:00 -04:00
Mark Bloch	0837e86a7a	IB/mlx5: Add per port counters In order to support statistics for ports, we attach each QP to a counter set which is dedicate to this port. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:39:14 -04:00
Artemy Kovalyov	af1ba291c5	{net, IB}/mlx5: Refactor internal SRQ API Currently, the SRQ API uses the obsolete mlx5_*_srq_mbox_{in,out} structs which limit the ability to pass the SRQ attributes between net and IB parts of the driver. This patch changes the SRQ API so as to use auto-generated structs and provides a better way to pass attributes which will be in use by coming features. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:20:07 -04:00
Bodong Wang	402ca53644	IB/mlx5: Report mlx5 TSO capabilities when querying device Enable mlx5 based hardware to report TCP segmentation offload (TSO) capabilities from kernel to user space. A TSO enabled NIC will accept big chunks of data with sizes greater than MTU for TCP traffic. The TSO engine will break the data into separate packets and will insert headers automatically. The capabilities are exposed to user space through query_device by uhw directly. The following capabilities are reported: 1. The maximum payload size in bytes supported for segmentation by TSO engine. 2. Bitmap showing which QP types are supported by TSO operation. The bitmap is built by members from 'enmu ib_qp_type'. For example, similar code should be performed if UD QP is supported: supported_qpts \|= 1 << IB_QPT_UD; To make user-space library aware of whether kernel supports uhw or not, a new flag: cmds_supp_uhw will be returned back to user-space through alloc_ucontext. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:09:18 -04:00
Maor Gottlieb	026bae0cb4	IB/mlx5: Enable flow steering for IPv6 traffic Enable flow steering for IPv6 traffic by using an IPv6 spec. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:45 -04:00
Maor Gottlieb	89ea94a7b6	IB/mlx5: Reset flow support for IB kernel ULPs The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:45 -04:00
Maor Gottlieb	7c2344c3bb	IB/mlx5: Implements disassociate_ucontext API Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnect. This is done by managing the VMAs that were mapped to the HW bars such as doorbell and blueflame. When need to detach, remap them to an arbitrary kernel page returned by the zap API. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:44 -04:00
Yishai Hadas	28d6137008	IB/mlx5: Add RSS QP support Add support for Raw Ethernet RX HASH QP. Currently, creation and destruction of such a QP are supported. This QP is implemented as a simple TIR object which points to the receive RQ indirection table. The given hashing configuration is used to configure the TIR and by that it chooses the right RQ from the RQ indirection table. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:44 -04:00
Yishai Hadas	c5f9092936	IB/mlx5: Add Receive Work Queue Indirection table operations Some mlx5 based hardwares support a RQ table object. This RQ table points to a few RQ objects. We implement the receive work queue indirection table API (create and destroy) by using this hardware object. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Yishai Hadas	79b20a6c30	IB/mlx5: Add receive Work Queue verbs A QP can be created without internal WQs "packaged" inside it, this QP can be configured to use "external" WQ object as its receive/send queue. WQ is a necessary component for RSS technology since RSS mechanism is supposed to distribute the traffic between multiple Receive Work Queues Receive WQs are implemented by RQs. Implement the WQ creation, modification and destruction verbs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Hariprasad S	dd6b024126	RDMA/iw_cxgb4: Low resource fixes for Completion queue Pre-allocate buffers to deallocate completion queue, so that completion queue is deallocated during RDMA termination when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:18 -04:00
Hariprasad S	0f8ab0b6e9	RDMA/iw_cxgb4: Low resource fixes for Memory registration Pre-allocate buffers for deregistering memory region and memory window during RDMA connection close, when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:17 -04:00
Hariprasad S	4a740838bf	RDMA/iw_cxgb4: Low resource fixes for connection manager Pre-allocate buffers for sending various control messages to close connection, abort connection, etc so that we gracefully handle connections when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:17 -04:00
Hariprasad S	4c72efefd9	RDMA/iw_cxgb4: Add missing error codes for act open cmd Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:17 -04:00
Hariprasad S	bce2841f5a	RDMA/iw_cxgb4: clean up c4iw_reject_cr() Get rid of unneeded code, and refactor things a bit. For MPA version 0 we abort the connection. For > 0, we attempt to send an MPA_START/REJECT Reply, and then disconnect gracefully. If the send of the MPA message fails, then we abort the connection. We can ignore c4iw_ep_disconnect() errors here because it will clean up the endpoint if there are failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:16 -04:00
Hariprasad S	68cebcab59	RDMA/iw_cxgb4: allocate enough space for debugfs "qps" dump With IPv6 addresses, the "qps" debugfs is running out of space and truncating the output. Bump the required size accordingly. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:16 -04:00
Hariprasad S	3d4e79949c	RDMA/iw_cxgb4: only read markers_enabled mod param once markers_enabled should be read only once during MPA negotiation. The present code does read markers_enabled twice during negotiation which results in setting wrong recv/xmit markers if the markers_enabled is changed in the middle of negotiation. With this change the markers_enabled is read only once during MPA negotiation. recv markers are set based on markers enabled module parameter and xmit markers are set based on markers flag from the MPA_START_REQ/MPA_START_REP. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:43:19 -04:00
Shiraz Saleem	7748e4990d	i40iw: Enable level-1 PBL for fast memory registration Set the chunk_size to enable level-1 PBL support when the fast memory page count is more than one. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Faisal Latif	0477e18145	i40iw: Return correct max_fast_reg_page_list_len Return correct value for max_fast_reg_page_list_len from i40iw_query_device(). Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Faisal Latif	ee23abd75c	i40iw: Correct status check on i40iw_get_pble i40iw_get_pble returns 0 on success. Correct the check on return code. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Shiraz Saleem	747f1c6d9b	i40iw: Correct CQ arming CQ is armed for solicited events only, ignoring other notification flags. Correct this by arming for next and arming for solicited event if IB_CQ_SOLICITED is set. Also protect CQ shadow area update with spinlock. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Mike Marciniszyn	2aee309d3e	IB/hfi1: Fix deadlock with txreq allocation slow path A failure in the get_txreq() inline will result in a slow path retry using __get_txreq(). __get_txreq() attempts to procure the qp s_lock, which is already held in all callers. Fix by deleting the s_lock maintenance in __get_txreq() and add sparse syntax hooks to future proof the code. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:16:15 -04:00
Chuck Lever	cbc9355a93	IB/mlx4: Prevent cross page boundary allocation Prevent cross page boundary allocation by allocating new page, this is required to be aligned with ConnectX-3 HW requirements. Not doing that might cause to "RDMA read local protection" error. Fixes: `1b2cd0fc67` ('IB/mlx4: Support the new memory registration API') Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:08:25 -04:00
Dotan Barak	5b420d9cf7	IB/mlx4: Fix memory leak if QP creation failed When RC, UC, or RAW QPs are created, a qp object is allocated (kzalloc). If at a later point (in procedure create_qp_common) the qp creation fails, this qp object must be freed. Fixes: `1ffeb2eb8b` ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:08:25 -04:00
Yishai Hadas	5533c18ab0	IB/mlx4: Verify port number in flow steering create flow In procedure mlx4_ib_create_flow, passing an invalid port number will cause an out-of-bounds array access. Data passed to this procedure can come from user-space. Therefore, need to validate port number before proceeding onwards. Note that we check against the number of physical ports declared at the verbs (ib core) level; When bonding is active, the verbs level sees one physical port, even though the low-level driver sees two ports. Fixes: `f77c0162a3` ("IB/mlx4: Add receive flow steering support") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:07:04 -04:00
Yishai Hadas	a6100603a4	IB/mlx4: Fix error flow when sending mads under SRIOV Fix mad send error flow to prevent double freeing address handles, and leaking tx_ring entries when SRIOV is active. If ib_mad_post_send fails, the address handle pointer in the tx_ring entry must be set to NULL (or there will be a double-free) and tx_tail must be incremented (or there will be a leak of tx_ring entries). The tx_ring is handled the same way in the send-completion handler. Fixes: `37bfc7c1e8` ("IB/mlx4: SR-IOV multiplex and demultiplex MADs") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:07:03 -04:00
Yishai Hadas	f2940e2c76	IB/mlx4: Fix the SQ size of an RC QP When calculating the required size of an RC QP send queue, leave enough space for masked atomic operations, which require more space than "regular" atomic operation. Fixes: `6fa8f71984` ("IB/mlx4: Add support for masked atomic operations") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:06:54 -04:00
Talat Batheesh	00bf534fce	IB/mlx5: Fix wrong naming of port_rcv_data counter port_xmit_data is written instead of port_rcv_data. Fixes: `3efd9a1121` ('IB/mlx5: Modify MAD reading counters method to use counter registers') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Eli Cohen	c9b254955b	IB/mlx5: Fix post send fence logic If the caller specified IB_SEND_FENCE in the send flags of the work request and no previous work request stated that the successive one should be fenced, the work request would be executed without a fence. This could result in RDMA read or atomic operations failure due to a MR being invalidated. Fix this by adding the mlx5 enumeration for fencing RDMA/atomic operations and fix the logic to apply this. Fixes: `e126ba97db` ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Sebastian Sanchez	34d351f8dd	IB/hfi1: Send a pkey change event on driver pkey update Swapping a cable from a "Mgmt Allowed=No" switch port to a "Mgmt Allowed=Yes" switch port doesn't send a pkey change notification. Therefore, the link doesn't become active as the oib_utils layer uses an old pkey table cache. Fix by ensuring the pkey change notification is sent when the table is changed both explicitly by the FM and implicitly by the driver via a cable swap. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Sebastian Sanchez	3ec5fa28c9	IB/hfi1: Remove FULL_MGMT_P_KEY from pkey table at link up FULL_MGMT_P_KEY doesn't get cleared from the pkey table at link bounce because the link down and link bounce code paths are different when moving a QSFP cable on a switch. This causes an HFI unit connected to a switch to try to be initialized to an FM node when the QSFP cable is moved from a MgmtAllowed=NO port to a MgmtAllowed=YES port and back to a MgmtAllowed=NO port. Remove FULL_MGMT_P_KEY from pkey table at link up. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Tadeusz Struk	c078f0dd01	IB/hfi1: Fix potential buffer overflow This fixes potential buffer overflow because the sprintf function doesn't check buffer boundaries. Use snprintf instead. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Tadeusz Struk	93dd0a0978	IB/hfi1: Fix potential NULL ptr dereference This fixes potential NULL ptr dereference because IS_ERR(dd) doesn't handle NULL. Fix the issue by initializing the pointer with a not NULL error code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Ira Weiny	5e9ef24619	IB/qib: Prevent context loss If a context has already been assigned to an FD, prevent another assignment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Ira Weiny	ca2f30a0a6	IB/hfi1: Prevent context loss If a context has already been assigned to an FD, prevent another assignment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Jubin John	c3c64a951c	IB/hfi1: Increase packet egress timeout The current value of 500us for the packet egress timeout is too small which causes the host to declare failure on draining packets too early and unnecessarily bounces the link. Increase this to 50ms taking into account the switch packet discard timer default and the worst case per-VL package drainage rate. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Jubin John	b4ba6633ea	IB/hfi1: Fix credit return threshold adjustment The credit return threshold adjustment on mtu change algorithm does not take into account all the kernel send contexts that are assigned per VL. Use the pio send context map to adjust the credit return thresholds for all the allocated and assigned kernel send contexts based on the MTU adjustment per VL. The pio send context map can be changed dynamically based on the actual number of operational vls which is set by the fabric manager. When this happens update the credit return threshold values for all the remapped kernel send contexts. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Jason Gunthorpe	8c5122e45a	IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs When this code was reworked for IBoE support the order of assignments for the sl_tclass_flowlabel got flipped around resulting in TClass & FlowLabel being permanently set to 0 in the packet headers. This breaks IB routers that rely on these headers, but only affects kernel users - libmlx4 does this properly for user space. Cc: stable@vger.kernel.org Fixes: `fa417f7b52` ("IB/mlx4: Add support for IBoE") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 19:36:54 -04:00
Achiad Shochat	f879ee8d90	IB/mlx5: Fix alternate path code Userspace flag IBV_QP_ALT_PATH is supposed to set the alternate path including fields alt_pkey_index and alt_timeout. Added IB_QP_PKEY_INDEX and IB_QP_TIMEOUT to the attribute mask when calling mlx5_set_path for the alternate path to force setting the alt_pkey_index and alt_timeout values. Fixes: bf24481a3a7c4 ('IB/mlx5: Consider alternate path in pkey ...') Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:50 -04:00
Noa Osherovich	d3ae2bdeba	IB/mlx5: Fix pkey_index length in the QP path record Pkey index fields in the QP context path record are extended to 16 bits, as required by IB spec (version 1.3). This change affects all QP commands which include path records. To enable this change, moved the free adaptive routing flag bit (free_ar) to the most significant byte of the QP path record. Fixes: `e126ba97db` ('mlx5: Add driver for Mellanox Connect-IB ...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:50 -04:00
Noa Osherovich	3c4c37746c	IB/mlx5: Fix entries check in mlx5_ib_resize_cq Verify that number of entries is less than device capability. Add an appropriate warning message for error flow. Fixes: `bde51583f4` ('IB/mlx5: Add support for resize CQ') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:50 -04:00
Noa Osherovich	9ea5785286	IB/mlx5: Fix entries checks in mlx5_ib_create_cq Number of entries shouldn't be greater than the device's max capability. This should be checked before rounding the entries number to power of two. Fixes: `51ee86a4af` ('IB/mlx5: Fix check of number of entries...') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:50 -04:00
Noa Osherovich	2cc6ad5f21	IB/mlx5: Check BlueFlame HCA support BlueFlame support is reported only for PFs when the HCA capability is on. Fixes: `938fe83c8d` ('net/mlx5_core: New device capabilities...') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:50 -04:00
Noa Osherovich	0540d8148d	IB/mlx5: Fix returned values of query QP Some variables were not initialized properly: max_recv_wr, max_recv_sge, max_send_wr, qp_context and max_inline_data. Fixes: `e126ba97db` ('mlx5: Add driver for Mellanox Connect-IB...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:49 -04:00
Noa Osherovich	bc5c6eed05	IB/mlx5: Limit query HCA clock When PAGE_SIZE is larger than 4K, the user shouldn't be able to query the HCA core clock. This counter is within 4KB boundary and the user-space shall not read information that's after this boundary. Fixes: `b368d7cb8c` ('IB/mlx5: Add hca_core_clock_offset to...') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:49 -04:00
Eran Ben Elisha	c0fcebf552	IB/mlx5: Fix FW version diaplay in sysfs Add a 4-digit padding to show FW version in proper format. Fixes: `9603b61de1` ('mlx5: Move pci device handling from...') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:49 -04:00
Noa Osherovich	2788cf3bd9	IB/mlx5: Return PORT_ERR in Active to Initializing tranisition FW port-change events are fired on Active <-> non Active port state transitions only. When the port state changes from Active to Initializing (Active -> Down -> Initializing), a single event is fired. The HCA transitions from Down to Initializing unless prevented from doing so, hence the driver should also propagate events when the port state is Initializing to consumers so they'll be aware that the port is no longer Active and act accordingly. Fixes: `e126ba97db` ('mlx5: Add driver for Mellanox Connect-IB...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:49 -04:00
Maor Gottlieb	da6d6ba3c6	IB/mlx5: Set flow steering capability bit Flow steering is supported by mlx5 device when the following features are supported by firmware: 1. NIC RX flow table. 2. Device has enough flow steering levels. 3. Atomic modification of flow table entry. 4. Flow tables chaining. To check if flow steering is supported it's enough to check if the driver opened the mlx5 bypass namespace. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:03:49 -04:00
Bart Van Assche	55c40648d3	IB/hfi1: Suppress sparse warnings Avoid that sparse reports the following warnings for the hfi1 driver: trace.c:217:13: warning: no previous prototype for ‘print_u64_array’ [-Wmissing-prototypes] user_sdma.c:1361:17: warning: dubious: !x & y Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:37:23 -04:00
Bart Van Assche	d55215c50e	IB/hfi1: Use bit 0 instead of bit 1 The first argument of test_bit() and clear_bit() is a bit number and not a bitmask. Hence change that first argument from (1 << 0) into 0. This patch avoids that smatch reports the following warnings: user_sdma.c:1059: sdma_cache_evict() warn: test_bit() takes a bit number user_sdma.c:1590: sdma_rb_remove() warn: test_bit() takes a bit number Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:37:23 -04:00
Bart Van Assche	48a0cc139f	IB/hfi1: Fix indentation Make the indentation of the source code consistent. Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:37:23 -04:00
Bart Van Assche	ca920f5b67	IB/mlx4: Fix device managed flow steering support test Perform the test for device managed flow steering support even if memory windows are not supported. I noticed this because smatch reported inconsistent indentation for the device managed flow steering support test. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:31:51 -04:00
Krzysztof Kozlowski	ce67fef68d	IB/usnic: Remove unused DMA attributes The DMA attributes are set but never used. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:28:06 -04:00
Leon Romanovsky	f242d93ae9	IB/hfi1: Avoid large frame size warning When CONFIG_FRAME_WARN is set to 1024 bytes, which is useful to find stack consumers, we get a warning in hfi1 driver. drivers/infiniband/hw/hfi1/affinity.c: In function ‘hfi1_get_proc_affinity’: drivers/infiniband/hw/hfi1/affinity.c:415:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] This change removes unneeded buf[1024] declaration and usage. Fixes: `f48ad614c1` ("IB/hfi1: Move driver out of staging") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:16:52 -04:00
Dan Carpenter	a8b7da58ec	IB/hfi1: fix some indenting That extra tabs are misleading. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-06 19:15:07 -04:00
Linus Torvalds	1cbe06c3cf	Round two of 4.7 merge window patches - Dynamic counter infrastructure in the IB drivers This is a sysfs based code to allow free form access to the hardware counters RDMA devices might support so drivers don't need to code this up repeatedly themselves - SendOnlyFullMember multicast support - IB router support - A couple misc fixes - The big item on the list: hfi1 driver updates, plus moving the hfi1 driver out of staging -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXRyupAAoJELgmozMOVy/deHAQAJp87yn8hntHXh81Kjrvezch McJ0v7FTsfsLVe/Y+YAbaAREjanhnUDzZ36WUqS9rHadOl+Ia0fD+M53aTk9hDKS R9/oWQDsUb8k7imdhNrSS72Q2kbwr++8RqBoEEFcl4fuDCVsrRhUaf+perJTEnoK I9APDQCNU3FCTPFRllK5al6gi5UGEW9vO9JQZb+VnTVW9+LjlYobzcZfttAw8oRN rtQLP3xGb7ZwV0ngH0dt2TthN2ih97TavnISojTdJ5I+WHFxLV0YbHSK/E2JLdoi hVknhrflPIKr+mRVv97yCCz92kGAdTKG3sT9vNUVUQ0Y8kEdrW/LrnrmxsBEYzkl GcQeqg7//ffOlQ0dGZZjlkPg/8yMPo9CBsfPDcOvXeenK0lkao9qg2yrIz+CBC/9 YlpUKfL0JLRal9O6y3Mh++ZQ9R1kvQN8/CgHLUlRQdWLfxf7VPA4rIcaWT7iGfvg z6jIJKjsK+hqKhY9oKh3YfNmXh65IQ5OF3+tBzHKwNiGXNSsMYAVYP3vc/HYTJvz 4EpsenndKb5l0AOvDch8rMmw7YZPf2NOdi90TW6Bq1VgTIDRxh0WzR2QTnWBVRSf XAcC3DLXTHqC/UUpHrdXgfvU/0nQXaVUvPWIAM/t3hy91GLscByDw4973+tISs/B 7AHSptbVQaBUpTjWHfKf =5+cf -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "This is the second group of code for the 4.7 merge window. It looks large, but only in one sense. I'll get to that in a minute. The list of changes here breaks down as follows: - Dynamic counter infrastructure in the IB drivers This is a sysfs based code to allow free form access to the hardware counters RDMA devices might support so drivers don't need to code this up repeatedly themselves - SendOnlyFullMember multicast support - IB router support - A couple misc fixes - The big item on the list: hfi1 driver updates, plus moving the hfi1 driver out of staging There was a group of 15 patches in the hfi1 list that I thought I had in the first pull request but they weren't. So that added to the length of the hfi1 section here. As far as these go, everything but the hfi1 is pretty straight forward. The hfi1 is, if you recall, the driver that Al had complaints about how it used the write/writev interfaces in an overloaded fashion. The write portion of their interface behaved like the write handler in the IB stack proper and did bi-directional communications. The writev interface, on the other hand, only accepts SDMA request structures. The completions for those structures are sent back via an entirely different event mechanism. With the security patch, we put security checks on the write interface, however, we also knew they would be going away soon. Now, we've converted the write handler in the hfi1 driver to use ioctls from the IB reserved magic area for its bidirectional communications. With that change, Intel has addressed all of the items originally on their TODO when they went into staging (as well as many items added to the list later). As such, I moved them out, and since they were the last item in the staging/rdma directory, and I don't have immediate plans to use the staging area again, I removed the staging/rdma area. Because of the move out of staging, as well as a series of 5 patches in the hfi1 driver that removed code people thought should be done in a different way and was optional to begin with (a snoop debug interface, an eeprom driver for an eeprom connected directory to their hfi1 chip and not via an i2c bus, and a few other things like that), the line count, especially the removal count, is high" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits) staging/rdma: Remove the entire rdma subdirectory of staging IB/core: Make device counter infrastructure dynamic IB/hfi1: Fix pio map initialization IB/hfi1: Correct 8051 link parameter settings IB/hfi1: Update pkey table properly after link down or FM start IB/rdamvt: Fix rdmavt s_ack_queue sizing IB/rdmavt: Max atomic value should be a u8 IB/hfi1: Fix hard lockup due to not using save/restore spin lock IB/hfi1: Add tracing support for send with invalidate opcode IB/hfi1, qib: Add ieth to the packet header definitions IB/hfi1: Move driver out of staging IB/hfi1: Do not free hfi1 cdev parent structure early IB/hfi1: Add trace message in user IOCTL handling IB/hfi1: Remove write(), use ioctl() for user cmds IB/hfi1: Add ioctl() interface for user commands IB/hfi1: Remove unused user command IB/hfi1: Remove snoop/diag interface IB/hfi1: Remove EPROM functionality from data device IB/hfi1: Remove UI char device IB/hfi1: Remove multiple device cdev ...	2016-05-28 11:04:16 -07:00
Christoph Lameter	b40f4757da	IB/core: Make device counter infrastructure dynamic In practice, each RDMA device has a unique set of counters that the hardware implements. Having a central set of counters that they must all adhere to is limiting and causes many useful counters to not be available. Therefore we create a dynamic counter registration infrastructure. The driver must implement a stats structure allocation routine, in which the driver must place the directory name it wants, a list of names for all of the counters, an array of u64 counters themselves, plus a few generic configuration options. We then implement a core routine to create a sysfs file for each of the named stats elements, and a core routine to retrieve the stats when any of the sysfs attribute files are read. To avoid excessive beating on the stats generation routine in the drivers, the core code also caches the stats for a short period of time so that someone attempting to read all of the stats in a given device's directory will not result in a stats generation call per file read. Future work will attempt to standardize just the shared stats elements, and possibly add a method to get the stats via netlink in addition to sysfs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [ Add caching, make structure names more informative, add i40iw support, other significant rewrites from the original patch ]	2016-05-26 12:52:51 -04:00
Doug Ledford	8779e7658d	Merge branch 'hfi1-2' into k.o/for-4.7	2016-05-26 12:50:05 -04:00
Jubin John	f158486527	IB/hfi1: Fix pio map initialization The pio map initialization function is off by 1 causing the last kernel send context that is allocated to not get mapped into the pio map which leads to the last kernel send context not being used by any of the qps. The send context reserved for VL15 is taken care of by setting the scontext variable that is used as the index into the kernel send context array to 1 and does not need to be accounted for in the kernel send context counting loop as it is currently done. Fix the kernel send context counting loop to account for all the allocated send contexts and map all of them to the different VLs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Dean Luick	f6de3d39cf	IB/hfi1: Correct 8051 link parameter settings Two 8051 link settings, external device config and tuning method, were written in the wrong location and the previous settings were not cleared. For both, clear the old value and write the new value. Fixes: `8ebd4cf185` ("staging/rdma/hfi1: Add active and optical cable support") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Sebastian Sanchez	ce8b2fd095	IB/hfi1: Update pkey table properly after link down or FM start When FM is disabled, and the HFI port on the switch is changed from MgmtAllowed=YES to MgmtAllowed=NO and the link is bounced, FULL_MGMT_P_KEY doesn't get cleared from the pkey table. This also occurs when the QSFP cable is moved from a switch port with MgmtAllowed=YES to a MgmtAllowed=NO port. Clear pkey entry properly. Also, when the driver is loaded and the switch port is set to MgmtAllowed=NO, FULL_MGMT_P_KEY shouldn't be added to pkey table after FM is started. Only set FULL_MGMT_P_KEY in the pkey table if switch port is configured to MgmtAllowed=YES. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Mike Marciniszyn	7049de65c9	IB/hfi1: Fix hard lockup due to not using save/restore spin lock Commit `b9b06cb6fe` ("IB/hfi1: Fix missing lock/unlock in verbs drain callback") added a spin lock. Unfortunately, the new lock code can be called from a base level interrupt state, and an interrupt that can get stacked will attempt to get the same lock. Fix by using the flag save/restore spin lock variation. Cc: stable@vger.kernel.org # 4.6+ Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Jianxin Xiong	bdd8a98ce4	IB/hfi1: Add tracing support for send with invalidate opcode Enable trace generation for packets with the "Send Last with Invalidate" and "Send Only with Invalidate" opcodes. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Jianxin Xiong	23f7d0d29e	IB/hfi1, qib: Add ieth to the packet header definitions A new union member "ieth" (Invalidate Extended Transport Header) is added to the packet header definition in preparation of supporting the send with invalidate opcode. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Doug Ledford	e6f61130ed	Merge branches 'misc-4.7-2', 'ipoib' and 'ib-router' into k.o/for-4.7	2016-05-26 11:55:19 -04:00
Dennis Dalessandro	f48ad614c1	IB/hfi1: Move driver out of staging The TODO list for the hfi1 driver was completed during 4.6. In addition other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to remove the driver from staging and into the drivers/infiniband sub-tree. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:14 -04:00
Jubin John	f70f5f6af3	IB/qib: Remove unused qib_7322_intr_msgs[] Building the qib driver with gcc version 6.1.0 raises the following build warning: drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning: 'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=] static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = { ^~~~~~~~~~~~~~~~~~ Remove the unused qib_7322_intr_msgs[] Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:11 -04:00
Erez Shitrit	507f6afa3a	IB/core: Introduce capabilitymask2 field in ClassPortInfo mad Change struct ib_class_port_info to conform to IB Spec 1.3 That in order to get specific capability mask from ClassPortInfo mad. >From the IB Spec, ClassPortInfo section: "CapabilityMask2 Bits 0-26: Additional class-specific capabilities... RespTimeValue the rest 5 bits" The new struct now has one field for capabilitymask2 (previously was the reserved field) and the resp_time field. And it fixes up qib and srpt, use of the field repurposed to be used as capabilitymask2: IB/qib: Change pma_get_classportinfo IB/srpt: Adjust the use of ib_class_port_info Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-25 15:39:02 -04:00
Honggang Li	0de4cbb3dd	RDMA/cxgb3: device driver frees DMA memory with different size [ 598.852037] ------------[ cut here ]------------ [ 598.856698] WARNING: at lib/dma-debug.c:887 check_unmap+0xf8/0x920() [ 598.863079] cxgb3 0000:01:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000003310000] [map size=17 bytes] [unmap size=16 bytes] [ 598.878265] Modules linked in: xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad kvm_amd kvm ipmi_devintf ipmi_ssif dcdbas pcspkr ipmi_si sg ipmi_msghandler acpi_power_meter amd64_edac_mod shpchp edac_core sp5100_tco k10temp edac_mce_amd i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic iw_cxgb3 pata_acpi ib_core ib_addr mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm pata_atiixp drm ahci libahci serio_raw i2c_core cxgb3 libata bnx2 mdio dm_mirror dm_region_hash dm_log dm_mod [ 598.946822] CPU: 3 PID: 11820 Comm: cmtime Not tainted 3.10.0-327.el7.x86_64.debug #1 [ 598.954681] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012 [ 598.962193] ffff8808077479a8 000000000381a432 ffff880807747960 ffffffff81700918 [ 598.969663] ffff880807747998 ffffffff8108b6c0 ffff880807747a80 ffff8808063f55c0 [ 598.977132] ffffffff833ca850 0000000000000282 ffff88080b1bb800 ffff880807747a00 [ 598.984602] Call Trace: [ 598.987062] [<ffffffff81700918>] dump_stack+0x19/0x1b [ 598.992224] [<ffffffff8108b6c0>] warn_slowpath_common+0x70/0xb0 [ 598.998254] [<ffffffff8108b75c>] warn_slowpath_fmt+0x5c/0x80 [ 599.004033] [<ffffffff813903b8>] check_unmap+0xf8/0x920 [ 599.009369] [<ffffffff81025959>] ? sched_clock+0x9/0x10 [ 599.014702] [<ffffffff81390cee>] debug_dma_free_coherent+0x7e/0xa0 [ 599.021008] [<ffffffffa01ece2c>] cxio_destroy_cq+0xcc/0x160 [iw_cxgb3] [ 599.027654] [<ffffffffa01e8da0>] iwch_destroy_cq+0xf0/0x140 [iw_cxgb3] [ 599.034307] [<ffffffffa01c4bfe>] ib_destroy_cq+0x1e/0x30 [ib_core] [ 599.040601] [<ffffffffa04ff2d2>] ib_uverbs_close+0x302/0x4d0 [ib_uverbs] [ 599.047417] [<ffffffff812335a2>] __fput+0x102/0x310 [ 599.052401] [<ffffffff8123388e>] ____fput+0xe/0x10 [ 599.057297] [<ffffffff810bbde4>] task_work_run+0xb4/0xe0 [ 599.062719] [<ffffffff81092a84>] do_exit+0x304/0xc60 [ 599.067789] [<ffffffff81025905>] ? native_sched_clock+0x35/0x80 [ 599.073820] [<ffffffff81025959>] ? sched_clock+0x9/0x10 [ 599.079153] [<ffffffff8170a49c>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 599.085358] [<ffffffff8109346c>] do_group_exit+0x4c/0xc0 [ 599.090779] [<ffffffff810a8661>] get_signal_to_deliver+0x2e1/0x960 [ 599.097071] [<ffffffff8101c497>] do_signal+0x57/0x6e0 [ 599.102229] [<ffffffff81714bd1>] ? sysret_signal+0x5/0x4e [ 599.107738] [<ffffffff8101cb7f>] do_notify_resume+0x5f/0xb0 [ 599.113418] [<ffffffff81714e7d>] int_signal+0x12/0x17 [ 599.118576] ---[ end trace 1e4653102e7e7019 ]--- [ 599.123211] Mapped at: [ 599.125577] [<ffffffff8138ed8b>] debug_dma_alloc_coherent+0x2b/0x80 [ 599.131968] [<ffffffffa01ec862>] cxio_create_cq+0xf2/0x1f0 [iw_cxgb3] [ 599.139920] [<ffffffffa01e9c05>] iwch_create_cq+0x105/0x4e0 [iw_cxgb3] [ 599.147895] [<ffffffffa0500584>] create_cq.constprop.14+0x184/0x2e0 [ib_uverbs] [ 599.156649] [<ffffffffa05027fb>] ib_uverbs_create_cq+0x10b/0x140 [ib_uverbs] Fixes: `b955150ea7` ('RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error') Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 11:19:46 -04:00
Linus Torvalds	76b584d312	Primary 4.7 merge window changes - Updates to the new Intel X722 iWARP driver - Updates to the hfi1 driver - Fixes for the iw_cxgb4 driver - Misc core fixes - Generic RDMA READ/WRITE API addition - SRP updates - Misc ipoib updates - Minor mlx5 updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXPIAnAAoJELgmozMOVy/dEYUP/0A6NH5ptzUrwPuLrWoz8h+e KBfJE7H09mfKBx0Rq8YmnU+pz4lk8vrMLLaqpbGN57mwO0a1lK9bgc3E6KUhQPhc dpGEX/NG1+aILomD7M4l1yAKkG17kxFLD75cLCeaxhO76jBRWsunukqk5mT/u0EG fUYZs1fRb9t2LDtTNhPfSFR1+dgP5S17xLhpl9ttn87hTmIuiGWR6ig2nTC7azZD G0d7RVjohfY2sDD28YgQiUEJ+q+1ymp3XTaCZhPCVl9VCRPweEdtLKcbNWZIvClx ewuTCgADXg8tAL/6zu6bEKqZlC17UrmVJee3csKQLT09PJkSEICeFR/ld6ONUKAF nDhi3ySa75Xxg9VDcCnuRkXKK+/zi7oDelZuh9mvMG0JJqPK9rTZDD29j2kBf7C0 bdx4R5cI4KJWQ/GlCyi/nLiuYkmAiCugzcGnRho4ub+EJ0yX1w6n8KVYr37kFsFu q6MCnEfArEgDpbq1wo0+9MWtqBYrnOI/XtG81Zd+6X2MW975qU85wUdUSjg6OOb1 v1osyAmFDy9A0Y80yY+l1HHrSVIvI0IAWZDfxsbCLQY8O03ZNcvxE2RsrzWd5CKL iZsX24tjV0WR9+lORHLfAKB3DL9CcfHv/tHo7q+5iAHmIuWZGrEN22ELkwS/4X7x d/V0XDjzs6lgQeTJ7R4B =e+Zv -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Primary 4.7 merge window changes - Updates to the new Intel X722 iWARP driver - Updates to the hfi1 driver - Fixes for the iw_cxgb4 driver - Misc core fixes - Generic RDMA READ/WRITE API addition - SRP updates - Misc ipoib updates - Minor mlx5 updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits) IB/mlx5: Fire the CQ completion handler from tasklet net/mlx5_core: Use tasklet for user-space CQ completion events IB/core: Do not require CAP_NET_ADMIN for packet sniffing IB/mlx4: Fix unaligned access in send_reply_to_slave IB/mlx5: Report Scatter FCS device capability when supported IB/mlx5: Add Scatter FCS support for Raw Packet QP IB/core: Add Scatter FCS create flag IB/core: Add Raw Scatter FCS device capability IB/core: Add extended device capability flags i40iw: pass hw_stats by reference rather than by value i40iw: Remove unnecessary synchronize_irq() before free_irq() i40iw: constify i40iw_vf_cqp_ops structure IB/mlx5: Add UARs write-combining and non-cached mapping IB/mlx5: Allow mapping the free running counter on PROT_EXEC IB/mlx4: Use list_for_each_entry_safe IB/SA: Use correct free function IB/core: Fix a potential array overrun in CMA and SA agent IB/core: Remove unnecessary check in ibnl_rcv_msg IB/IWPM: Fix a potential skb leak RDMA/nes: replace custom print_hex_dump() ...	2016-05-20 14:35:07 -07:00
Linus Torvalds	c04a588029	powerpc updates for 4.7 Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G. Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G. Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXPsGzAAoJEFHr6jzI4aWAVoAP/iKdrDe0eYHlVAE9SqnbsiZs lgDxdsC8P3fsmP1G9o/HkKhC82zHl/La8Ztz8dtqa+LkSzbfliWP1ztJsI7GsBFo tyCKzWnX9Rwvd3meHu/o/SQ29TNLm/PbPyyRqpj5QPbJ8XCXkAXR7ZZZqjvcMsJW /AgIr7Cgf53tl9oZzzl/c7CnNHhMq+NBdA71vhWtUx+T97wfJEGyKW6HhZyHDbEU iAki7fu77ZpEqC/Fh9swf0dCGBJ+a132NoMVo0AdV7EQLznUYlQpQEqa+1PyHZOP /ArOzf2mDg6m3PfCo1eiB07v8PnVZ3llEUbVAJNg3GUxbE4SHrqq/kwm0iElm3p/ DvFxerCwdX9vmskJX4wDs+pSZRabXYj9XVMptsgFzA4joWrqqb7mBHqaort88YcY YSljEt1bHyXmiJ+dBya40qARsWUkCVN7ZgEzdxckq0KI3w7g2tqpqIbO2lClWT6t B3GpqQ4jp34+d1M14FB91fIGK7tMvOhSInE0Mv9+tPvRsepXqiiU/SwdAtRlr3m2 zs/K+4FYcVjJ3Rmpgc+tI38PbZxHe212I35YN6L1LP+4ZfAtzz0NyKdooTIBtkbO 19pX4WbBjKq8zK+YutrySncBIrbnI6VjW51vtRhgVKZliPFO/6zKagyU6FbxM+E5 udQES+t3F/9gvtxgxtDe =YvyQ -----END PGP SIGNATURE----- Merge tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits) powerpc/86xx: Fix PCI interrupt map definition powerpc/86xx: Move pci1 definition to the include file powerpc/fsl: Fix build of the dtb embedded kernel images powerpc/fsl: Fix rcpm compatible string powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC powerpc/fsl-pci: Add a workaround for PCI 5 errata powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb powerpc/powernv/npu: Add PE to PHB's list powerpc/powernv: Fix insufficient memory allocation powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" powerpc/powernv/npu: Enable NVLink pass through powerpc/powernv/npu: Rework TCE Kill handling powerpc/powernv/npu: Add set/unset window helpers powerpc/powernv/ioda2: Export debug helper pe_level_printk() ...	2016-05-20 10:12:41 -07:00
Matan Barak	c16d2750a0	IB/mlx5: Fire the CQ completion handler from tasklet Previously, mlx5_ib_cq_comp was executed from interrupt context. Under heavy load, this could cause the CPU core to be in an interrupt context too long. Instead of executing the handler from the interrupt context we execute it from a much friendly tasklet context. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-18 10:45:49 -04:00
shamir rabinovitch	04ef0f1a01	IB/mlx4: Fix unaligned access in send_reply_to_slave The problem is that the function 'send_reply_to_slave' gets the 'req_sa_mad' as a pointer whose address is only aliged to 4 bytes but is 8 bytes in size. This can result in unaligned access faults on certain architectures. Sowmini Varadhan pointed to this reply from Dave Miller that say that memcpy should not be used to solve alignment issues: https://lkml.org/lkml/2015/10/21/352 Optimization of memcpy to 'ldx' instruction can only happen if the compiler knows that the size of the data we are copying is 8 bytes and it assumes it is aligned to 8 bytes. If the compiler know the type is not aligned to 8 it must not optimize the 8 byte copy. Defining the data type as aligned to 4 forces the compiler to treat all accesses as though they aren't aligned and avoids the 'ldx' optimization. Full credit for the idea goes to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-18 10:25:56 -04:00
Linus Torvalds	16bf834805	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...	2016-05-17 17:05:30 -07:00
Doug Ledford	0651ec932a	Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into k.o/for-4.7	2016-05-13 19:40:38 -04:00
Majd Dibbiny	cff5a0f3a3	IB/mlx5: Report Scatter FCS device capability when supported Report Scatter FCS support when the Firmware supports as well. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:28 -04:00
Majd Dibbiny	358e42ea66	IB/mlx5: Add Scatter FCS support for Raw Packet QP Enable Scatter FCS in the RQ context when the user passes Scatter FCS create flag. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:28 -04:00
Colin Ian King	73fbb4db6c	i40iw: pass hw_stats by reference rather than by value passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:26 -04:00
Lars-Peter Clausen	d42ed55aaf	i40iw: Remove unnecessary synchronize_irq() before free_irq() Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:26 -04:00
Julia Lawall	fb997816fd	i40iw: constify i40iw_vf_cqp_ops structure The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:26 -04:00
Guy Levi	37aa5c36aa	IB/mlx5: Add UARs write-combining and non-cached mapping By this patch, the user space library will be able to improve performance using appropriate ringing DoorBell method according to the memory type it asked for. Currently only one mapping command is allowed for UARs: MLX5_IB_MMAP_REGULAR_PAGE. Using this mapping, the kernel maps the UARs to write-combining (WC) if the system supports it. If the system is not supporting WC the UARs are mapped to non-cached(NC). In this case the user space library can't tell which mapping is applied. This patch adds 2 new mapping commands: MLX5_IB_MMAP_WC_PAGE and MLX5_IB_MMAP_NC_PAGE. For these commands the kernel maps exactly as requested and fails if it can't. Since there is no generic way to check if the requested memory region can be mapped as WC, driver enables conclusive WC mapping only for x86, PowerPC and ARM which support WC for the device's memory region. Signed-off-by: Guy Levy <guyle@mellanox.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:03 -04:00
Matan Barak	6cbac1e4cd	IB/mlx5: Allow mapping the free running counter on PROT_EXEC The current mlx5 code disallows mapping the free running counter of mlx5 based hardwares when PROT_EXEC is set. Although this behaviour is correct, Linux does add an implicit VM_EXEC to the vm_flags if the READ_IMPLIES_EXEC bit is set in the process personality. This happens for example if the process stack is executable. This causes libmlx5 to output a warning and prevents the user from reading the free running clock. Executing the init segment of the hardware isn't a security risk (at least no more than executing a process own stack), so we just prevent writes to there. Fixes: `d69e3bcf79` ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:03 -04:00
Geliang Tang	ee71b9681d	IB/mlx4: Use list_for_each_entry_safe Simplify the code in search_relocate_mgid0_group with by using list_for_each_entry_safe(). Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:03 -04:00
Andy Shevchenko	faca88273b	RDMA/nes: replace custom print_hex_dump() There is no need to duplicate a lot of code that is in the kernel library for ages. Replace duplicating code by calling to print_hex_dump() directly. Note that output is slightly changed: - hex and ascii parts have just two spaces delimeter - there is no delimeter for ascii portions - file and line removed from prefix (they were redundant anyway since previous output shows same closer enough) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:01 -04:00
Colin Ian King	aa70345369	IB/mlx4: trivial fix of spelling mistake on "argument" fix spelling mistake, argumant -> argument Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:00 -04:00
Denys Vlasenko	da74bf4aea	IB/nes: Deinline nes_free_qp_mem, save 1072 bytes This function compiles to 550 bytes of machine code. Three callsites, all in nes_create_qp. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Faisal Latif <faisal.latif@intel.com> CC: Doug Ledford <dledford@redhat.com> CC: linux-rdma@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:00 -04:00
Tatyana Nikolova	1997412db6	RDMA/nes: Adding queue drain functions Adding sq and rq drain functions, which block until all previously posted wr-s in the specified queue have completed. A completion object is signaled to unblock the thread, when the last cqe for the corresponding queue is processed. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:59 -04:00
Colin Ian King	78c49f83ee	i40iw: pass hw_stats by reference rather than by value passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:34 -04:00
Lars-Peter Clausen	6d6c5c1eb3	i40iw: Remove unnecessary synchronize_irq() before free_irq() Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:33 -04:00
Julia Lawall	2016de7ba0	i40iw: constify i40iw_vf_cqp_ops structure The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:33 -04:00
Bart Van Assche	ba987e51a6	iw_cxgb4: Convert a __force cast __force casts should be avoided if there is a better alternative. Hence modify the comparison of s_addr with INADDR_ANY such that the __force cast is no longer necessary. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Vipul Pandya <vipul@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:11 -04:00
Hariprasad S	64bec74a9b	RDMA/iw_cxgb4: Add arp failure handlers to send_mpa_reply/reject() These handlers when called print error message to the kernel log, but the actual handling is done by _c4iw_free_ep() and process_timeout(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:11 -04:00
Hariprasad S	093108cb36	RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr() Currently c4iw_peer_abort_intr() does not wake up the waiter if the endpoint state indicates we're using MPAv2 and we're currently trying to connect. This was introduced with commit `7c0a33d611` ("RDMA/cxgb4: Don't wakeup threads for MPAv2") However, this original fix is flawed because it introduces a race that can cause a deadlock of the iwarp stack. Here is the race: ->local side sets up an active offload connection. ->local side sends MPA_START request. ->peer sends MPA_START response. ->local side ingress cpl thread begins processing the MPA_START response, but before it changes the state from MPA_REQ_SENT to FPDU_MODE: ->peer sends a RST which results in a ABORT_REQ_RSS. This triggers peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the stack will re-attempt the connection using MPAv1. ->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT WR. ->So the cpl thread is stuck waiting for a reply and cannot process the ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a halt because no more ingress cpls are processed by the stack... The correct fix for the issue is to always do the wake up in c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect(). Fixes: `7c0a33d611` ("RDMA/cxgb4: Don't wakeup threads for MPAv2") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:10 -04:00
Hariprasad S	4a4dd8db9d	RDMA/iw_cxgb4: Handle ret value of process_mpa_reply() in rx_data Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:10 -04:00
Hariprasad S	f86fac79af	RDMA/iw_cxgb4: atomic find and reference for listening endpoints Add get_ep_from_stid() which will atomically find and reference the endpoint struct if found. This avoids touch-after-free races between threads destroying listening endpoints and the CPL processing thread processing an incoming PASS_ACCEPT_REQ CPL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:09 -04:00
Hariprasad S	e8667a9b35	RDMA/iw_cxgb4: Handle ULP accept/reject during ABORTING c4iw_reject() and c4iw_accept() need to handle the case where the endpoint has timed out and is in the middle of ABORTING the connection. Here is the flow that causes the BUG_ON() to fire on the server side: 1) offload connection setup and endpoint timer started 2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP 3) endpoint timer fires, and process_timeout() aborts the connection, this moves the endpoint state to ABORTING until HW sends up the ABORT_RPL_RSS. 4) application exits closing the CONNECT_REQUEST cm_id. The IWCM calls c4iw_reject_cr() to destroy this connection request. 5) WHAMO: BUG_ON() because the state is ABORTING. The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the operation if the state is not in MPA_REQ_RCVD vs in DEAD. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:09 -04:00
Hariprasad S	ceb110a8c1	RDMA/iw_cxgb4: Release ep for for FPDU_MODE and MPA_REQ_RCVD in process_timeout ARP failure may also happen when ep in FPDU_MODE and these failures need to be handled by process_timeout(). process_timeout() also has to handle case MPA_REQ_RCVD, setting abort to 1, leading to ep resource release. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:09 -04:00
Hariprasad S	c878b706ff	RDMA/iw_cxgb4: Free skb in case of arp failure in _c4iw_free_ep() Arp failure for send_mpa_reply/reject() is handled by freeing the mpa_skb in c4iw_free_ep() before releasing ep. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:08 -04:00
Hariprasad S	944661dd97	RDMA/iw_cxgb4: atomically lookup ep and get a reference There is a race between ULP threads calling c4iw_ep_disconnect() via c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread can free the endpoint just after the ingress CPL thread finds the ep pointer in the tid table. To avoid this, we now use the hwtid_idr table for lookups instead of the LLD tid table so we can lock around insert, remove, and lookup+get_ep to avoid the race. The CPL handlers now will either find the ep ptr and have a ref on it, or not find it and they can discard the CPL. Callers of get_ep_from_tid() will have a ref on the ep if found, and thus must deref when they are done. Negative advice in peer_abort_intr() need to dereference the ep. therefore peer_abort() is scheduled to dereference the ep later. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:08 -04:00
Hariprasad S	761e19a504	RDMA/iw_cxgb4: Handle return value of c4iw_ofld_send() in abort_arp_failure() In abort_arp_failure(), the return value from c4iw_ofld_send() is ignored and thus if the CPL isn't sent, the endpoint is stuck and never gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and the ep resources are released in a safer context through process_work(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:07 -04:00
Hariprasad S	8a70f812b1	RDMA/iw_cxgb4: in process_timeout() don't move ep state to ABORTING Moving the state to ABORTING causes the ep to get stuck because c4iw_ep_timeout() thinks the ABORT has already been done. So leave the state alone and let c4iw_ep_disconnect() do the right thing given the ep state. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:07 -04:00
Hariprasad S	caa6c9f247	RDMA/iw_cxgb4: handle return value of c4iw_l2t_send() and send_mpa_req() ->In act_open_rpl(), CPL_ERR_TCAM_FULL error handling branch, there is no handling of the return value of send_fw_act_open_req(). ->In send_fw_act_open_req(), there is no handling of return value of c4iw_l2t_send(), which may cause a ep leak and won't notify upper layers on connection establish failure. ->send_mpa_req() should act on the return from c4iw_l2t_send() and return the error to the caller. ->In case of c4iw_l2t_send() failure in send_mpa_req(), returns without starting the timer and not changing the ep state, which is further handled by act_establish() -> In act_establish()?if send_mpa_request's get_skb returns an error, may cause an ep leak. So handle return value of send_mpa_req() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:07 -04:00
Hariprasad S	e4b76a2a26	RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation ->Stop the ep timer after MPA negotiation so that the arp failures during send_mpa_reply/reject will be handled by process_timeout() after the ep timer expires. ->Added case MPA_REP_SENT in process_timeout(). ->For MPA reject, c4iw_ep_disconnect tries to start an already started timer, which leads to warning message "timer already started". -> In case of mpa reject stop the timer and call send_mpa_reject(). -> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer only for send_mpa_reply(), which is set in c4iw_accept_cr(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:06 -04:00
Hariprasad S	da1cecdffc	RDMA/iw_cxgb4: Do not stop timer in case of incomplete messages In case of incomplete mpa messages we should not stop timer as it results in return with timeout for the next mpa message Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:06 -04:00
Hariprasad S	8d1f1a6b3f	RDMA/iw_cxgb4: parent_ep has to be dereferenced in case of passive accept failure -> On passive side of connection parent_ep referenced during connection request has to be dereferenced during the passive accept failure. -> As passive accept failure error handlinglogic runs in atomic context, the parent ep is dereferenced by scheduling work request. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:05 -04:00
Hariprasad S	92f850ec3a	RDMA/iw_cxgb4: set the correct FID value in DSGL commands The FID value in a ULP_MEMIO command needs to be set to an IQ ID of a queue configured for our PF. The FID/IQ id is used to index into the PCIE FID table, to find out on which function the DMA needs to be issued. Essentially, every DMA needs to have the ingress queue. The exact ingress queue doesn't matter, but it needs to be an ingress queue associated with the function you want to see the DMA on. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:05 -04:00
Hariprasad S	ccd2c30b45	RDMA/iw_cxgb4: Correct RFC number of MPA Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:05 -04:00
Hariprasad S	9ca6f7cf9c	RDMA/iw_cxgb4: Add few history bits for ep - add EP_DISC_FAIL history bit - add QP_REFED/DEREFED history bits - Add functions to ref/deref the cm_id and add history bit for the same - add CLOSE_CON_RPL history Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:04 -04:00
Colin Ian King	e9bb8af98a	i40iw: pass hw_stats by reference rather than by value passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 13:47:08 -04:00
Lars-Peter Clausen	e381b3bbd7	i40iw: Remove unnecessary synchronize_irq() before free_irq() Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 13:47:08 -04:00
Julia Lawall	dc1badf630	i40iw: constify i40iw_vf_cqp_ops structure The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 13:47:07 -04:00
Julia Lawall	a647040ea8	i40e: constify i40e_client_ops structure The i40e_client_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 13:47:07 -04:00
Bart Van Assche	9aa8b3217e	IB/core: Enhance ib_map_mr_sg() The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 13:37:57 -04:00
Christoph Hellwig	ff2ba99365	IB/core: Add passing an offset into the SG to ib_map_mr_sg Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 13:37:11 -04:00
David S. Miller	95aef7cecb	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-05-05 This series contains updates to i40e and i40evf. The theme behind this series is code reduction, yeah! Jesse provides most of the changes starting with a refactor of the interpretation of a tunnel which lets us start using the hardware's parsing. Removed the packet split receive routine and ancillary code in preparation for the Rx-refactor. The refactor of the receive routine, aligns the receive routine with the one in ixgbe which was highly optimized. The hardware supports a 16 byte descriptor for receive, but the driver was never using it in production. There was no performance benefit to the real driver of 16 byte descriptors, so drop a whole lot of complexity while getting rid of the code. Fixed a bug where while changing the number of descriptors using ethtool, the driver did not test the limits of the system memory before permanently assuming it would be able to get receive buffer memory. Mitch fixes a memory leak of one page each time the driver is opened by allocating the correct number of receive buffers and do not fiddle with next_to_use in the VF driver. Arnd Bergmann fixed a indentation issue by adding the appropriate curly braces in i40e_vc_config_promiscuous_mode_msg(). Julia Lawall fixed an issue found by Coccinelle, where i40e_client_ops structure can be const since it is never modified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 15:55:30 -04:00
Julia Lawall	3949c4ac8c	i40e: constify i40e_client_ops structure The i40e_client_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-05-05 23:32:59 -07:00
Haggai Abramovsky	73898db043	net/mlx4: Avoid wrong virtual mappings The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak). The mlx4 driver contains code that does the equivalent of: vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the device is opened. Prevent Ethernet driver to run this problematic code by forcing it to allocate contiguous memory. As for the Infiniband driver, at first we are trying to allocate contiguous memory, but in case of failure roll back to work with fragmented memory. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reported-by: David Daney <david.daney@cavium.com> Tested-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-05 23:23:05 -04:00
Doug Ledford	94d7f1a255	Merge branches 'hfi1' and 'iw_cxgb4' into k.o/for-4.7	2016-05-05 16:42:09 -04:00
Hariprasad S	6973627968	RDMA/iw_cxgb4: remove abort_connection() usage from ep_timeout() Use c4iw_ep_disconnect() instead. This is part of getting rid of abort_connection() altogether so we properly clean up on send_abort() failures. This is the last user of abort_connection(), so remove it too. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	c00dcbafac	RDMA/iw_cxgb4: move QP -> ERROR on fatal disconnect errors In c4iw_ep_disconnect(), if we fail to initiate a close operation, then move the qp to ERROR to disassociate the ep from the qp. Failure to do this will leak the ep resources. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	fd6aabe48c	RDMA/iw_cxgb4: don't use abort_connection in process_mpa_request() Instead return whether the caller needs to disconnect. This is part of getting rid of abort_connection() altogether so we properly clean up on send_abort() failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	eaf4c6d46a	RDMA/iw_cxgb4: remove abort_connection() usage from accept/reject Use c4iw_ep_disconnect() instead. This is part of getting rid of abort_connection() altogether so we properly clean up on send_abort() failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	fef4422d00	RDMA/iw_cxgb4: free resources when send_flowc() fails Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	f8e1e1d137	RDMA/iw_cxgb4: remove connection abort from process_mpa_reply Instead, have the caller, rx_data() handle the close/abort like it does for process_mpa_request(). This is part of getting rid of abort_connection() altogether so we properly clean up on send_abort() failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	6e410d8f71	RDMA/iw_cxgb4: ensure eps don't get freed while the mutex is held In rx_data(), with the ep in FPDU_MODE, refcnt=2, if we get unexpected streaming data, we call c4iw_modify_rc_qp() and move the qp from RTS -> TERMINATE. In c4iw_modify_rc_qp(), if rdma_fini() returns an error, the ep will be dereferenced (refcnt=1). Then rx_data() calls c4iw_ep_disconnect() which starts the close operation. But if send_halfclose() fails in c4iw_ep_disconnect(), we will call release_ep_resources() derefing the ep which reduces the refcnt to 0 and and frees the ep. However we still has the ep mutex at that point, so we have a touch-after-free bug. There is a similar issue where peer_close() calls c4iw_ep_disconnect(). The solution is to add a reference to the ep in c4iw_ep_disconnect() after acquiring the mutex, and release it after releasing the mutex. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	88bc230dc6	RDMA/iw_cxgb4: stop ep timer on close failure In c4iw_ep_disconnect(), if we start the ep timer to begin a close, but send_halfclose() fails, we need to stop the timer and send a CLOSE event up to the IWCM before releasing the resources. Otherwise, we can crash when the ep timer fires if the ep is referencing a previous instance of the device. This can happen as part of adapter reset/recovery, for instance. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Hariprasad S	9dec900c20	RDMA/iw_cxgb4: release ep resources on accept arp failure If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid will be stuck in SYN_PEND and never released. So create an arp failure handler specifically for this message to release the endpoint resources. In pass_accept_rpl_arp_failure(), put the parent endpoint so it will be freed when destroyed. Also we don't need to call release_tid() here because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the hwtid. If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the peer's ACK to our SYN is never received), then put the parent as well in peer_abort(). Treat accept_cr() failures just like arp failures: put the parent ep and release the ep resources destroying the tid The ARP failure handlers are called in an atomic context, so we need to schedule some of the processing which might block. Namely _c4iw_free_ep() which needs a mutex. So create a "special" CPL opcode and handler and schedule it via sched() to be run by process_work() in a blockable context. Also rework the active open arp failure handler to make use of release_ep_resources(). This allows both the active and passive arp failure handlers to use the same deferred cleanup function. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-05 16:11:14 -04:00
Florian Westphal	860e9538a9	treewide: replace dev->trans_start update with helper Replace all trans_start updates with netif_trans_update helper. change was done via spatch: struct net_device *d; @@ - d->trans_start = jiffies + netif_trans_update(d) Compile tested only. Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-xtensa@linux-xtensa.org Cc: linux1394-devel@lists.sourceforge.net Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: MPT-FusionLinux.pdl@broadcom.com Cc: linux-scsi@vger.kernel.org Cc: linux-can@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: linux-hams@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: linux-bluetooth@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:16:49 -04:00
David S. Miller	cba6532100	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 00:52:29 -04:00
Aneesh Kumar K.V	8ffb4103f5	IB/qib: Use cache inhibitted and guarded mapping on powerpc The driver was requesting for a writethrough mapping. But with those flags we will end up with an SAO mapping because we now have memory conherence always enabled. ie, the existing mapping will end up with a WIMG value 0b1110 which is Strong Access Order. Update this to use cache inhibitted guarded mapping. Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:13 +10:00
Linus Torvalds	925d96a0c9	Final set of -rc fixes for 4.6 - A number of collected fixes for oopses, memory corruptions, deadlocks, etc. All of these fixes are small (many only 5-10 lines), obvious, and tested. - Fix for the security issue related to the use of write for bi-directional communications. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXIrZVAAoJELgmozMOVy/d/XEP/1A4Ohm7WiZMN09wvlFGSgLe 2z2tY9ILvFuiAF++VZRfYyRmHorVKHYB1tk0JTsW1Ts1DrkjExgr4LS1/YDLOC42 q8YlBZw2x7pdnD5W1MJm+HK6oNj7aZVVjEHG7QnfLUIXr57a2rBQIeeWLx24M+OS j1yvaY/v39qvf7dwHwVjs07rh2WW9QCZn2c/552G4xz1YDdTkYBTc2WNnl0eng1f 1NqqMXhnajmNyR+Q+0+Vbcp4YWv551l5E6j9M+5nebehNtPSRb0GEIjxT3KnSGEg AjFev3XwnRF9EkQOwgbsg7a784+UHXe15vbr1MvbzGygcQeq4NVzLl04WEHExQe1 Om0ES/i8zfRs6d5XYB5zMY8pJbdjSVM/20d+h21SQs//4JXXJrN35WVAyy8lgwrX M3oY4t21eBQlV7oezfEZQgEEbdtccr8LILfZZmRUWPHd2ymaTWg6e4pZwtn45rlD O/Gb11G/UT7SXgw+XiPLBj5xlQk7nRn0kGuaStR7PonkLQ9Zy1wSSptvJeGj0VWE W6TEJnIqtv0aiJLhIQn8Ee1pCxE/ds7UPW6wT5O9R8ccEdDeIB3BDgskTNg1xPuS I8e1o7iA9752YS3wMDhLA8PifwbmVbkGHxJUQecOBPiDcdukkM0q4/YoNYqXKLCc nLDv3fMztpinG1L1T2LM =MFdC -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Final set of -rc fixes for 4.6. I've collected up a number of patches that are all pretty small with the exception of only a couple. The hfi1 driver has a number of important patches, and it is what really drives the line count of this pull request up. These are all small and I've got this kernel built and running in the test lab (I have most of the hardware, I think nes is the only thing in this patch set that I can't say I've personally tested and have up and running). Summary: - A number of collected fixes for oopses, memory corruptions, deadlocks, etc. All of these fixes are small (many only 5-10 lines), obvious, and tested. - Fix for the security issue related to the use of write for bi-directional communications" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: RDMA/nes: don't leak skb if carrier down IB/security: Restrict use of the write() interface IB/hfi1: Use kernel default llseek for ui device IB/hfi1: Don't attempt to free resources if initialization failed IB/hfi1: Fix missing lock/unlock in verbs drain callback IB/rdmavt: Fix send scheduling IB/hfi1: Prevent unpinning of wrong pages IB/hfi1: Fix deadlock caused by locking with wrong scope IB/hfi1: Prevent NULL pointer deferences in caching code MAINTAINERS: Update iser/isert maintainer contact info IB/mlx5: Expose correct max_sge_rd limit RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips iw_cxgb4: handle draining an idle qp iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping IB/core: Don't drain non-existent rq queue-pair IB/core: Fix oops in ib_cache_gid_set_default_gid	2016-04-29 17:07:54 -07:00
Maor Gottlieb	d63cd28608	net/mlx5: Add user chosen levels when allocating flow tables Currently, consumers of the flow steering infrastructure can't choose their own flow table levels and are limited to one flow table per level. This just waste levels. Instead, we introduce here the possibility to use multiple flow tables in a level. The user is free to connect these flow tables, while following the rule (FTEs in FT of level x could only point to FTs of level y where y > x). In addition this patch switch the order of the create/destroy flow tables of the NIC(vlan and main). Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:09 -04:00
Florian Westphal	4c8bb95921	RDMA/nes: don't leak skb if carrier down Alternatively one could free the skb, OTOH I don't think this test is useful so just remove it. Cc: <linux-rdma@vger.kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 21:11:09 -04:00

... 3 4 5 6 7 ...

3630 Commits