Commit Graph

4040 Commits

Author SHA1 Message Date
Duan Jiong 0cc65dd691 RDMA/ocrdma: Convert to use simple_open()
This removes an open-coded duplicate of simple_open().

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-19 17:55:54 -07:00
Christoph Jaeger 65b302ad31 RDMA/cxgb4: Fix memory leaks in c4iw_alloc() error paths
c4iw_alloc() bails out without freeing the storage that 'devp' points to.

Picked up by Coverity - CID 1204241.

Fixes: fa658a98a2 ("RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices")
Signed-off-by: Christoph Jaeger <christophjaeger@linux.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-19 17:55:43 -07:00
Sagi Grimberg 9d49f5e284 Target/iser: Fix hangs in connection teardown
In ungraceful teardowns isert close flows seem racy such that
isert_wait_conn hangs as RDMA_CM_EVENT_DISCONNECTED never
gets invoked (no one called rdma_disconnect).

Both graceful and ungraceful teardowns will have rx flush errors
(isert posts a batch once connection is established). Once all
flush errors are consumed we invoke isert_wait_conn and it will
be responsible for calling rdma_disconnect. This way it can be
sure that rdma_disconnect was called and it won't wait forever.

This patch also removes the logout_posted indicator. either the
logout completion was consumed and no problem decrementing the
post_send_buf_count, or it was consumed as a flush error. no point
of keeping it for isert_wait_conn as there is no danger that
isert_conn will be accidentally removed while it is running.

(Drop unnecessary sleep_on_conn_wait_comp check in
 isert_cq_rx_comp_err - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:38:49 -07:00
Sagi Grimberg e346ab343f Target/iser: Bail from accept_np if np_thread is trying to close
In case np_thread state is in RESET/SHUTDOWN/EXIT states,
no point for isert to stall there as we may get a hang in
case no one will wake it up later.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:32:58 -07:00
Matan Barak 9433c18891 IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes
When we receive a netdev event indicating a netdev change and/or
a netdev address change, we must change the MAC index used by the
proxy QP1 (in the QP context), otherwise RoCE CM packets sent by the
VF will not carry the same source MAC address as the non-CM packets.

We use the UPDATE_QP command to perform this change.

In order to avoid modifying a QP context based on netdev event,
while the driver attempts to destroy this QP (e.g either the mlx4_ib
or ib_mad modules are unloaded), we use mutex locking in both flows.

Since the relevant mlx4 proxy GSI QP is created indirectly by the
mad module when they create their GSI QP, the mlx4 didn't need to
keep track on that QP prior to this change.

Now, when QP modifications are needed to this QP from within the
driver, we added refernece to it.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 15:12:45 -04:00
Sagi Grimberg 14f4b54fe3 Target/iscsi,iser: Avoid accepting transport connections during stop stage
When the target is in stop stage, iSER transport initiates RDMA disconnects.
The iSER initiator may wish to establish a new connection over the
still existing network portal. In this case iSER transport should not
accept and resume new RDMA connections. In order to learn that, iscsi_np
is added with enabled flag so the iSER transport can check when deciding
weather to accept and resume a new connection request.

The iscsi_np is enabled after successful transport setup, and disabled
before iscsi_np login threads are cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:11 -07:00
Sagi Grimberg 531b7bf4bd Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
RDMA CM and iSCSI target flows are asynchronous and completely
uncorrelated. Relying on the fact that iscsi_accept_np will be called
after CM connection request event and will wait for it is a mistake.

When attempting to login to a few targets this flow is racy and
unpredictable, but for parallel login to dozens of targets will
race and hang every time.

The correct synchronizing mechanism in this case is pending on
a semaphore rather than a wait_for_event. We keep the pending
interruptible for iscsi_np cleanup stage.

(Squash patch to remove dead code into parent - nab)

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Sagi Grimberg 9fe63c88b1 Target/iser: Fix wrong connection requests list addition
Should be adding list_add_tail($new, $head) and not
the other way around.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Wilfried Klaebe 7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Hariprasad S 7d0a73a40c RDMA/cxgb4: Update Kconfig to include Chelsio T5 adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise c2f9da92f2 RDMA/cxgb4: Only allow kernel db ringing for T4 devs
The whole db drop avoidance stuff is for T4 only.  So we cannot allow
that to be enabled for T5 devices.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise 92e5011ab0 RDMA/cxgb4: Force T5 connections to use TAHOE congestion control
This is required to work around a T5 HW issue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise cc18b939e1 RDMA/cxgb4: Fix endpoint mutex deadlocks
In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, they must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock
because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
!internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
The design was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should
disconnect after releasing the endpoint mutex.  Now rx_data() will do
the disconnect in the cases where process_mpa_reply() wants to
disconnect after the TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
and to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not aquire the ep mutex for setting the
state, and make all calls to abort_connection() do so with the mutex
held.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Linus Torvalds 38137a5188 InfiniBand/RDMA updates for 3.15-rc2:
- Mostly cxgb4 fixes unblocked by the merge of some prerequisites via
    the net tree.
 
  - Drop deprecated MSI-X API use.
 
  - A couple other miscellaneous things.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTUXCpAAoJEENa44ZhAt0h4tEP/1P1TqA/90sO7WeChze6H2ph
 kkGJPsCj8i1AWpef/Ax2hvoH1pjxK/VQbpwgkGJElv+rRaP99pandYGPBFluUoJK
 jI6D/mZSAsl6BXIZZCTcz9ttQNYacDdDL1I22WIdntkkCUsoP1awuHnXS4WLAiPA
 3D1XmiSHTGv9nbqd2wKA3sHWFaUN/s+RqsvIXUcNrN4qEUL2KXPOyt7i4Pz8eUq7
 /lxG1S854QUrZsW7qFR6Jc3clhNupUDWBBRrII3MOV98W3upHOVsFAdtcLZAkcTA
 FBCk90VVi0hsPeHqqCZjKUW/JE414vEDzOZfmYOpMj0xzc9sF1POxlvqMfRjvKTD
 fwYAkNLT7AW74jSU38KZl9pdLsQx187c75pcYUZ4ph6m6bozbtlPGkJaa3M3yyUb
 DmWUZL1dX8i69dV+w/ABz/4uK0dbCwMeAM9cCCUiedqI5wIuEV4Ld3NOLe7HnAXz
 O81UrVVnwrOVl8uPmGzY3VliU0OtFqhMBCxTgsZRxuBr9DSx/dUzb4Tzfw7e3Sc+
 +uxMMnbuCDysqEfzKsLG0+qu9P3hcwyafTJ9u8pq0D+BCB3tA8ZIKiqrKeMBua2h
 IJAy8M+zVG2HZCQFreH2oMnXmzVoyp+5Yl28izMDEfDptJ6hel4FLBiLLlYnU1DO
 zxUN5X1yOHxFkK4YYetE
 =/PTZ
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:

 - mostly cxgb4 fixes unblocked by the merge of some prerequisites via
   the net tree

 - drop deprecated MSI-X API use.

 - a couple other miscellaneous things.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix over-dereference when terminating
  RDMA/cxgb4: Use uninitialized_var()
  RDMA/cxgb4: Add missing debug stats
  RDMA/cxgb4: Initialize reserved fields in a FW work request
  RDMA/cxgb4: Use pr_warn_ratelimited
  RDMA/cxgb4: Max fastreg depth depends on DSGL support
  RDMA/cxgb4: SQ flush fix
  RDMA/cxgb4: rmb() after reading valid gen bit
  RDMA/cxgb4: Endpoint timeout fixes
  RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices
  IB/mlx5: Add block multicast loopback support
  IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix()
  IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix()
2014-04-18 13:49:42 -07:00
Linus Torvalds 141eaccd01 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the target pending updates for v3.15-rc1.  Apologies in
  advance for waiting until the second to last day of the merge window
  to send these out.

  The highlights this round include:

   - iser-target support for T10 PI (DIF) offloads (Sagi + Or)
   - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
   - Pass in transport supported PI at session initialization (Sagi + MKP + nab)
   - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
   - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
   - Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
   - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)

  Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
  metadata into KVM guest have been left-out for now, as there where a
  few comments from MST + Paolo that where not able to be addressed in
  time for v3.15.  Please expect this feature for v3.16-rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
  ib_srpt: Use correct ib_sg_dma primitives
  target/tcm_fc: Rename ft_tport_create to ft_tport_get
  target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
  target/tcm_fc: Rename structs and list members for clarity
  target/tcm_fc: Limit to 1 TPG per wwn
  target/tcm_fc: Don't export ft_lport_list
  target/tcm_fc: Fix use-after-free of ft_tpg
  target: Add check to prevent Abort Task from aborting itself
  target: Enable READ_STRIP emulation in target_complete_ok_work
  target/sbc: Add sbc_dif_read_strip software emulation
  target: Enable WRITE_INSERT emulation in target_execute_cmd
  target/sbc: Add sbc_dif_generate software emulation
  target/sbc: Only expose PI read_cap16 bits when supported by fabric
  target/spc: Only expose PI mode page bits when supported by fabric
  target/spc: Only expose PI inquiry bits when supported by fabric
  target: Pass in transport supported PI at session initialization
  target/iblock: Fix double bioset_integrity_free bug
  Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
  target/rd: T10-Dif: RAM disk is allocating more space than required.
  iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
  ...
2014-04-12 16:51:08 -07:00
Mike Marciniszyn b076808051 ib_srpt: Use correct ib_sg_dma primitives
The code was incorrectly using sg_dma_address() and
sg_dma_len() instead of ib_sg_dma_address() and
ib_sg_dma_len().

This prevents srpt from functioning with the
Intel HCA and indeed will corrupt memory
badly.

Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-11 15:31:20 -07:00
Roland Dreier 5ae2866f52 Merge branches 'cxgb4', 'misc', 'mlx5' and 'qib' into for-next 2014-04-11 11:36:15 -07:00
Steve Wise 1d1ca9b4fd RDMA/cxgb4: Fix over-dereference when terminating
Need to get the endpoint reference before calling rdma_fini(), which
might fail causing us to not get the reference.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:10 -07:00
Steve Wise 97df1c6736 RDMA/cxgb4: Use uninitialized_var()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:10 -07:00
Steve Wise 98a3e87990 RDMA/cxgb4: Add missing debug stats
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:09 -07:00
Steve Wise c3f98fa291 RDMA/cxgb4: Initialize reserved fields in a FW work request
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:09 -07:00
Hariprasad Shenai aec844df10 RDMA/cxgb4: Use pr_warn_ratelimited
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:08 -07:00
Steve Wise a03d9f94cc RDMA/cxgb4: Max fastreg depth depends on DSGL support
The max depth of a fastreg mr depends on whether the device supports
DSGL or not.  So compute it dynamically based on the device support
and the module use_dsgl option.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:08 -07:00
Steve Wise b4e2901c52 RDMA/cxgb4: SQ flush fix
There is a race when moving a QP from RTS->CLOSING where a SQ work
request could be posted after the FW receives the RDMA_RI/FINI WR.
The SQ work request will never get processed, and should be completed
with FLUSHED status.  Function c4iw_flush_sq(), however was dropping
the oldest SQ work request when in CLOSING or IDLE states, instead of
completing the pending work request. If that oldest pending work
request was actually complete and has a CQE in the CQ, then when that
CQE is proceessed in poll_cq, we'll BUG_ON() due to the inconsistent
SQ/CQ state.

This is a very small timing hole and has only been hit once so far.

The fix is two-fold:

1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
   status regardless of the QP state.

2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
   before moving the state out of RTS.  This ensures that the state
   transition will not happen while another thread is in
   post_rc_send(), because set_state() and post_rc_send() both aquire
   the qp spinlock.  Also, once we transition the state out of RTS,
   subsequent calls to post_rc_send() will fail because the "in error"
   bit is set.  I don't think this fully closes the race where the FW
   can get a FINI followed a SQ work request being posted (because
   they are posted to differente EQs), but the #1 fix will handle the
   issue by flushing the SQ work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:08 -07:00
Steve Wise def4771f4b RDMA/cxgb4: rmb() after reading valid gen bit
Some HW platforms can reorder read operations, so we must rmb() after
we see a valid gen bit in a CQE but before we read any other fields
from the CQE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:07 -07:00
Steve Wise b33bd0cbfa RDMA/cxgb4: Endpoint timeout fixes
1) timedout endpoint processing can be starved. If there are continual
   CPL messages flowing into the driver, the endpoint timeout
   processing can be starved.  This condition exposed the other bugs
   below.

Solution: In process_work(), call process_timedout_eps() after each CPL
is processed.

2) Connection events can be processed even though the endpoint is on
   the timeout list.  If the endpoint is scheduled for timeout
   processing, then we must ignore MPA Start Requests and Replies.

Solution: Change stop_ep_timer() to return 1 if the ep has already been
queued for timeout processing.  All the callers of stop_ep_timer() need
to check this and act accordingly.  There are just a few cases where
the caller needs to do something different if stop_ep_timer() returns 1:

1) in process_mpa_reply(), ignore the reply and  process_timeout()
   will abort the connection.

2) in process_mpa_request, ignore the request and process_timeout()
   will abort the connection.

It is ok for callers of stop_ep_timer() to abort the connection since
that will leave the state in ABORTING or DEAD, and process_timeout()
now ignores timeouts when the ep is in these states.

3) Double insertion on the timeout list.  Since the endpoint timers
   are used for connection setup and teardown, we need to guard
   against the possibility that an endpoint is already on the timeout
   list.  This is a rare condition and only seen under heavy load and
   in the presense of the above 2 bugs.

Solution: In ep_timeout(), don't queue the endpoint if it is already on
the queue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:07 -07:00
Steve Wise fa658a98a2 RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices
Signed-off-by: Steve Wise <swise@opengridcomputing.com>

[ Fix cast from u64* to integer.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:01 -07:00
Eli Cohen f360d88a2e IB/mlx5: Add block multicast loopback support
Add support for the block multicast loopback QP creation flag along
the proper firmware API for that.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-10 18:43:32 -07:00
Alexander Gordeev 9684c2ea6d IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix()
As result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers using these
two interfaces need to be updated to use the new
pci_enable_msi_range() or pci_enable_msi_exact() and
pci_enable_msix_range() or pci_enable_msix_exact() interfaces.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-10 18:41:34 -07:00
Alexander Gordeev bf3f043e7b IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix()
As result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers using these
two interfaces need to be updated to use the new pci_enable_msi_range()
and pci_enable_msix_range() interfaces.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-10 18:39:13 -07:00
Nicholas Bellinger e70beee783 target: Pass in transport supported PI at session initialization
In order to support local WRITE_INSERT + READ_STRIP operations for
non PI enabled fabrics, the fabric driver needs to be able signal
what protection offload operations are supported.

This is done at session initialization time so the modes can be
signaled by individual se_wwn + se_portal_group endpoints, as well
as optionally across different transports on the same endpoint.

For iser-target, set TARGET_PROT_ALL if the underlying ib_device
has already signaled PI offload support, and allow this to be
exposed via a new iscsit_transport->iscsit_get_sup_prot_ops()
callback.

For loopback, set TARGET_PROT_ALL to signal SCSI initiator mode
operation.

For all other drivers, set TARGET_PROT_NORMAL to disable fabric
level PI.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:54 -07:00
Sagi Grimberg f225225848 Target/iser: Use Fastreg only if device supports signature
Fastreg is mandatory for signature, so if the
device doesn't support it we don't need to use it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:52 -07:00
Nicholas Bellinger 03e7848a64 iser-target: Add missing se_cmd put for WRITE_PENDING in tx_comp_err
This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING
status require an extra target_put_sess_cmd() in isert_put_cmd() code
when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist()
context during session shutdown.

The extra kref PUT is required so that transport_generic_free_cmd()
invokes the last target_put_sess_cmd() -> target_release_cmd_kref(),
which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd
descriptor with WRITE_PENDING status, and awake the completion in
target_wait_for_sess_cmds() to invoke TFO->release_cmd().

The bug was manifesting itself in target_wait_for_sess_cmds() where
a se_cmd descriptor with WRITE_PENDING status would end up sleeping
indefinately.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:51 -07:00
Nicholas Bellinger 131e6abc67 target: Add TFO->abort_task for aborted task resources release
Now that TASK_ABORTED status is not generated for all cases by
TMR ABORT_TASK + LUN_RESET, a new TFO->abort_task() caller is
necessary in order to give fabric drivers a chance to unmap
hardware / software resources before the se_cmd descriptor is
released via the normal TFO->release_cmd() codepath.

This patch adds TFO->aborted_task() in core_tmr_abort_task()
in place of the original transport_send_task_abort(), and
also updates all fabric drivers to implement this caller.

The fabric drivers that include changes to perform cleanup
via ->aborted_task() are:

  - iscsi-target
  - iser-target
  - srpt
  - tcm_qla2xxx

The fabric drivers that currently set ->aborted_task() to
NOPs are:

  - loopback
  - tcm_fc
  - usb-gadget
  - sbp-target
  - vhost-scsi

For the latter five, there appears to be no additional cleanup
required before invoking TFO->release_cmd() to release the
se_cmd descriptor.

v2 changes:
  - Move ->aborted_task() call into transport_cmd_finish_abort (Alex)

Cc: Alex Leung <amleung21@yahoo.com>
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Vu Pham <vu@mellanox.com>
Cc: Chris Boot <bootc@bootc.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Saurav Kashyap <saurav.kashyap@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:51 -07:00
Nicholas Bellinger f46d6a8a01 iser-target: Match FRMR descriptors to available session tags
This patch changes isert_conn_create_fastreg_pool() to follow
logic in iscsi_target_locate_portal() for determining how many
FRMR descriptors to allocate based upon the number of possible
per-session command slots that are available.

This addresses an OOPs in isert_reg_rdma() where due to the
use of ISCSI_DEF_XMIT_CMDS_MAX could end up returning a bogus
fast_reg_descriptor when the number of active tags exceeded
the original hardcoded max.

Note this also includes moving isert_conn_create_fastreg_pool()
from isert_connect_request() to isert_put_login_tx() before
posting the final Login Response PDU in order to determine the
se_nacl->queue_depth (eg: number of tags) per session the target
will be enforcing.

v2 changes:
  - Move isert_conn->conn_fr_pool list_head init into
    isert_conn_request()
v3 changes:
  - Drop unnecessary list_empty() check in isert_reg_rdma()
    (Sagi)

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:50 -07:00
Sagi Grimberg 5bac4b1a1f Target/iser: Fail SCSI WRITE command if device detected integrity error
If during data-transfer a data-integrity error was detected we
must fail the command with CHECK_CONDITION and not execute
the command.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:49 -07:00
Sagi Grimberg 96b7973e1c Target/iser: Move check signature status to a function
Remove code duplication from RDMA_READ and RDMA_WRITE
completions that do basically the same check.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:49 -07:00
Sagi Grimberg 897bb2c916 Target/iser: Consider DIF and RDMA_READ completions when calculating post_send counter
If protection is involved, iSER target must wait for
completion of RDMA_READ before sending SCSI response.
So we must consider that when calculating post_send_buf_count
additions, also when processing good/error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:48 -07:00
Sagi Grimberg c2caa20777 Target/iser: Fix signature work requests accounting
As REG_SIG_MR work request and it's LOCAL_INVALIDATE are
not accounted in post_send_buf_count we must color these
with ISER_FASTREG_LI_WRID in order to process their error
completions when the QP flushes.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:48 -07:00
Sagi Grimberg 9e961ae73c IB/isert: Support T10-PI protected transactions
In case the Target core passed transport T10 protection
operation:

1. Register data buffer (data memory region)
2. Register protection buffer if exsists (prot memory region)
3. Register signature region (signature memory region)
   - use work request IB_WR_REG_SIG_MR
4. Execute RDMA
5. Upon RDMA completion check the signature status
   - if succeeded send good SCSI response
   - if failed send SCSI bad response with appropriate sense buffer

(Fix up compile error in isert_reg_sig_mr, and fix up incorrect
 se_cmd->prot_type -> TARGET_PROT_NORMAL comparision - nab)
(Fix failed sector assignment in isert_completion_rdma_* - Sagi + nab)
(Fix enum assignements for protection type - Sagi)
(Fix devision on 32-bit in isert_completion_rdma_* - Sagi + Fengguang)
(Fix context change for v3.14-rc6 code - nab)
(Fix iscsit_build_rsp_pdu inc_statsn flag usage - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:47 -07:00
Sagi Grimberg f93f3a70da IB/isert: Accept RDMA_WRITE completions
In case of protected transactions, we will need to check the
protection status of the transaction before sending SCSI response.
So be ready for RDMA_WRITE completions. currently we don't ask
for these completions, but for T10-PI we will.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:46 -07:00
Sagi Grimberg d3e125dac1 IB/isert: Initialize T10-PI resources
Introduce pi_context to hold relevant RDMA protection resources.
We eliminate data_key_valid boolean and replace it with indicators
container to indicate:
- Is the descriptor protected (registered via signature MR)
- Is the data_mr key valid (can spare LOCAL_INV WR)
- Is the prot_mr key valid (can spare LOCAL_INV WR)
- Is the sig_mr key valid (can spare LOCAL_INV WR)

Upon connection establishment check if network portal is T10-PI
enabled and allocate T10-PI resources if necessary, allocate
signature enabled memory regions and mark connection queue-pair
as signature enabled.

(Fix context change for v3.14-rc6 code - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:46 -07:00
Sagi Grimberg e3d7e4c30c IB/isert: Introduce isert_map/unmap_data_buf
export map/unmap data buffer to a routine that may
be used in various places in the code and keep the
mapping data in a designated descriptor. Also, let
isert_fast_reg_mr to decide weather to use global
MR or do fast registration.

This commit does not change any functionality.

(Fix context change for v3.14-rc6 code - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:45 -07:00
Linus Torvalds 877f075aac Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
    support for handling DIF/DIX-style protection information, and the
    addition of PI support to the iSER initiator.  Target support will be
    arriving shortly through the SCSI target tree.
 
  - A nice simplification to the "umem" memory pinning library now that
    we have chained sg lists.  Kudos to Yishai Hadas for realizing our
    code didn't have to be so crazy.
 
  - Another nice simplification to the sg wrappers used by qib, ipath and
    ehca to handle their mapping of memory to adapter.
 
  - The usual batch of fixes to bugs found by static checkers etc. from
    intrepid people like Dan Carpenter and Yann Droneaud.
 
  - A large batch of cxgb4, ocrdma, qib driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
 CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
 CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
 hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
 7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
 ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
 TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
 d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
 4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
 RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
 I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
 zn6f16GmJ37BKn2/qYY/
 =Ww6H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.15:

   - The biggest change is core API extensions and mlx5 low-level driver
     support for handling DIF/DIX-style protection information, and the
     addition of PI support to the iSER initiator.  Target support will
     be arriving shortly through the SCSI target tree.

   - A nice simplification to the "umem" memory pinning library now that
     we have chained sg lists.  Kudos to Yishai Hadas for realizing our
     code didn't have to be so crazy.

   - Another nice simplification to the sg wrappers used by qib, ipath
     and ehca to handle their mapping of memory to adapter.

   - The usual batch of fixes to bugs found by static checkers etc.
     from intrepid people like Dan Carpenter and Yann Droneaud.

   - A large batch of cxgb4, ocrdma, qib driver updates"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
  RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  RDMA/ocrdma: Code clean-up
  RDMA/ocrdma: Display FW version
  RDMA/ocrdma: Query controller information
  RDMA/ocrdma: Support non-embedded mailbox commands
  RDMA/ocrdma: Handle CQ overrun error
  RDMA/ocrdma: Display proper value for max_mw
  RDMA/ocrdma: Use non-zero tag in SRQ posting
  RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Update version string
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/cxgb4: Disable DSGL use by default
  RDMA/cxgb4: rx_data() needs to hold the ep mutex
  ...
2014-04-03 16:57:19 -07:00
Roland Dreier f7eaa7ed8f Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next 2014-04-03 08:30:17 -07:00
Selvin Xavier 2d8f57d56f RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
Unregister the inet notifier during ocrdma unload to avoid a panic after
driver unload.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:07 -07:00
Roland Dreier 7a1e89d8b7 RDMA/ocrdma: Fix warnings about pointer <-> integer casts
We should cast pointers to and from unsigned long to turn them into ints.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:07 -07:00
Devesh Sharma fad51b7d36 RDMA/ocrdma: Code clean-up
Clean up code.  Also modifying GSI QP to error during ocrdma_close is fixed.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:06 -07:00
Selvin Xavier 334b8db3a6 RDMA/ocrdma: Display FW version
Adding a sysfs file for getting the FW version.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:06 -07:00
Selvin Xavier a51f06e167 RDMA/ocrdma: Query controller information
Issue mailbox commands to query ocrdma controller information and phy
information and print them while adding ocrdma device.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:05 -07:00
Selvin Xavier bbc5ec524e RDMA/ocrdma: Support non-embedded mailbox commands
Added a routine to issue non-embedded mailbox commands for handling
large mailbox request/response data.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:05 -07:00
Selvin Xavier 1228056bcf RDMA/ocrdma: Handle CQ overrun error
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:05 -07:00
Selvin Xavier ac578aef8b RDMA/ocrdma: Display proper value for max_mw
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:04 -07:00
Selvin Xavier cf5788ade7 RDMA/ocrdma: Use non-zero tag in SRQ posting
As part of SRQ receive buffers posting we populate a non-zero tag
which will be returned in SRQ receive completions.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:04 -07:00
Selvin Xavier 9d1878a369 RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:03 -07:00
Devesh Sharma 2e6e9f2bb8 RDMA/ocrdma: Increment abi version count
Increment the ABI version count for driver/library interface.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:30:02 -07:00
Devesh Sharma 0154410bd4 RDMA/ocrdma: Update version string
Update the driver vrsion string and node description string

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:29:59 -07:00
Devesh Sharma b6b87d2e69 RDMA/ocrdma: ABI versioning between ocrdma and be2net
While loading RoCE driver be2net driver should check for ABI version
to catch functional incompatibilities.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:29:51 -07:00
Devesh Sharma 1eebbb6ec3 RDMA/ocrdma: Allow DPP QP creation
Allow creating DPP QP even if inline-data is not requested.  This is an
optimization to lower latency.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:29:44 -07:00
Devesh Sharma 21c3391a9a RDMA/ocrdma: Read ASIC_ID register to select asic_gen
ocrdma driver selects execution path based on sli_family and asic
generation number.  This introduces code to read the asic gen number
from pci register instead of obtaining it from the Emulex NIC driver.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:29:40 -07:00
Devesh Sharma 2df84fa87f RDMA/ocrdma: SQ and RQ doorbell offset clean up
Introducing new macros to define SQ and RQ doorbell offset.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:29:36 -07:00
Devesh Sharma ea61762679 RDMA/ocrdma: EQ full catastrophe avoidance
Stale entries in the CQ being destroyed causes hardware to generate
EQEs indefinitely for a given CQ.  Thus causing uncontrolled execution
of irq_handler.  This patch fixes this using following sementics:

    * irq_handler will ring EQ doorbell atleast once and implement budgeting scheme.
    * cq_destroy will count number of valid entires during destroy and ring
      cq-db so that hardware does not generate uncontrolled EQE.
    * cq_destroy will synchronize with last running irq_handler instance.
    * arm_cq will always defer arming CQ till poll_cq, except for the first arm_cq call.
    * poll_cq will always ring cq-db with arm=SET if arm_cq was called prior to enter poll_cq.
    * poll_cq will always ring cq-db with arm=UNSET if arm_cq was not called prior to enter poll_cq.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-03 08:29:34 -07:00
Steve Wise 96bb2706c8 RDMA/cxgb4: Disable DSGL use by default
Current hardware doesn't correctly support DSGL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:53:54 -07:00
Steve Wise c529fb5046 RDMA/cxgb4: rx_data() needs to hold the ep mutex
To avoid racing with other threads doing close/flush/whatever, rx_data()
should hold the endpoint mutex.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:53:54 -07:00
Steve Wise 977116c698 RDMA/cxgb4: Drop RX_DATA packets if the endpoint is gone
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:53:53 -07:00
Steve Wise a7db89eb89 RDMA/cxgb4: Lock around accept/reject downcalls
There is a race between ULP threads doing an accept/reject, and the
ingress processing thread handling close/abort for the same connection.
The accept/reject path needs to hold the lock to serialize these paths.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>

[ Fold in locking fix found by Dan Carpenter <dan.carpenter@oracle.com>.
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:52:45 -07:00
Moni Shoua b2853fd6c2 IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler
The code that resolves the passive side source MAC within the rdma_cm
connection request handler was both redundant and buggy, so remove it.

It was redundant since later, when an RC QP is modified to RTR state,
the resolution will take place in the ib_core module.  It was buggy
because this callback also deals with UD SIDR exchange, for which we
incorrectly looked at the REQ member of the CM event and dereferenced
a random value.

Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 14:05:26 -07:00
Mike Marciniszyn f3585a6ae3 IB/ehca: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
These methods appear to only mimic the sg_dma_address() and
sg_dma_len() behavior.

They can be safely removed.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:16:31 -07:00
Mike Marciniszyn 49c5c27e05 IB/ipath: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
The removal of these methods is compensated for by code changes to
.map_sg to insure that the vanilla sg_dma_address() and sg_dma_len()
will do the same thing as the equivalent former ib_sg_dma_address()
and ib_sg_dma_len() calls into the drivers.

The introduction of this patch required that the struct
ipath_dma_mapping_ops be converted to a C99 initializer.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:16:31 -07:00
Mike Marciniszyn 446bf432a9 IB/qib: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
Remove the overload for .dma_len and .dma_address

The removal of these methods is compensated for by code changes to
.map_sg to insure that the vanilla sg_dma_address() and sg_dma_len()
will do the same thing as the equivalent former ib_sg_dma_address()
and ib_sg_dma_len() calls into the drivers.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:16:31 -07:00
Or Gerlitz 5de2ad986d IB/iser: Bump driver version to 1.3
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Or Gerlitz 3ee07d27ac IB/iser: Update Mellanox copyright note
Update Mellanox copyrights for 2014 on the iser initiator driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Or Gerlitz 4f9208ad3f IB/iser: Print QP information once connection is established
Add an iser info print with the local/remote QP information carried
out when the connection is established.  While here, fix a little
leftover from the T10 work and set a debug print to be carried in
debug and not info level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Ariel Nahum 4667f5dfb0 IB/iser: Remove struct iscsi_iser_conn
The iscsi stack has existing mechanisms to link back and forth between
the iscsi connection and the iscsi transport (e.g iser/tcp) connection.

This is done through a dd_data pointer field in struct iscsi_conn
which can be set to point to the transport connection, etc.

The iscsi_iser_conn structure was used to get this linking done in
another way, which is uneeded and adds extra complication to the iser
code, so we just remove it.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Roi Dayan 1d6c2b736f IB/iser: Drain the tx cq once before looping on the rx cq
The iser disconnection flow isn't done before all the inflight
recv/send buffers posted to the QP are either flushed or normally
completed to the CQ that serves this connection.  The condition check
is done in iser_handle_comp_error().

Currently, it's possible for the send buffer completion that makes the
posted send buffers counter reach zero to be polled in the drain tx
call, which is after the rx cq is fully drained.  Since this
completion might be not an error one (for example, it might be a
completion of the logout request iSCSI PDU) we will skip
iser_handle_comp_error().  So the connection will never terminate from
the iscsi stack point of view, and we hang.

To resolve this race, do the draining of the tx cq before the loop on
the rx cq.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Randy Dunlap 39c978cd17 IB/iser: Fix sector_t format warning
Fix pr_err (printk) format warning:

    drivers/infiniband/ulp/iser/iser_verbs.c:1181:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'sector_t' [-Wformat]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:01:08 -07:00
Dan Carpenter 4661bd798f mlx4_core: Make buffer larger to avoid overflow warning
My static checker complains that the sprintf() here can overflow.

	drivers/infiniband/hw/mlx4/main.c:1836 mlx4_ib_alloc_eqs()
	error: format string overflow. buf_size: 32 length: 69

This seems like a valid complaint.  The "dev->pdev->bus->name" string
can be 48 characters long.  I just made the buffer 80 characters instead
of 69 and I changed the sprintf() to snprintf().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 10:53:29 -07:00
Dan Carpenter 3839d8ac1b mlx4_core: Fix some indenting in mlx4_ib_add()
The code was indented too far and also kernel style says we should have
curly braces.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 10:52:18 -07:00
Yan Burman 2c34e68f42 IB/mad: Check and handle potential DMA mapping errors
Running with DMA_API_DEBUG enabled and not checking for DMA mapping
errors triggers a kernel stack trace with "DMA-API: device driver
failed to check map error" message.  Add these checks to the MAD
module, both to be be more robust and also eliminate these
false-positive stack traces.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 10:36:07 -07:00
Yann Droneaud 5bdb0f02ad IB/ehca: Returns an error on ib_copy_to_udata() failure
In case of error when writing to userspace, function ehca_create_cq()
does not set an error code before following its error path.

This patch sets the error code to -EFAULT when ib_copy_to_udata()
fails.

This was caught when using spatch (aka. coccinelle)
to rewrite call to ib_copy_{from,to}_udata().

Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 10:36:07 -07:00
Yann Droneaud 08e74c4b00 IB/mthca: Return an error on ib_copy_to_udata() failure
In case of error when writing to userspace, the function mthca_create_cq()
does not set an error code before following its error path.

This patch sets the error code to -EFAULT when ib_copy_to_udata() fails.

This was caught when using spatch (aka. coccinelle)
to rewrite call to ib_copy_{from,to}_udata().

Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 10:35:42 -07:00
Yann Droneaud bfd2793c95 RDMA/cxgb4: set error code on kmalloc() failure
If kmalloc() fails in c4iw_alloc_ucontext(), the function
leaves but does not set an error code in ret variable:
it will return 0 to the caller.

This patch set ret to -ENOMEM in such case.

Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@chelsio.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 14:55:21 -04:00
Matan Barak e471b40321 mlx4: Use actual number of PCI functions (PF + VFs) for alias GUID logic
The code which is dealing with SRIOV alias GUIDs in the mlx4 IB driver has some
logic which operated according to the maximal possible active functions (PF + VFs).

After the single port VFs code integration this resulted in a flow of false-positive
warnings going to the kernel log after the PF driver started the alias GUID work.

Fix it by referring to the actual number of functions.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 20:48:05 -04:00
Steve Wise 9c88aa003d RDMA/cxgb4: Update snd_seq when sending MPA messages
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise be13b2dff8 RDMA/cxgb4: Connect_request_upcall fixes
When processing an MPA Start Request, if the listening endpoint is
DEAD, then abort the connection.

If the IWCM returns an error, then we must abort the connection and
release resources.  Also abort_connection() should not post a CLOSE
event, so clean that up too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise 70b9c66053 RDMA/cxgb4: Ignore read reponse type 1 CQEs
These are generated by HW in some error cases and need to be
silently discarded.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise 1ce1d471ac RDMA/cxgb4: Fix possible memory leak in RX_PKT processing
If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must
free the request skb and the saved skb with the tcp header.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise dbb084cc5f RDMA/cxgb4: Don't leak skb in c4iw_uld_rx_handler()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Bart Van Assche b3fe628da2 IB/srp: Fix a race condition between failing I/O and I/O completion
Avoid that srp_terminate_io() can access req->scmnd after it has been
cleared by the I/O completion code. Do this by protecting req->scmnd
accesses from srp_terminate_io() via locking

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche ac72d766a8 IB/srp: Avoid that writing into "add_target" hangs due to a cable pull
If a cable is pulled while srp_connect_target() is in progress
that can result in that function never to return. That makes the
process, e.g. srp_daemon, that invoked this function unkillable.
Avoid this by letting srp_connect_target() finish if the event
IB_CM_TIMEWAIT_EXIT is received. This patch fixes a hang with the
following call trace:

 [<ffffffff814eae85>] schedule_timeout+0x215/0x2e0
 [<ffffffff814eab03>] wait_for_common+0x123/0x180
 [<ffffffff814eac1d>] wait_for_completion+0x1d/0x20
 [<ffffffffa03b398c>] srp_connect_target+0x1dc/0x410 [ib_srp]
 [<ffffffffa03b5809>] srp_create_target+0xba9/0xe70 [ib_srp]
 [<ffffffff8133e590>] dev_attr_store+0x20/0x30
 [<ffffffff811eb8f5>] sysfs_write_file+0xe5/0x170
 [<ffffffff811767c8>] vfs_write+0xb8/0x1a0
 [<ffffffff811770c1>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche a702adceec IB/srp: Make writing into the "add_target" sysfs attribute interruptible
Avoid that stopping srp_daemon takes unusually long due to a cable
pull by making writing into the "add_target" sysfs attribute
interruptible.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche 2d7091bcf6 IB/srp: Avoid duplicate connections
The connection uniqueness check is performed before a new connection
is added to the target list. This patch protects both actions by a
mutex such that simultaneous writes from two different threads into the
"add_target" variable do not result in duplicate connections.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche e7ffde0164 IB/srp: Add more logging
Log sgid and dgid when reporting that a login has been rejected or when
a host has been added. This makes it easy to figure out which initiator
and target ports these messages apply to.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Sagi Grimberg 2088ca66f5 IB/srp: Check ib_query_gid return value
Detected by Coverity.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Matan Barak 449fc48866 net/mlx4: Adapt code for N-Port VF
Adds support for N-Port VFs, this includes:
1. Adding support in the wrapped FW command
	In wrapped commands, we need to verify and convert
	the slave's port into the real physical port.
	Furthermore, when sending the response back to the slave,
	a reverse conversion should be made.
2. Adjusting sqpn for QP1 para-virtualization
	The slave assumes that sqpn is used for QP1 communication.
	If the slave is assigned to a port != (first port), we need
	to adjust the sqpn that will direct its QP1 packets into the
	correct endpoint.
3. Adjusting gid[5] to modify the port for raw ethernet
	In B0 steering, gid[5] contains the port. It needs
	to be adjusted into the physical port.
4. Adjusting number of ports in the query / ports caps in the FW commands
	When a slave queries the hardware, it needs to view only
	the physical ports it's assigned to.
5. Adjusting the sched_qp according to the port number
	The QP port is encoded in the sched_qp, thus in modify_qp we need
	to encode the correct port in sched_qp.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:30 -04:00
Matan Barak 82373701be IB/mlx4_ib: Adapt code to use caps.num_ports instead of a constant
Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports.

Instead, we should only loop until the number of actual ports in i
the device, which is stored in dev->caps.num_ports.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:29 -04:00
Dan Carpenter 186f8ba062 IB/qib: Cleanup qib_register_observer()
Returning directly is easier to read than do-nothing gotos.  Remove the
duplicative check on "olp" and pull the code in one indent level.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:19:18 -07:00
CQ Tang 49c0e2414b IB/qib: Change SDMA progression mode depending on single- or multi-rail
Improve performance by changing the behavour of the driver when all
SDMA descriptors are in use, and the processes adding new descriptors
are single- or multi-rail.

For single-rail processes, the driver will block the call and finish
posting all SDMA descriptors onto the hardware queue before returning
back to PSM.  Repeated kernel calls are slower than blocking.

For multi-rail processes, the driver will return to PSM as quick as
possible so PSM can feed packets to other rail.  If all hardware
queues are full, PSM will buffer the remaining SDMA descriptors until
notified by interrupt that space is available.

This patch builds a red-black tree to track the number rails opened by
a particular PID. If the number is more than one, it is a multi-rail
PSM process, otherwise, it is a single-rail process.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: John A Gregor <john.a.gregor@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:19:12 -07:00
Steve Wise eda6d1d1b7 RDMA/cxgb4: Save the correct map length for fast_reg_page_lists
We cannot save the mapped length using the rdma max_page_list_len field
of the ib_fast_reg_page_list struct because the core code uses it.  This
results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl().

I found this with dma mapping debugging enabled in the kernel.  The
fix is to save the length in the c4iw_fr_page_list struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise df2d5130ec RDMA/cxgb4: Default peer2peer mode to 1
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise ba32de9d8d RDMA/cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise 8a9c399eee RDMA/cxgb4: Fix incorrect BUG_ON conditions
Based on original work from Jay Hernandez <jay@chelsio.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise ebf00060c3 RDMA/cxgb4: Always release neigh entry
Always release the neigh entry in rx_pkt().

Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Steve Wise f8e819081f RDMA/cxgb4: Allow loopback connections
find_route() must treat loopback as a valid egress interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Steve Wise ffd435924c RDMA/cxgb4: Cap CQ size at T4_MAX_IQ_SIZE
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Dan Carpenter e24a72a330 RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()
There is a four byte hole at the end of the "uresp" struct after the
->qid_mask member.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Dan Carpenter ff1706f4fe RDMA/cxgb4: Fix underflows in c4iw_create_qp()
These sizes should be unsigned so we don't allow negative values and
have underflow bugs.  These can come from the user so there may be
security implications, but I have not tested this.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Sagi Grimberg 6192d4e6bb IB/iser: Publish T10-PI support to SCSI midlayer
After allocating a scsi_host we set protection types and guard type
supported.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg 0a7a08ad6f IB/iser: Implement check_protection
Once the iSCSI transaction is completed we must implement
check_protection in order to notify on DIF errors that may have
occured.

The routine boils down to calling ib_check_mr_status to get the
signature status of the transaction.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg 177e31bd5a IB/iser: Support T10-PI operations
Add logic to initialize protection information entities.  Upon each
iSCSI task, we keep the scsi_cmnd in order to query the scsi
protection operations and reference to protection buffers.

Modify iser_fast_reg_mr to receive indication whether it is
registering the data or protection buffers.

In addition introduce iser_reg_sig_mr which performs fast registration
work-request for a signature enabled memory region
(IB_WR_REG_SIG_MR).  In this routine we set all the protection
relevants for the device to offload protection data-transfer and
verification.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik 6b5a8fb0d2 IB/iser: Initialize T10-PI resources
During connection establishment we also initialize T10-PI resources
(QP, PI contexts) in order to support SCSI's protection operations.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik 7f73384752 IB/iser: Introduce pi_enable, pi_guard module parameters
Use modparams to activate protection information support.

pi_enable bool: Based on this parameter iSER will know if it should
support T10-PI.  We don't want to do this by default as it requires to
allocate and initialize extra resources.  In case pi_enable=N, iSER
won't publish to SCSI midlayer any DIF capabilities.

pi_guard int: Based on this parameter iSER will publish DIX guard type
support to SCSI midlayer.  0 means CRC is allowed to be passed in DIX
buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX
buffers.  Note that over the wire, only CRC is allowed.

In the next phase, it is worth considering passing these parameters
from iscsid via nlmsg.  This will allow these parameters to be
connection based rather than global.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg 5f588e3d0c IB/iser: Generalize fall_to_bounce_buf routine
Unaligned SG-lists may also happen for protection information.
Generalize bounce buffer routine to handle any iser_data_buf which may
be data and/or protection.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg 9a8b08fad2 IB/iser: Generalize iser_unmap_task_data and finalize_rdma_unaligned_sg
This routines operates on data buffers and may also work with
protection infomation buffers.  So we generalize them to handle an
iser_data_buf which can be the command data or command protection
information.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg 73bc06b7ed IB/iser: Replace fastreg descriptor valid bool with indicators container
In T10-PI support we will have memory keys for protection buffers and
signature transactions.  We prefer to compact indicators rather than
keeping multiple bools.

This commit does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg 65198d6b84 IB/iser: Keep IB device attributes under iser_device
For T10-PI offload support, we will need to know the device signature
offload capability upon every connection establishment.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg 310b347c60 IB/iser: Move fast_reg_descriptor initialization to a function
fastreg descriptor will include protection information context.  In
order to place the logic in one place we introduce iser_create_fr_desc
function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg d11ec4ecf0 IB/iser: Push the decision what memory key to use into fast_reg_mr routine
This is a preparation step for T10-PI offload support.  We prefer to
push the desicion of which mkey to use (global or fastreg) to
iser_fast_reg_mr.  We choose to do this since in T10-PI we may need to
register for protection buffers and in this case we wish to simplify
iser_fast_reg_mr instead of repeating the logic of which key to use.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg 7306b8fad4 IB/iser: Avoid FRWR notation, use fastreg instead
FRWR stands for "fast registration work request". We want to avoid
calling the fastreg pool with that name, instead we name it fastreg
which stands for "fast registration".

This pool will include more elements in the future, so it is a good
idea to generalize the name.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg db523b8de1 IB/iser: Suppress completions for fast registration work requests
In case iSER uses fast registration method, it should not request for
successful completions on fast registration nor local invalidate
requests.  We color wr_id with ISER_FRWR_LI_WRID in order to correctly
consume error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:54 -07:00
Bart Van Assche 0e9855dbf4 IB/mlx4: Fix a sparse endianness warning
Fix the following warning for the mlx4 driver:

    $ make M=drivers/infiniband C=2 CF=-D__CHECK_ENDIAN__
    drivers/infiniband/hw/mlx4/qp.c:1885:31: warning: restricted __be16 degrades to integer

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:23:52 -07:00
Prarit Bhargava bc1b04ab34 RDMA/ocrdma: Fix compiler warning
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function ‘_ocrdma_modify_qp’:
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1299:31: error: ‘old_qps’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  status = ocrdma_mbx_modify_qp(dev, qp, attr, attr_mask, old_qps);

ocrdma_mbx_modify_qp() (and subsequent calls) doesn't appear to use old_qps
so it doesn't need to be passed on.  Removing the variable results in the
warning going away.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Devesh Sharma (Devesh.sharma@emulex.com)
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:34:13 -07:00
Dan Carpenter 349850f0a9 RDMA/nes: Clean up a condition
We don't need to test "ret" twice and also the white space is messed up.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:29:37 -07:00
Dan Carpenter db498827ff IB/qib: Remove duplicate check in get_a_ctxt()
We already know "pusable" is non-zero, no need to check again.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:28:09 -07:00
Fabio Estevam 970918b32b IB/usnic: Remove '0x' when using %pa format
%pa format already prints in hexadecimal format, so remove the '0x' annotation
to avoid a double '0x0x' pattern.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:26:38 -07:00
Yann Droneaud 9d194d1025 IB/nes: Return an error on ib_copy_from_udata() failure instead of NULL
In case of error while accessing to userspace memory, function
nes_create_qp() returns NULL instead of an error code wrapped through
ERR_PTR().  But NULL is not expected by ib_uverbs_create_qp(), as it
check for error with IS_ERR().

As page 0 is likely not mapped, it is going to trigger an Oops when
the kernel will try to dereference NULL pointer to access to struct
ib_qp's fields.

In some rare cases, page 0 could be mapped by userspace, which could
turn this bug to a vulnerability that could be exploited: the function
pointers in struct ib_device will be under userspace total control.

This was caught when using spatch (aka. coccinelle)
to rewrite calls to ib_copy_{from,to}_udata().

Link: https://www.gitorious.org/opteya/ib-hw-nes-create-qp-null
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:20:28 -07:00
Dennis Dalessandro 06064a103f IB/qib: Fix memory leak of recv context when driver fails to initialize.
In qib_create_ctxts() we allocate an array to hold recv contexts. Then attempt
to create data for those recv contexts. If that call to qib_create_ctxtdata()
fails then an error is returned but the previously allocated memory is not
freed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Yann Droneaud 8572de9732 IB/qib: fixup indentation in qib_ib_rcv()
Commit af061a644a add some code in qib_ib_rcv() which
trigger a warning from coccicheck (coccinelle/spatch):

$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/

  CHECK   drivers/infiniband/hw/qib/qib_verbs.c
drivers/infiniband/hw/qib/qib_verbs.c:679:5-32: code aligned with following code on line 681
  CC [M]  drivers/infiniband/hw/qib/qib_verbs.o

In fact, according to similar code in qib_kreceive(),
qib_ib_rcv() code is correct but improperly indented.

This patch fix indentation for the misaligned portion.

Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Yann Droneaud 37a967651c IB/qib: add missing braces in do_qib_user_sdma_queue_create()
Commit c804f07248 moved qib_assign_ctxt() to
do_qib_user_sdma_queue_create() but dropped the braces
around the statements.

This was spotted by coccicheck (coccinelle/spatch):

$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/

  CHECK   drivers/infiniband/hw/qib/qib_file_ops.c
drivers/infiniband/hw/qib/qib_file_ops.c:1583:2-23: code aligned with following code on line 1587

This patch adds braces back.

Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Cc: stable@vger.kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn 7d7632add8 IB/qib: Modify software pma counters to use percpu variables
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.

The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase with appropriate adjustments in the reset,
read logic to maintain the z_ latch.

This patch also corrects the fact the unitcast_xmit wasn't handled
at all for UC and RC QPs.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn 1ed88dd7d0 IB/qib: Add percpu counter replacing qib_devdata int_counter
This patch replaces the dd->int_counter with a percpu counter.

The maintanance of qib_stats.sps_ints and int_counter are
combined into the new counter.

There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)

A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occured since z_int_counter
was "reset".

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn f8b6c47a44 IB/qib: Fix debugfs ordering issue with multiple HCAs
The debugfs init code was incorrectly called before the idr mechanism
is used to get the unit number, so the dd->unit hasn't been
initialized.  This caused the unit relative directory creation to fail
after the first.

This patch moves the init for the debugfs stuff until after all of the
failures and after the unit number has been determined.

A bug in unwind code in qib_alloc_devdata() is also fixed.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Dennis Dalessandro a2cb0eb8a6 IB/ipath: Fix potential buffer overrun in sending diag packet routine
Guard against a potential buffer overrun.  The size to read from the
user is passed in, and due to the padding that needs to be taken into
account, as well as the place holder for the ICRC it is possible to
overflow the 32bit value which would cause more data to be copied from
user space than is allocated in the buffer.

Cc: <stable@vger.kernel.org>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Dennis Dalessandro 1c20c81909 IB/qib: Fix potential buffer overrun in sending diag packet routine
Guard against a potential buffer overrun.  Right now the qib driver is
protected by the fact that the data structure in question is only 16
bits.  Should that ever change the problem will be exposed. There is a
similar defect in the ipath driver and this brings the two code paths
into sync.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Tatyana Nikolova 43adff3979 RDMA/nes: Fix for passing a valid QP pointer to the user space library
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 10:04:16 -07:00
Tatyana Nikolova 4ac79a7003 RDMA/nes: Fixes for IRD/ORD negotiation with MPA v2
Fixes to enable the negotiation of the supported IRD/ORD sizes with
the peer when exchanging MPA v2 messages in connection establishment.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 10:03:17 -07:00
Steve Wise 05eb23893c cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes
The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4:

Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues.  Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count.  This reduces the
load put on the DB fifo when reenabling.

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps.  This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP.  Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED.  As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events.  Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list.  This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example).  QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
quickly and overflowing the FIFO.  Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops.  As the load lightens though, we
resume to normal DB writes directly by user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00
Steve Wise 7a2cea2aaa cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice
Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00
David S. Miller 85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Nicholas Bellinger 5aad2145ac Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband into for-next 2014-03-13 12:01:58 -07:00
Jack Morgenstein aa9a2d51a3 mlx4: Activate RoCE/SRIOV
To activate RoCE/SRIOV, need to remove the following:
1. In mlx4_ib_add, need to remove the error return preventing
   initialization of a RoCE port under SRIOV.
2. In update_vport_qp_params (in resource_tracker.c) need to remove
   the error return when a RoCE RC or UD qp is detected.
   This error return causes the INIT-to-RTR qp transition to fail
   in the wrapper function under RoCE/SRIOV.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Shani Michaelli ceb5433b3a mlx4_ib: Fix SIDR support of for UD QPs under SRIOV/RoCE
* Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID
  in multiplex_cm_handler and demux_cm_handler.

* Handle Service ID Resolution messages and REQ messages
  separately, for their formats are different.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Jack Morgenstein 5ea8bbfc49 mlx4: Implement IP based gids support for RoCE/SRIOV
Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave,

This is achieved as follows:
Outgoing MADs:
    1. Source MAC: obtained from the CQ completion structure
       (struct ib_wc, smac field).
    2. Destination MAC: obtained from the tunnel header
    3. vlan_id: obtained from the tunnel header.
Incoming MADs
    1. The source (i.e., remote) MAC and vlan_id are passed in
       the tunnel header to the proxy QP1.

VST mode support:
     For outgoing MADs,  the vlan_id obtained from the header is
        discarded, and the vlan_id specified by the Hypervisor is used
        instead.
     For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
        "invalid" vlan (0xffff)  is substituted when forwarding to the slave.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Jack Morgenstein 2f5bb47368 mlx4: Add ref counting to port MAC table for RoCE
The IB side of RoCE requires the MAC table index of the
MAC address used by its QPs.

To obtain the real MAC index, the IB side registers the
MAC (increasing its ref count, and also returning the
real MAC index) during the modify-qp sequence.

This protects against the ETH side deleting or modifying
that MAC table entry while the QP is active.

Note that until the modify-qp command returns success,
the MAC and VLAN information only has "candidate" status.
If the modify-qp succeeds, the "candidate" info is promoted
to the operational MAC/VLAN info for the qp. If the modify fails,
the candidate MAC/VLAN is unregistered, and the old qp info
is preserved.

The patch is a bit complex, because there are multiple qp
transitions where the primary-path information may be
modified:  INIT-to-RTR, and SQD-to-SQD.

Similarly for the alternate path information.

Therefore the code must handle cases where path information
has already been entered into the QP context by previous
qp transitions.

For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate
   MAC information to the operational information, and reset
   the candidate MAC info.
2. If there was a previous MAC, unregister it.  Then move
   the MAC information from candidate to operational, and
   reset the candidate info (as in 1. above).

The MAC address failure logic is the same for all cases:
 - Unregister the candidate MAC, and reset the candidate MAC info.

For Vlan registration, the logic is similar.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:15 -04:00
Jack Morgenstein b6ffaeffae mlx4: In RoCE allow guests to have multiple GIDS
The GIDs are statically distributed, as follows:
PF: gets 16 GIDs
VFs:  Remaining GIDS are divided evenly between VFs activated by the driver.
      If the division is not even, lower-numbered VFs get an extra GID.

For an IB interface, the number of gids per guest remains as before: one gid per guest.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:14 -04:00
Jack Morgenstein 6ee51a4e86 mlx4: Adjust QP1 multiplexing for RoCE/SRIOV
This requires the following modifications:
1. Fix build_mlx4_header to properly fill in the ETH fields
2. Adjust mux and demux QP1 flow to support RoCE.

This commit still assumes only one GID per slave for RoCE.
The commit enabling multiple GIDs is a subsequent commit, and
is done separately because of its complexity.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:12 -04:00
Linus Torvalds 66a523db70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This series addresses a number of outstanding issues wrt to active I/O
  shutdown using iser-target.  This includes:

   - Fix a long standing tpg_state bug where a tpg could be referenced
     during explicit shutdown (v3.1+ stable)
   - Use list_del_init for iscsi_cmd->i_conn_node so list_empty checks
     work as expected (v3.10+ stable)
   - Fix a isert_conn->state related hung task bug + ensure outstanding
     I/O completes during session shutdown.  (v3.10+ stable)
   - Fix isert_conn->post_send_buf_count accounting for RDMA READ/WRITEs
     (v3.10+ stable)
   - Ignore FRWR completions during active I/O shutdown (v3.12+ stable)
   - Fix command leakage for interrupt coalescing during active I/O
     shutdown (v3.13+ stable)

  Also included is another DIF emulation fix from Sagi specific to
  v3.14-rc code"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix sbc_copy_prot for offset scatters
  iser-target: Fix command leak for tx_desc->comp_llnode_batch
  iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  iscsi/iser-target: Use list_del_init for ->i_conn_node
  iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
2014-03-09 13:50:14 -07:00
Sagi Grimberg 2dea909444 IB/mlx5: Expose support for signature MR feature
Currently support only T10-DIF types of signature handover operations
(types 1|2|3).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:40:04 -08:00
Sagi Grimberg d5436ba010 IB/mlx5: Collect signature error completion
This commit takes care of the generated signature error CQE generated
by the HW (if happened).  The underlying mlx5 driver will handle
signature error completions and will mark the relevant memory region
as dirty.

Once the consumer gets the completion for the transaction, it must
check for signature errors on signature memory region using a new
lightweight verb ib_check_mr_status().

In case the user doesn't check for signature error (i.e. doesn't call
ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the
memory region cannot be used for another signature operation
(REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:40:04 -08:00
Sagi Grimberg e6631814fb IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user.

Baisically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
    * In case the user registered a single MR for data so the UMR data segment
      consists of:
      - single klm (data MR) passed by the user
      - BSF with signature attributes requested by the user.
    * In case the user registered 2 MRs, one for data and one for protection,
      the UMR consists of:
      - strided block format which includes data and protection MRs and
        their repetitive block format.
      - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the memory domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

3. post SET_PSV in order to set the wire domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

* After this compound WR we place a small fence for next WR to come.

This patch also introduces some helper functions to set the BSF correctly
and determining the signature format selectors.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:39:51 -08:00
Sagi Grimberg 2ac45934f8 IB/mlx5: Remove MTT access mode from umr flags helper function
get_umr_flags helper function might be used for types of access modes
other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM.  So remove it from
helper, and callers will add their own access mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg 6e5eadace1 IB/mlx5: Break up wqe handling into begin & finish routines
As a preliminary step for signature feature which will require posting
multiple (3) WQEs for a single WR, we break post_send routine WQE
indexing into begin and finish routines.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg e1e66cc264 IB/mlx5: Initialize mlx5_ib_qp signature-related members
If user requested signature enable we initialize relevant mlx5_ib_qp
members.  We mark the qp as sig_enable and we increase the effective
SQ size, but still limit the user max_send_wr to original size
computed.  We also allow the create_qp routine to accept sig_enable
create flag.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg 3121e3c441 mlx5: Implement create_mr and destroy_mr
Support create_mr and destroy_mr verbs.  Creating ib_mr may be done
for either ib_mr that will register regular page lists like
alloc_fast_reg_mr routine, or indirect ib_mrs that can register other
(pre-registered) ib_mrs in an indirect manner.

In addition user may request signature enable, that will mean that the
created ib_mr may be attached with signature attributes (BSF, PSVs).

Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg 1b01d33560 IB/core: Introduce signature verbs API
Introduce a verbs interface for signature-related operations.  A
signature handover operation configures the layouts of data and
protection attributes both in memory and wire domains.

Signature operations are:

- INSERT:
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover opration is done, the HCA will offload
data integrity generation/validation while performing the actual data
transfer.

Additions:

1. HCA signature capabilities in device attributes
    Verbs provider supporting signature handover operations fills
    relevant fields in device attributes structure returned by
    ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
    Creating a QP that will carry signature handover operations may
    require some special preparations from the verbs provider.  So we
    add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the
    created QP may carry out signature handover operations.  Expose
    signature support to verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
    Signature handover work request. This WR will define the signature
    handover properties of the memory/wire domains as well as the
    domains layout. The purpose of this work request is to bind all
    the needed information for the signature operation:

    - data to be transferred:  wr->sg_list (ib_sge).
      * The raw data, pre-registered to a single MR (normally, before
        signature, this MR would have been used directly for the data
        transfer)
    - data protection guards: sig_handover.prot (ib_sge).
      * The data protection buffer, pre-registered to a single MR, which
        contains the data integrity guards of the raw data blocks.
        Note that it may not always exist, only in cases where the user is
        interested in storing protection guards in memory.
    - signature operation attributes: sig_handover.sig_attrs.
      * Tells the HCA how to validate/generate the protection information.

    Once the work request is executed, the memory region that will
    describe the signature transaction will be the sig_mr.  The
    application can now go ahead and send the sig_mr.rkey or use the
    sig_mr.lkey for data transfer.

4. New Verb ib_check_mr_status
    check_mr_status verb checks the status of the memory region post
    transaction.  The first check that may be used is
    IB_MR_CHECK_SIG_STATUS, which will indicate if any signature
    errors are pending for a specific signature-enabled ib_mr.  This
    verb is a lightwight check and is allowed to be taken from
    interrupt context.  An application must call this verb after it is
    known that the actual data transfer has finished.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg 17cd3a2db8 IB/core: Introduce protected memory regions
This commit introduces verbs for creating/destoying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
of more) pre-registered memory regions in a specific layout.
The Indirect region may potentialy describe several regions
and some repitition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Nicholas Bellinger ebbe442183 iser-target: Fix command leak for tx_desc->comp_llnode_batch
This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger 9bb4ca68fc iser-target: Ignore completions for FRWRs in isert_cq_tx_work
This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

v2 changes:
 - Rename ISER_FRWR_LI_WRID -> ISER_FASTREG_LI_WRID (Sagi)

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger b6b87a1df6 iser-target: Fix post_send_buf_count for RDMA READ/WRITE
This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger defd884845 iscsi/iser-target: Fix isert_conn->state hung shutdown issues
This patch addresses a couple of different hug shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into seperate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger 5159d763f6 iscsi/iser-target: Use list_del_init for ->i_conn_node
There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect to return false once a cmd
has been removed from the per connect list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns false as expected.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Yishai Hadas eeb8461e36 IB: Refactor umem to use linear SG table
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list.  With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-04 10:34:28 -08:00
Amir Vadai 169a1d85d0 net,IB/mlx: Bump all Mellanox driver versions
Bump all Mellanox driver versions.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25 17:34:44 -05:00
Linus Torvalds 946dd683af Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
  included is one fix for a free after use regression in persistent
  reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix protection copy routine
  IB/srpt: replace strict_strtoul() with kstrtoul()
  target: Simplify command completion by removing CMD_T_FAILED flag
  iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
  iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
  target: Fix missing length check in spc_emulate_evpd_83()
  qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
  target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
  target: Fix free-after-use regression in PR unregister
2014-02-15 16:18:47 -08:00
Roland Dreier c9459388d8 Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-next 2014-02-14 09:49:12 -08:00
Devesh Sharma 09de3f1313 RDMA/ocrdma: Fix load time panic during GID table init
We should use rdma_vlan_dev_real_dev() instead of using vlan_dev_real_dev()
when building the GID table for a vlan interface.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:49:04 -08:00
Devesh Sharma a61d93d92f RDMA/ocrdma: Fix traffic class shift
Use correct value for obtaining traffic class from device
response for Query QP request.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:49:00 -08:00
Dan Carpenter fd8b48b22a IB/iser: Fix use after free in iser_snd_completion()
We use "tx_desc" again after we free it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:09 -08:00
Roi Dayan 7d9eacf945 IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection
Fix a possible NULL pointer dereference in disconnection flow. This
can happen if the target disconnected/rejected the connection request,
e.g before the binding stage between iscsi connection to the transport
connection.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:03 -08:00
Upinder Malhi f809309a25 IB/usnic: Fix smatch endianness error
Error reported at http://marc.info/?l=linux-rdma&m=138995755801039&w=2

Fix short to int cast for big endian systems.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:47:29 -08:00
Eli Cohen 0861565f50 IB/mlx5: Remove dependency on X86
Remove Kconfig dependency of mlx5_ib/mlx5_core on X86, since there is
no such dependency in reality.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 20:48:02 -08:00
Mike Marciniszyn 2f75e12c44 IB/qib: Add missing serdes init sequence
Research has shown that commit a77fcf8950 ("IB/qib: Use a single
txselect module parameter for serdes tuning") missed a key serdes init
sequence.

This patch add that sequence.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:52:44 -08:00
Kumar Sanghvi 0f0132001f RDMA/cxgb4: Add missing neigh_release in LE-Workaround path
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:40 -08:00
Moni Shoua b4a26a2728 IB: Report using RoCE IP based gids in port caps
For userspace RoCE UD QPs we need to know the GID format that the
kernel uses, e.g when working over older kernels. For that end, add a
new port capability IB_PORT_IP_BASED_GIDS and report it when query
port is issued.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:03 -08:00
Moni Shoua ad4885d279 IB/mlx4: Build the port IBoE GID table properly under bonding
When scanning netdevices we need to check a few more conditions and
cases to build the IBoE GID table properly.  For example, under
bonding we must make sure that when a port is down, the bond IP
address isn't programmed as a GID, since doing so will cause failure
with IB core flows that selects ports by GID.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:09 -08:00
Moni Shoua 5071456fe2 IB/mlx4: Do IBoE GID table resets per-port
The IBoE code used to reset the GID table did it for all Ethernet
ports of the device.  Since the whole architecture of generating GIDs
and responding to events is port-based, this is inefficient and can
lead to wrong content in the GID table.  Change the reset flow to be
per-port.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua ddf8bd3491 IB/mlx4: Do IBoE locking earlier when initializing the GID table
Updating the GID table under IBoE requires read/write from/to shared
data structures.  These data structures are protected with the device
iboe lock.  The flows that modify the GID table start from

    1. Initializing the GID table
    2. NETDEV events
    3. INET or INET6 events

This patch makes sure that the flow of initializing the GID table is
consistent with the other two flows w.r.t on what step the lock is taken.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua 4ce5a5744a IB/mlx4: Move rtnl locking to the right place
On the one hand, the invocation of netdev_master_upper_dev_get()
within mlx4_ib_scan_netdevs() must be done with rtnl lock held.  On
the other hand, it's wrong to call rtnl_lock() from within this
function since it's also called by our netdev notifier callback.
Therefore move the locking to mlx4_ib_add() so that both cases are
covered.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua acc4fccf4e IB/mlx4: Make sure GID index 0 is always occupied
Make sure that for Ethernet ports, the port GID table index 0 is always
occupied with a default GID of the relevant IPv6 link-local adderss.

This provides better user experience for legacy applications that don't use
the RDMA CM and were working on index 0 prior to the IP addressing change.

Also, as GIDs are generated from IP addresses of the network devices that
are associated with the port, it's basically possible that the GID table
will be empty if no IP address was assigned.  This doesn't comply with the
IB spec section 4.1.1 "GID usage and properties".

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Matan Barak 4196670be7 IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device
When the device has only Ethernet ports, don't try to allocate range
of steerable UD QPs since they aren't needed.  This fixes an issue
where mlx4 VFs tried to allocate a range of UD steerable QPs, but
failed to do so.

Fixes: c1c9850112 ("IB/mlx4: Add support for steerable IB UD QPs")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 09:00:18 -08:00
Jingoo Han 9d8abf4594 IB/srpt: replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:40 -08:00
Nicholas Bellinger a80e21b3b2 iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
This patch fixes a memory leak for fr_desc upon failure of
isert_create_fr_desc() in isert_conn_create_fastreg_pool()
code.

As reported by Coverity 1166659:

*** CID 1166659:  Resource leak  (RESOURCE_LEAK)
/drivers/infiniband/ulp/isert/ib_isert.c: 470 in isert_conn_create_fastreg_pool()
464                      isert_conn, isert_conn->conn_fr_pool_size);
465
466             return 0;
467
468     err:
469             isert_conn_free_fastreg_pool(isert_conn);
>>>     CID 1166659:  Resource leak  (RESOURCE_LEAK)
>>>     Variable "fr_desc" going out of scope leaks the storage it points to.
470             return ret;
471     }
472
473     static int
474     isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
475     {

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:11 -08:00
Julia Lawall ab576627c8 RDMA/amso1100: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-12 11:11:46 -08:00
Julia Lawall d07875bd0d RDMA/nes: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-12 11:11:09 -08:00
Eli Cohen 1a4c3a3dc5 IB/mlx5: Don't set "block multicast loopback" capability
Currently Connect-IB does not support blocking multicast loopback, so
don't set IB_DEVICE_BLOCK_MULTICAST_LOOPBACK in the device caps.

Reported by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:09:48 -08:00
Eli Cohen 78c0f98cc9 IB/mlx5: Fix binary compatibility with libmlx5
Commit c1be5232d2 ("Fix micro UAR allocator") broke binary compatibility
between libmlx5 and mlx5_ib since it defines a different value to the number
of micro UARs per page, leading to wrong calculation in libmlx5. This patch
defines struct mlx5_ib_alloc_ucontext_req_v2 as an extension to struct
mlx5_ib_alloc_ucontext_req.  The extended size is determined in mlx5_ib_alloc_ucontext()
and in case of old library we use uuarn 0 which works fine -- this is
acheived due to create_user_qp() falling back from high to medium then to
low class where low class will return 0.  For new libraries we use the
more sophisticated allocation algorithm.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Eli Cohen 9e65dc371b IB/mlx5: Fix RC transport send queue overhead computation
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Linus Torvalds 4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Linus Torvalds 4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Nicholas Bellinger 676687c696 iscsi-target: Convert gfp_t parameter to task state bitmask
This patch propigates the use of task state bitmask now used by
percpu_ida_alloc() up the iscsi-target callchain, replacing the
use of GFP_ATOMIC for TASK_RUNNING, and GFP_KERNEL for
TASK_INTERRUPTIBLE.

Also, drop the unnecessary gfp_t parameter to isert_allocate_cmd(),
and just pass TASK_INTERRUPTIBLE into iscsit_allocate_cmd().

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-25 06:58:52 +00:00
Roland Dreier fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Roland Dreier 8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Eli Cohen 57761d8df8 IB/mlx5: Verify reserved fields are cleared
Verify that reserved fields in struct mlx5_ib_resize_cq are cleared
before continuing execution of the verb. This is required to allow
making use of this area in future revisions.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:54 -08:00
Eli Cohen 9e9c47d07d IB/mlx5: Allow creation of QPs with zero-length work queues
The current code attmepts to call ib_umem_get() even if the length is
zero, which causes a failure. Since the spec allows zero length work
queues, change the code so we don't call ib_umem_get() in those cases.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:53 -08:00
Eli Cohen bde51583f4 IB/mlx5: Add support for resize CQ
Implement resize CQ which is a mandatory verb in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:50 -08:00
Eli Cohen 3bdb31f688 IB/mlx5: Implement modify CQ
Modify CQ is used by ULPs like IPoIB to change moderation parameters.  This
patch adds support in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Eli Cohen ada388f7af IB/mlx5: Make sure doorbell record is visible before doorbell
Put a wmb() to make sure the doorbell record is visible to the HCA before we
hit doorbell.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Ding Tianhong 79adc5321e RDMA/nes: Slight optimization of Ethernet address compare
Use the possibly more efficient ether_addr_equal() instead of memcmp().

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:22:26 -08:00
Ira Weiny 6e0ea9e6cb IB/qib: Fix QP check when looping back to/from QP1
The GSI QP type is compatible with and should be allowed to send data
to/from any UD QP.  This was found when testing ibacm on the same node
as an SA.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:16:47 -08:00
Paul Bolle 298589b1cb RDMA/cxgb4: Fix gcc warning on 32-bit arch
Building mem.o for 32 bits x86 triggers a GCC warning:

    drivers/infiniband/hw/cxgb4/mem.c: In function '_c4iw_write_mem_dma_aligned':
    drivers/infiniband/hw/cxgb4/mem.c:79:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Silence that warning by casting "&wr_wait" to unsigned long before
casting it to __be64.  That's what _c4iw_write_mem_inline() already does.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:07:09 -08:00