Commit Graph

1018 Commits

Author SHA1 Message Date
Linus Torvalds e20d3ef540 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

   - Update vhost-scsi to support F_ANY_LAYOUT using mm/iov_iter.c
     logic, and signal VERSION_1 support (MST + Viro + nab)

   - Fix iscsi/iser-target to remove problematic active_ts_set usage
     (Gavin Guo)

   - Update iscsi/iser-target to support multi-sequence sendtargets
     (Sagi)

   - Fix original PR_APTPL_BUF_LEN 8k size limitation (Martin Svec)

   - Add missing WRITE_SAME end-of-device sanity check (Bart)

   - Check for LBA + sectors wrap-around in sbc_parse_cdb() (nab)

   - Other various minor SPC/SBC compliance fixes based upon Ronnie
     Sahlberg test suite (nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (32 commits)
  target: Set LBPWS10 bit in Logical Block Provisioning EVPD
  target: Fail UNMAP when emulate_tpu=0
  target: Fail WRITE_SAME w/ UNMAP=1 when emulate_tpws=0
  target: Add sanity checks for DPO/FUA bit usage
  target: Perform PROTECT sanity checks for WRITE_SAME
  target: Fail I/O with PROTECT bit when protection is unsupported
  target: Check for LBA + sectors wrap-around in sbc_parse_cdb
  target: Add missing WRITE_SAME end-of-device sanity check
  iscsi-target: Avoid IN_LOGOUT failure case for iser-target
  target: Fix PR_APTPL_BUF_LEN buffer size limitation
  iscsi-target: Drop problematic active_ts_list usage
  iscsi/iser-target: Support multi-sequence sendtargets text response
  iser-target: Remove duplicate function names
  vhost/scsi: potential memory corruption
  vhost/scsi: Global tcm_vhost -> vhost_scsi rename
  vhost/scsi: Drop left-over scsi_tcq.h include
  vhost/scsi: Set VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 feature bits
  vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq
  vhost/scsi: Add ANY_LAYOUT iov -> sgl mapping prerequisites
  vhost/scsi: Change vhost_scsi_map_to_sgl to accept iov ptr + len
  ...
2015-02-21 13:21:19 -08:00
Ariel Nahum 9a3119e4b7 IB/iser: Release the iscsi endpoint if ep_disconnect wasn't called
In some cases, we might reach the iser connection termination without
ep_disconnect being invoked (for example if user-space daemon doesn't
exists. In this case, we need to free the iscsi endpoint when we
remove the iser connection.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17 12:33:14 -08:00
Sagi Grimberg 6606e6a2ff IB/iser: Fix memory regions possible leak
When teardown process starts during live IO, we need to keep the
memory regions pool (frmr/fmr) until all in-flight tasks are properly
released, since each task may return a memory region to the pool. In
order to do this, we pass a destroy flag to iser_free_ib_conn_res to
indicate we can destroy the device and the memory regions
pool. iser_conn_release will pass it as true and also DEVICE_REMOVAL
event (we need to let the device to properly remove).

Also, Since we conditionally call iser_free_rx_descriptors,
remove the extra check on iser_conn->rx_descs.

Fixes: 5426b1711f ("IB/iser: Collapse cleanup and disconnect handlers")
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17 12:32:46 -08:00
Roi Dayan c6c95ef4ce IB/iser: Use correct dma direction when unmapping SGs
We always unmap SGs with the same direction instead of unmapping
with the direction the mapping was done, fix that.

Fixes: 9a8b08fad2 ("IB/iser: Generalize iser_unmap_task_data and [...]")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-13 11:27:31 -08:00
Sagi Grimberg e4f4e8016e iscsi/iser-target: Support multi-sequence sendtargets text response
In case sendtargets response is larger than initiator MRDSL, we
send a partial sendtargets response (setting F=0, C=1, TTT!=0xffffffff),
accept a consecutive empty text message and send the rest of the payload.
In case we are done, we set F=1, C=0, TTT=0xffffffff.
We do that by storing the sendtargets response bytes done under
the session.

This patch also makes iscsit_find_cmd_from_itt public for isert.

(Re-add cmd->maxcmdsn_inc and clear in iscsit_build_text_rsp - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-12 11:24:29 -08:00
Rasmus Villemoes 11378cdbb6 iser-target: Remove duplicate function names
The macro isert_dbg already ensures that __func__ is part of the
output, so there's no reason to duplicate the function name in the
format string itself.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-05 22:45:41 -08:00
Sagi Grimberg b44a2b6790 iser-target: Fix wrong allocation in the case of an empty text message
if text message dlength is 0, don't allocate a buffer for it, pass
NULL to iscsit_process_text_cmd.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-04 10:55:34 -08:00
Sagi Grimberg 631af55062 iser-target: Use WQ_UNBOUND for completion workqueue
Bound workqueues might be too restrictive since they allow
only a single core per session for processing completions.
WQ_UNBOUND will allow bouncing to another CPU if the running
CPU is currently busy. Luckily, our workqueues are NUMA aware
and will first try to bounce within the same NUMA socket.
My measurements with NULL backend devices show that there is
no (noticeable) additional latency as a result of the change.
I'd expect even to gain performance when working with fast
devices that also allocate MSIX interrupt vectors.

While we're at it, make it WQ_HIGHPRI since processing
completions is really a high priority for performance.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Moussa Ba <moussaba@micron.com>
Signed-off-by: Moussa Ba <moussaba@micron.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-04 10:55:03 -08:00
Roland Dreier 4143a9515d Revert "IPoIB: Consolidate rtnl_lock tasks in workqueue"
This reverts commit afe1de664e.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:36 -08:00
Roland Dreier c6a7ec7a0f Revert "IPoIB: Make the carrier_on_task race aware"
This reverts commit 67d7209e1f.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:29 -08:00
Roland Dreier e7a623d2df Revert "IPoIB: fix MCAST_FLAG_BUSY usage"
This reverts commit 016d9fb25c.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:20 -08:00
Roland Dreier 962121b4fc Revert "IPoIB: fix mcast_dev_flush/mcast_restart_task race"
This reverts commit e5d1dcf1b0.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:11 -08:00
Roland Dreier bb75963414 Revert "IPoIB: change init sequence ordering"
This reverts commit 3bcce487fd.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:02 -08:00
Roland Dreier 0306eda226 Revert "IPoIB: Use dedicated workqueues per interface"
This reverts commit 5141861cd5.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:38:55 -08:00
Roland Dreier 4e0ab200fa Revert "IPoIB: Make ipoib_mcast_stop_thread flush the workqueue"
This reverts commit bb42a6dd02.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:38:46 -08:00
Roland Dreier a84544a4fe Revert "IPoIB: No longer use flush as a parameter"
This reverts commit ce347ab90e.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:38:35 -08:00
Sagi Grimberg f64d2792dd iser-target: Fix typo in isert_put_text_rsp
We are sending text response and not reject.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-01-30 13:06:22 -08:00
Sagi Grimberg 45678b6b78 iser-target: Fix sparse warning
isert_debug_level should be static, hence no need
to initialize it.

Reported-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-01-30 13:06:22 -08:00
Nicholas Mc Guire ecc3f3edbf ib_srpt: wait_for_completion_timeout does not return negative status
This patch changes srpt_close_session() to properly use an unsigned long
for wait_for_completion_timeout()'s return value, and to update WARN_ON()
to only trigger for a zero return.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-01-20 15:48:17 -08:00
Linus Torvalds cdce6ac277 SCSI for-linus on 20141220
This is a much shorter set of patches that were on the go but didn't make it
 in to the early pull request for the merge window.  It's really a set of bug
 fixes plus some final cleanup work on the new tag queue API.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJUlaYEAAoJEDeqqVYsXL0MmXAH/2UUcE11p0KBHMR4cAn76xrG
 9093ZT9VZ4LH/Z7PbgwIWC4YHDqVpwA1+Trj1mh8PxiZz2SopWe27O2lQMRS5VUc
 MN28kbmK3L0jQj+OUez10Da6k0hU/KL8TlWT765MxFDKCaAuPZ4u541tyZEIGmLL
 olOQrn/fSlu+18QqqZ+D2rMaK7kGH6ZgbOadnRfYGkLjU4YeAMEC9L7UgnDxHiaN
 gZozoARkGeAnDJERVETRTtAiOXGRH8sGCpue0yYlhZXpAQ9cFUkS/hMqDWnaVC2S
 0x0w34RvbxSqO0gPT0K5XLoMiFyg04vnZ2xBVFBsANQTSEjQJO8USNAa4r74hf8=
 =D3eN
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI update from James Bottomley:
 "This is a much shorter set of patches that were on the go but didn't
  make it in to the early pull request for the merge window.  It's
  really a set of bug fixes plus some final cleanup work on the new tag
  queue API"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  storvsc: ring buffer failures may result in I/O freeze
  ipr: set scsi_level correctly for disk arrays
  ipr: add support for async scanning to speed up boot
  scsi_debug: fix missing "break;" in SDEBUG_UA_CAPACITY_CHANGED case
  scsi_debug: take sdebug_host_list_lock when changing capacity
  scsi_debug: improve driver description in Kconfig
  scsi_debug: fix compare and write errors
  qla2xxx: fix race in handling rport deletion during recovery causes panic
  scsi: blacklist RSOC for Microsoft iSCSI target devices
  scsi: fix random memory corruption with scsi-mq + T10 PI
  Revert "[SCSI] mpt3sas: Remove phys on topology change"
  Revert "[SCSI] mpt2sas: Remove phys on topology change."
  esas2r: Correct typos of "validate" in a comment
  fc: FCP_PTA_SIMPLE is 0
  ibmvfc: remove unused tag variable
  scsi: remove MSG_*_TAG defines
  scsi: remove scsi_set_tag_type
  scsi: remove scsi_get_tag_type
  scsi: never drop to untagged mode during queue ramp down
  scsi: remove ->change_queue_type method
2014-12-20 13:42:57 -08:00
Linus Torvalds ed55635e2e Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "The highlights this merge window include:

   - Allow target fabric drivers to function as built-in.  (Roland)
   - Fix tcm_loop multi-TPG endpoint nexus bug.  (Hannes)
   - Move per device config_item_type into se_subsystem_api, allowing
     configfs attributes to be defined at module_init time.  (Jerome +
     nab)
   - Convert existing IBLOCK/FILEIO/RAMDISK/PSCSI/TCMU drivers to use
     external configfs attributes.  (nab)
   - A number of iser-target fixes related to active session + network
     portal shutdown stability during extended stress testing.  (Sagi +
     Slava)
   - Dynamic allocation of T10-PI contexts for iser-target, fixing a
     potentially bogus iscsi_np->tpg_np pointer reference in >= v3.14
     code.  (Sagi)
   - iser-target performance + scalability improvements.  (Sagi)
   - Fixes for SPC-4 Persistent Reservation AllRegistrants spec
     compliance.  (Ilias + James + nab)
   - Avoid potential short kern_sendmsg() in iscsi-target for now until
     Al's conversion to use msghdr iteration is merged post -rc1.
     (Viro)

  Also, Sagi has requested a number of iser-target patches (9) that
  address stability issues he's encountered during extended stress
  testing be considered for v3.10.y + v3.14.y code.  Given the amount of
  LOC involved, it will certainly require extra backporting effort.

  Apologies in advance to Greg-KH & Co on this.  Sagi and I will be
  working post-merge to ensure they each get applied correctly"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (53 commits)
  target: Allow AllRegistrants to re-RESERVE existing reservation
  uapi/linux/target_core_user.h: fix headers_install.sh badness
  iscsi-target: Fail connection on short sendmsg writes
  iscsi-target: nullify session in failed login sequence
  target: Avoid dropping AllRegistrants reservation during unregister
  target: Fix R_HOLDER bit usage for AllRegistrants
  iscsi-target: Drop left-over bogus iscsi_np->tpg_np
  iser-target: Fix wc->wr_id cast warning
  iser-target: Remove code duplication
  iser-target: Adjust log levels and prettify some prints
  iser-target: Use debug_level parameter to control logging level
  iser-target: Fix logout sequence
  iser-target: Don't wait for session commands from completion context
  iser-target: Reduce CQ lock contention by batch polling
  iser-target: Introduce isert_poll_budget
  iser-target: Remove an atomic operation from the IO path
  iser-target: Remove redundant call to isert_conn_terminate
  iser-target: Use single CQ for TX and RX
  iser-target: Centralize completion elements to a context
  iser-target: Cast wr_id with uintptr_t instead of unsinged long
  ...
2014-12-19 18:02:22 -08:00
James Bottomley e617457691 Merge remote-tracking branch 'scsi-queue/drivers-for-3.19' into for-linus 2014-12-18 05:56:29 -08:00
Roland Dreier a7cfef21e3 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next 2014-12-15 18:19:20 -08:00
Sagi Grimberg 7dcf9c193b IB/srp: Allow newline separator for connection string
In case the last argument of the connection string is processed as a
string (destination GID for example).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:15:23 -08:00
Or Gerlitz 056da88f2e IB/iser: Bump version to 1.5
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:47 -08:00
Sagi Grimberg 5bb6e543d2 IB/iser: DIX update
Following few recent Block integrity updates, we align the iSER data
integrity offload settings with:

- Deprecate pi_guard module param
- Expose support for DIX type 0.
- Use scsi_transfer_length for the transfer length
- Get pi_interval, ref_tag, ref_remap, bg_type and
  check_mask setting from scsi_cmnd

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:46 -08:00
Sagi Grimberg 06c7fb6776 IB/iser: Micro-optimize iser_handle_wc
Use likely() for wc.status == IB_WC_SUCCESS

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:46 -08:00
Sagi Grimberg 60e20908c5 IB/iser: Micro-optimize iser logging
And fix a checkpatch warning.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:46 -08:00
Sagi Grimberg da64bdb25b IB/iser: Use more completion queues
No reason to settle with four, can use the min between device max comp
vectors and number of cores.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:45 -08:00
Sagi Grimberg 7e1fd4d1e3 IB/iser: Remove redundant is_mr indicator
It is enough to check mem_h pointer assignment, mem_h == NULL will
indicate that buffer is not registered using mr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:45 -08:00
Sagi Grimberg a11b3e6935 IB/iser: Centralize memory region invalidation to a function
Eliminates code duplication.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:45 -08:00
Sagi Grimberg f0caef6d40 IB/iser: Terminate connection before cleaning inflight tasks
When closing the connection, we should first terminate the connection
(in case it was not previously terminated) to guarantee the QP is in
error state and we are done with servicing IO. Only then go ahead with
tasks cleanup via iscsi_conn_stop.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 7414dde0a6 IB/iser: Fix race between iser connection teardown and scsi TMFs
In certain scenarios (target kill with live IO) scsi TMFs may race
with iser RDMA teardown, which might cause NULL dereference on iser IB
device handle (which might have been freed). In this case we take a
conditional lock for TMFs and check the connection state (avoid
introducing lock contention in the IO path). This is indeed best
effort approach, but sufficient to survive multi targets sudden death
while heavy IO is inflight.

While we are on it, add a nice kernel-doc style documentation.

Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Ariel Nahum 3f562a0b8f IB/iser: Fix possible NULL derefernce ib_conn->device in session_create
If rdma_cm error event comes after ep_poll but before conn_bind, we
should protect against dereferncing the device (which may have been
terminated) in session_create and conn_create (already protected)
callbacks.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 49df2781b1 IB/iser: Fix sparse warnings
Use uintptr_t to handle wr_id casting, which was found by Kbuild test
robot and smatch.  Also remove an internal definition of variable which
potentially shadows an external one (and make sparse happy).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Max Gurtovoy 6ec9d4d231 IB/iser: Fix possible SQ overflow
Fix a regression was introduced in commit 6df5a128f0 ("IB/iser:
Suppress scsi command send completions").

The sig_count was wrongly set to be static variable, thus it is
possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0
condition (due to races) and the send queue will be overflowed.

Instead keep sig_count per connection. We don't need it to be atomic
as we are safe under the iscsi session frwd_lock taken by libiscsi on
the queuecommand path.

Fixes: 6df5a128f0 ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 93acb7bbc7 IB/iser: Decrement CQ's active QPs accounting when QP creation fails
When creating a connection QP we choose the least used CQ and inc the
number of active QPs on that. If we fail to create the QP, we need to
decrement the active QPs counter.

Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Ariel Nahum 5426b1711f IB/iser: Collapse cleanup and disconnect handlers
No real need to wait for TIMEWAIT_EXIT before we destroy the RDMA
resources (also TIMEAWAIT_EXIT is not guarenteed to always arrive).  As
for the cma_id, only destroy it if the state is not DOWN where in this
case, conn_release is already running and we don't want to compete.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 16df2a26fb IB/iser: Fix catastrophic error flow hang
In case of the HCA going into catasrophic error flow, the
beacon post_send is likely to fail, so surely there will
be no completion for it.

In this case, use a best effort approach and don't wait for beacon
completion if we failed to post the send.

Reported-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Minh Tran f4641ef701 IB/iser: Re-adjust CQ and QP send ring sizes to HW limits
Re-adjust max CQEs per CQ and max send_wr per QP according
to the resource limits supported by underlying hardware.

Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Doug Ledford ce347ab90e IPoIB: No longer use flush as a parameter
Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue.  Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford bb42a6dd02 IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
We used to pass a flush variable to mcast_stop_thread to indicate if
we should flush the workqueue or not.  This was due to some code
trying to flush a workqueue that it was currently running on which is
a no-no.  Now that we have per-device work queues, and now that
ipoib_mcast_restart_task has taken the fact that it is queued on a
single thread workqueue with all of the ipoib_mcast_join_task's and
therefore has no need to stop the join task while it runs, we can do
away with the flush parameter and unilaterally flush always.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford 5141861cd5 IPoIB: Use dedicated workqueues per interface
During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue, we can
have one device in the middle of a down and holding the rtnl lock and
another totally unrelated device needing to run mcast_restart_task,
which wants the rtnl lock and will loop trying to take it unless is
sees its own FLAG_ADMIN_UP flag go away.  Because the unrelated
interface will never see its own ADMIN_UP flag drop, the interface
going down will deadlock trying to flush the queue.  There are several
possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our
own task back up as a delayed work and return and avoid dropping work
on the floor that way.  But I'm not 100% convinced that we won't cause
other problems.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford 3bcce487fd IPoIB: change init sequence ordering
In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford e5d1dcf1b0 IPoIB: fix mcast_dev_flush/mcast_restart_task race
Our mcast_dev_flush routine and our mcast_restart_task can race
against each other.  In particular, they both hold the priv->lock
while manipulating the rbtree and while removing mcast entries from
the multicast_list and while adding entries to the remove_list, but
they also both drop their locks prior to doing the actual removes.
The mcast_dev_flush routine is run entirely under the rtnl lock and so
has at least some locking.  The actual race condition is like this:

Thread 1                                Thread 2
ifconfig ib0 up
  start multicast join for broadcast
  multicast join completes for broadcast
  start to add more multicast joins
    call mcast_restart_task to add new entries
                                        ifconfig ib0 down
					  mcast_dev_flush
					    mcast_leave(mcast A)
    mcast_leave(mcast A)

As mcast_leave calls ib_sa_multicast_leave, and as member in
core/multicast.c is ref counted, we run into an unbalanced refcount
issue.  To avoid stomping on each others removes, take the rtnl lock
specifically when we are deleting the entries from the remove list.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Doug Ledford 016d9fb25c IPoIB: fix MCAST_FLAG_BUSY usage
Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
in how it was used.  We didn't always initialize the completion struct
before we set the flag, and we didn't always call complete on the
completion struct from all paths that complete it.  This made it less
than totally effective, and certainly made its use confusing.  And in
the flush function we would use the presence of this flag to signal
that we should wait on the completion struct, but we never cleared
this flag, ever.  This is further muddied by the fact that we overload
the MCAST_FLAG_BUSY flag to mean two different things: we have a join
in flight, and we have succeeded in getting an ib_sa_join_multicast.

In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.

 1) Remove the MCAST_JOIN_STARTED flag entirely
 2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight
 3) Test on mcast->mc directly to see if we have completed
    ib_sa_join_multicast (using IS_ERR_OR_NULL)
 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
    the mcast->done completion struct
 5) Make sure that before calling complete(&mcast->done), we always clear
    the MCAST_FLAG_BUSY bit
 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
    take the mutex in our join callback.  This forces
    ib_sa_multicast_join to return and set mcast->mc before we process
    the callback.  This way, our callback can safely clear mcast->mc
    if there is an error on the join and we will do the right thing as
    a result in mcast_dev_flush.
 7) Because we need the mutex to synchronize mcast->mc, we can no
    longer call mcast_sendonly_join directly from mcast_send and
    instead must add sendonly join processing to the mcast_join_task

A number of different races are resolved with these changes.  These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.

One race looks something like this:

Thread 1                             Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
  alloc member
  call callback
                                     ifconfig ib0 down
				     wait_for_completion
    callback call completes
                                     wait_for_completion in
				     mcast_dev_flush completes
				       mcast->mc is PTR_ERR_OR_NULL
				       so we skip ib_sa_leave_multicast
    return from callback
  return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast

We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c

Another like this:

Thread 1                   Thread 2         Thread 3
ib_sa_multicast_join
                                            ifconfig ib0 down
					    priv->broadcast = NULL
                           join_complete
			                    wait_for_completion
			   mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
			   complete
			   return -EAGAIN (making mcast->mc invalid)
			   		    call ib_sa_multicast_leave
					    on invalid mcast->mc, hang
					    forever

By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we
run the callback.  This allows us to clear mcast->mc if there is an
error and the join is going to fail.  We do this before we complete
the mcast.  In this way, mcast_dev_flush always sees consistent state
in regards to mcast->mc membership at the time that the
wait_for_completion() returns.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Doug Ledford 67d7209e1f IPoIB: Make the carrier_on_task race aware
We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface.  Unfortunately, that's not
the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task,
the flush_workqueue waits on this thread too.  But this thread is
deadlocked on the rtnl lock.  The better thing here is to use trylock
and loop on that until we either get the lock or we see that
FLAG_ADMIN_UP has been cleared, in which case we don't need to do
anything anyway and we just return.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Doug Ledford afe1de664e IPoIB: Consolidate rtnl_lock tasks in workqueue
Setting the MTU can safely be moved to the carrier_on_task, which keeps
us from needing to take the rtnl lock in the join_finish section.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Nicholas Bellinger ed4520ae9b iser-target: Fix wc->wr_id cast warning
CC [M]  drivers/infiniband/ulp/isert/ib_isert.o
drivers/infiniband/ulp/isert/ib_isert.c: In function ‘isert_cq_comp_err’:
drivers/infiniband/ulp/isert/ib_isert.c:1979:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:35 -08:00
Sagi Grimberg 10633c37bf iser-target: Remove code duplication
- Fall-through in switch case instead in do_control_comp.
- Move rkey invalidation to a function.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:35 -08:00