Commit Graph

1180 Commits

Author SHA1 Message Date
Bart Van Assche 8de9fe3a1d IB/srp: Fix a connection setup race
Avoid that receiving a DREQ while RDMA channels are being
established causes target->qp_in_error to be reset.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18 13:35:55 -04:00
Bart Van Assche fb49c8bbaa IB/srp: Remove an extraneous scsi_host_put() from an error path
Fix a scsi_get_host() / scsi_host_put() imbalance in the error
path of srp_create_target(). See also patch "IB/srp: Avoid that
I/O hangs due to a cable pull during LUN scanning" (commit ID
34aa654ecb).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18 13:35:55 -04:00
Michael Wang 8e37ab68fe IB/Verbs: Reform IB-ulp ipoib
Use raw management helpers to reform IB-ulp ipoib.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18 13:35:04 -04:00
Bart Van Assche 8f71c1a27b IPoIB/CM: Fix indentation level
See also patch "IPoIB/cm: Add connected mode support for devices
without SRQs" (commit ID 68e995a295). Detected by smatch.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 13:21:27 -04:00
Linus Torvalds c6668726d2 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity in target land the last months.

  The highlights include:

   - Convert fabric drivers tree-wide to target_register_template() (hch
     + bart)

   - iser-target hardening fixes + v1.0 improvements (sagi)

   - Convert iscsi_thread_set usage to kthread.h + kill
     iscsi_target_tq.c (sagi + nab)

   - Add support for T10-PI WRITE_STRIP + READ_INSERT operation (mkp +
     sagi + nab)

   - DIF fixes for CONFIG_DEBUG_SG=y + UNMAP file emulation (akinobu +
     sagi + mkp)

   - Extended TCMU ABI v2 for future BIDI + DIF support (andy + ilias)

   - Fix COMPARE_AND_WRITE handling for NO_ALLLOC drivers (hch + nab)

  Thanks to everyone who contributed this round with new features,
  bug-reports, fixes, cleanups and improvements.

  Looking forward, it's currently shaping up to be a busy v4.2 as well"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (69 commits)
  target: Put TCMU under a new config option
  target: Version 2 of TCMU ABI
  target: fix tcm_mod_builder.py
  target/file: Fix UNMAP with DIF protection support
  target/file: Fix SG table for prot_buf initialization
  target/file: Fix BUG() when CONFIG_DEBUG_SG=y and DIF protection enabled
  target: Make core_tmr_abort_task() skip TMFs
  target/sbc: Update sbc_dif_generate pr_debug output
  target/sbc: Make internal DIF emulation honor ->prot_checks
  target/sbc: Return INVALID_CDB_FIELD if DIF + sess_prot_type disabled
  target: Ensure sess_prot_type is saved across session restart
  target/rd: Don't pass incomplete scatterlist entries to sbc_dif_verify_*
  target: Remove the unused flag SCF_ACK_KREF
  target: Fix two sparse warnings
  target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handling
  target: simplify the target template registration API
  target: simplify target_xcopy_init_pt_lun
  target: remove the unused SCF_CMD_XCOPY_PASSTHROUGH flag
  target/rd: reduce code duplication in rd_execute_rw()
  tcm_loop: fixup tpgt string to integer conversion
  ...
2015-04-24 10:22:09 -07:00
Linus Torvalds 7c034dfd58 InfiniBand/RDMA updates for 4.1:
- IPoIB fixes from Doug Ledford and Erez Shitrit
  - iSER updates from Sagi Grimberg
  - mlx4 GUID handling changes from Yishai Hadas
  - other misc fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJVN9SzAAoJEENa44ZhAt0hWq4QAJRFrwoe9ubTextSHeTU0FkY
 CydiQtGWrhyAHTX/KtdB1Uv9FzGHc6gqkAOXImouacYTM9ffypMF6Oj4xIYIMQtz
 MvNlNm07KOtQYlubiaZWcP5BjdLfMZjQxb03/9smygLTBjm80dAEt5X1znx7YrqI
 ZfE+ibPdvRqVEvFZKfT2U0kGU6oEVKrbJEiUCoJPwwcghDZQl18YmGOxt5qdI2uO
 V+71ozwozT8utSIl7S2YTJZBdkJ7tLrqrX2D/D2jUAmh1rqHIDrsXXiZ44UJj82i
 oXuwqmHXfq1LfuC9kxCX5JJpGeLE7E3OoxM1zIev31710zPA0v57rNKKweCi2Tj6
 Z36B0SIRV4ipWr/sBhVDr1Ffc/uap3DOIEU9Z+t8rwhELCEVuxmNaNb0K1e5nPiy
 YOQYp/ctC0NslM4mqQJLhGMVl6H8PjodbM1whnYZLsF1+8clNvdtLYzy/cA5fGbO
 tngUGXu0YZGdwvfuQhi5FB45XLaErJaPcMH0QRI5G0JgtjvbzXiMlqWtekTUBi7W
 DJNQlVRI4S1RYRBYkq709ymXiWwTeh3rhH+ZJpM+aY8b0NR/lx+dNyesNG+7GBJH
 y5UOOUck0w+JbQzZo264I6a5e8pXq3kMi3BH8pF4Jbo5WvxSF6uriXb6Q1JzfH20
 Jn0J6W9ghCSfrhMI1zgQ
 =v1jB
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA updates from Roland Dreier:

 - IPoIB fixes from Doug Ledford and Erez Shitrit

 - iSER updates from Sagi Grimberg

 - mlx4 GUID handling changes from Yishai Hadas

 - other misc fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (51 commits)
  mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures
  IB/iser: Rewrite bounce buffer code path
  IB/iser: Bump version to 1.6
  IB/iser: Remove code duplication for a single DMA entry
  IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
  IB/iser: Modify struct iser_mem_reg members
  IB/iser: Make fastreg pool cache friendly
  IB/iser: Move PI context alloc/free to routines
  IB/iser: Move fastreg descriptor pool get/put to helper functions
  IB/iser: Merge build page-vec into register page-vec
  IB/iser: Get rid of struct iser_rdma_regd
  IB/iser: Remove redundant assignments in iser_reg_page_vec
  IB/iser: Move memory reg/dereg routines to iser_memory.c
  IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
  IB/iser: Remove a redundant struct iser_data_buf
  IB/iser: Remove redundant cmd_data_len calculation
  IB/iser: Fix wrong calculation of protection buffer length
  IB/iser: Handle fastreg/local_inv completion errors
  IB/iser: Fix unload during ep_poll wrong dereference
  ib_srpt: convert printk's to pr_* functions
  ...
2015-04-22 11:50:05 -07:00
Erez Shitrit 2c15395974 IB/ipoib: Fix ndo_get_iflink
Currently, iflink of the parent interface was always accessed, even
when interface didn't have a parent and hence we crashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f1 ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li <honli@redhat.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>+
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 15:21:04 -04:00
Doug Ledford c1c2fef6cf Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1 2015-04-15 16:24:49 -04:00
Sagi Grimberg ba943fb237 IB/iser: Rewrite bounce buffer code path
In some rare cases, IO operations may be not aligned to page
boundaries. This prevents iser from performing fast memory
registration. In order to overcome that iser uses a bounce
buffer to carry the transaction. We basically allocate a buffer
in the size of the transaction and perform a copy.

The buffer allocation using kmalloc is too restrictive since it
requires higher order (atomic) allocations for large transactions
(which may result in memory exhaustion fairly fast for some workloads).
We rewrite the bounce buffer code path to allocate scattered pages
and perform a copy between the transaction sg and the bounce sg.

Reported-by: Alex Lyakas <alex@zadarastorage.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:14 -04:00
Sagi Grimberg 4fcd1470a0 IB/iser: Bump version to 1.6
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg ad1e567242 IB/iser: Remove code duplication for a single DMA entry
In singleton scatterlists, DMA memory registration code
is taken both for Fastreg and FMR code paths. Move it to
a function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg 6ef8bb837d IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
Instead of passing ib_sge as output variable, we pass the mem_reg
pointer to have the routines fill the rkey as well. This reduces
code duplication and extra assignments. This is a preparation step
to unify some registration logics together. Also, pass iser_fast_reg_mr
the fastreg descriptor directly.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg 90a6684c30 IB/iser: Modify struct iser_mem_reg members
No need to keep lkey, va, len variables, we can keep
them as struct ib_sge. This will help when we change the
memory registration logic.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg 8b95aa2c1b IB/iser: Make fastreg pool cache friendly
Memory regions are resources that are saved
in the device caches. Increase the probability for
a cache hit by adding the MRU descriptor to pool
head.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg 4dec2a27e3 IB/iser: Move PI context alloc/free to routines
Make iser_[create|destroy]_fastreg_desc shorter, more
readable and easily extendable.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg bd8b944eee IB/iser: Move fastreg descriptor pool get/put to helper functions
Instead of open-coding connection fastreg pool get/put,
we introduce iser_reg_desc[get|put] helpers.

We aren't setting these static as this will be a per-device
routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg
a bit.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg f0e35c27a5 IB/iser: Merge build page-vec into register page-vec
No need for these two separate. Keep it in a single routine
like in the fastreg case. This will also make iser_reg_page_vec
closer to iser_fast_reg_mr arguments. This is a preparation
step for registration flow refactor.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg b130ededff IB/iser: Get rid of struct iser_rdma_regd
This struct members other than struct iser_mem_reg are unused,
so remove it altogether.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg 6847fdeb0b IB/iser: Remove redundant assignments in iser_reg_page_vec
Buffer length was assigned twice, and no reason to set va to
io_addr and then add the offset, just set va to io_addr + offset.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg d03e61d036 IB/iser: Move memory reg/dereg routines to iser_memory.c
As memory registration/de-registration methods, lets
move them to their natural location. While we're at it,
make iser_reg_page_vec routine static.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg 5640832590 IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
No need to pass that, we can take it from the task.
In a later stage, this function will be invoked
according to a device capability.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg e3784bd1d9 IB/iser: Remove a redundant struct iser_data_buf
No need to keep two iser_data_buf structures just in case we use
mem copy. We can avoid that just by adding a pointer to the original
sg. So keep only two iser_data_buf per command (data and protection)
and pass the relevant data_buf to bounce buffer routine.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg ecc3993a2a IB/iser: Remove redundant cmd_data_len calculation
This code was added before we had protection data length
calculation (in iser_send_command), so we needed to calc
the sg data length from the sg itself. This is not needed
anymore.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg a065fe6aa2 IB/iser: Fix wrong calculation of protection buffer length
This length miss-calculation may cause a silent data corruption
in the DIX case and cause the device to reference unmapped area.

Fixes: d77e65350f ('libiscsi, iser: Adjust data_length to include protection information')
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg 30bf1d58ae IB/iser: Handle fastreg/local_inv completion errors
Fast registration and local invalidate work requests can
also fail. We should call error completion handler for them.

Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg c4de4663e0 IB/iser: Fix unload during ep_poll wrong dereference
In case the user unloaded ib_iser while ep_connect is in
progress, we need to destroy the endpoint although ep_disconnect
wasn't invoked (we detect this by the iser conn state != DOWN).
However, if we got an REJECTED/UNREACHABLE CM event we move the
connection state to DOWN which will prevent us from destroying
the endpoint in the module unload stage. Fix this by setting the
connection state to TERMINATING in iser_conn_error so we can still
destroy the endpoint at unload stage.

Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Doug Ledford 9f5d32af09 ib_srpt: convert printk's to pr_* functions
The driver already defined the pr_format, it just hadn't
been converted to use pr_info, pr_warn, and pr_err instead
of the equivalent printks.  Convert so that messages from
the driver are now properly tagged with their driver name
and can be more easily debugged.

In addition, a number of these printk's were not newline
terminated, so fix that at the same time.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:54 -04:00
Bart Van Assche 56b5390caf IB/srp: Use P_Key cache for P_Key lookups
This change slightly reduces the time needed to log in.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:54 -04:00
Erez Shitrit 0e5544d9bf IB/ipoib: Remove IPOIB_MCAST_RUN bit
After Doug Ledford's changes there is no need in that bit, it's
semantic becomes subset of the IPOIB_FLAG_OPER_UP bit.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit 1e85b806f9 IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's
Whenever there is no path->ah to the destination, keep only defined
number of skb's. Otherwise there are cases that the driver can keep
infinite list of skb's.

For example, when one device want to send unicast arp to the destination,
and from some reason the SM doesn't respond, the driver currently keeps
all the skb's. If that unicast arp traffic stopped, all  these skb's
are kept by the path object till the interface is down.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit 2c01073095 IB/ipoib: Handle QP in SQE state
As the result of a completion error the QP can moved to SQE state by
the hardware. Since it's not the Error state, there are no flushes
and hence the driver doesn't know about that.

The fix creates a task that after completion with error which is not a
flush tracks the QP state and if it is in SQE state moves it back to RTS.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit 3fd0605caa IB/ipoib: Update broadcast record values after each successful join request
Update the cached broadcast record in the priv object after every new
join of this broadcast domain group.

These values are needed for the port configuration (MTU size) and to
all the new multicast (non-broadcast) join requests initial parameters.

For example, SM starts with 2K MTU for all the fabric, and after that it
restarts (or handover to new SM) with new port configuration of 4K MTU.
Without using the new values, the driver will keep its old configuration
of 2K and will not apply the new configuration of 4K.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Erez Shitrit a44878d100 IB/ipoib: Use one linear skb in RX flow
The current code in the RX flow uses two sg entries for each incoming
packet, the first one was for the IB headers and the second for the rest
of the data, that causes two  dma map/unmap and two allocations, and few
more actions that were done at the data path.

Use only one linear skb on each incoming packet, for the data (IB
headers and payload), that reduces the packet processing in the
data-path (only one skb, no frags, the first frag was not used anyway,
less memory allocations) and the dma handling (only one dma map/unmap
over each incoming packet instead of two map/unmap per each incoming packet).

After commit 73d3fe6d1c ("gro: fix aggregation for skb using frag_list") from
Eric Dumazet, we will get full aggregation for large packets.

When running bandwidth tests before and after the (over the card's numa node),
using "netperf -H 1.1.1.3 -T -t TCP_STREAM", the results before are ~12Gbs before
and after ~16Gbs on my setup (Mellanox's ConnectX3).

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford 1c0453d64a IB/ipoib: drop mcast_mutex usage
We needed the mcast_mutex when we had to prevent the join completion
callback from having the value it stored in mcast->mc overwritten
by a delayed return from ib_sa_join_multicast.  By storing the return
of ib_sa_join_multicast in an intermediate variable, we prevent a
delayed return from ib_sa_join_multicast overwriting the valid
contents of mcast->mc, and we no longer need a mutex to force the
join callback to run after the return of ib_sa_join_multicast.  This
allows us to do away with the mutex entirely and protect our critical
sections with a just a spinlock instead.  This is highly desirable
as there were some places where we couldn't use a mutex because the
code was not allowed to sleep, and so we were currently using a mix
of mutex and spinlock to protect what we needed to protect.  Now we
only have a spin lock and the locking complexity is greatly reduced.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford d2fe937ce6 IB/ipoib: deserialize multicast joins
Allow the ipoib layer to attempt to join all outstanding multicast
groups at once.  The ib_sa layer will serialize multiple attempts to
join the same group, but will process attempts to join different groups
in parallel.  Take advantage of that.

In order to make this happen, change the mcast_join_thread to loop
through all needed joins, sending a join request for each one that we
still need to join.  There are a few special cases we handle though:

1) Don't attempt to join anything but the broadcast group until the join
of the broadcast group has succeeded.
2) No longer restart the join task at the end of completion handling.
If we completed successfully, we are done.  The join task now needs kicked
either by mcast_send or mcast_restart_task or mcast_start_thread, but
should not need started anytime else except when scheduling a backoff
attempt to rejoin.
3) No longer use separate join/completion routines for regular and
sendonly joins, pass them all through the same routine and just do the
right thing based on the SENDONLY join flag.
4) Only try to join a SENDONLY join twice, then drop the packets and
quit trying.  We leave the mcast group in the list so that if we get a
new packet, all that we have to do is queue up the packet and restart
the join task and it will automatically try to join twice and then
either send or flush the queue again.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford 69911416d8 IB/ipoib: fix MCAST_FLAG_BUSY usage
Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
in how it was used.  We didn't always initialize the completion struct
before we set the flag, and we didn't always call complete on the
completion struct from all paths that complete it.  And when we did
complete it, sometimes we continued to touch the mcast entry after
the completion, opening us up to possible use after free issues.

This made it less than totally effective, and certainly made its use
confusing.  And in the flush function we would use the presence of this
flag to signal that we should wait on the completion struct, but we never
cleared this flag, ever.

In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.

 1) Remove the MCAST_JOIN_STARTED flag entirely
 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight
 3) Test mcast->mc directly to see if we have completed
    ib_sa_join_multicast (using IS_ERR_OR_NULL)
 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
    the mcast->done completion struct
 5) Make sure that before calling complete(&mcast->done), we always clear
    the MCAST_FLAG_BUSY bit
 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
    take the mutex in our join callback.  This forces
    ib_sa_multicast_join to return and set mcast->mc before we process
    the callback.  This way, our callback can safely clear mcast->mc
    if there is an error on the join and we will do the right thing as
    a result in mcast_dev_flush.
 7) Because we need the mutex to synchronize mcast->mc, we can no
    longer call mcast_sendonly_join directly from mcast_send and
    instead must add sendonly join processing to the mcast_join_task
 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that
    we have a running task.  We know when we need to reschedule our
    join task thread and don't need a flag to tell us.
 9) Add a helper for rescheduling the join task thread

A number of different races are resolved with these changes.  These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.

One race looks something like this:

Thread 1                             Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
  alloc member
  call callback
                                     ifconfig ib0 down
				     wait_for_completion
    callback call completes
                                     wait_for_completion in
				     mcast_dev_flush completes
				       mcast->mc is PTR_ERR_OR_NULL
				       so we skip ib_sa_leave_multicast
    return from callback
  return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast

We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c

Another like this:

Thread 1                   Thread 2         Thread 3
ib_sa_multicast_join
                                            ifconfig ib0 down
					    priv->broadcast = NULL
                           join_complete
			                    wait_for_completion
			   mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
			   complete
			   return -EAGAIN (making mcast->mc invalid)
			   		    call ib_sa_multicast_leave
					    on invalid mcast->mc, hang
					    forever

By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we
run the callback.  This allows us to clear mcast->mc if there is an
error and the join is going to fail.  We do this before we complete
the mcast.  In this way, mcast_dev_flush always sees consistent state
in regards to mcast->mc membership at the time that the
wait_for_completion() returns.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford efc82eeeae IB/ipoib: No longer use flush as a parameter
Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue.  Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford 0b39578bcd IB/ipoib: Use dedicated workqueues per interface
During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue means
that we would also have to prevent all races between different devices
(for instance, ipoib_mcast_restart_task on interface ib0 can race with
ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on
the rtnl_lock).

There are several possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford 894021a752 IB/ipoib: Make the carrier_on_task race aware
We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface.  Unfortunately, that's not
the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task,
the flush_workqueue waits on this thread too.  But this thread is
deadlocked on the rtnl lock.  The better thing here is to use trylock
and loop on that until we either get the lock or we see that
FLAG_OPER_UP has been cleared, in which case we don't need to do
anything anyway and we just return.

While investigating which flag should be used, FLAG_ADMIN_UP or
FLAG_OPER_UP, it was determined that FLAG_OPER_UP was the more
appropriate flag to use.  However, there was a mix of these two flags in
use in the existing code.  So while we check for that flag here as part
of this race fix, also cleanup the two places that had used the less
appropriate flag for their tests.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:17 -04:00
Doug Ledford c84ca6d2b1 IB/ipoib: Consolidate rtnl_lock tasks in workqueue
The ipoib_mcast_flush_dev routine is called with the rtnl_lock held and
needs to keep it held.  It also needs to call flush_workqueue() to flush
out any outstanding work.  In the past, we've had to try and make sure
that we didn't flush out any outstanding join completions because they
also wanted to grab rtnl_lock() and that would deadlock.  It turns out
that the only thing in the join completion handler that needs this lock
can be safely moved to our carrier_on_task, thereby reducing the
potential for the join completion code and the flush code to deadlock
against each other.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:17 -04:00
Doug Ledford be7aa663fc IB/ipoib: change init sequence ordering
In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:17 -04:00
Doug Ledford e135106fac IB/ipoib: factor out ah flushing
Create a an ipoib_flush_ah and ipoib_stop_ah routines to use at
appropriate times to flush out all remaining ah entries before we shut
the device down.

Because neighbors and mcast entries can each have a reference on any
given ah, we must make sure to free all of those first before our ah
will actually have a 0 refcount and be able to be reaped.

This factoring is needed in preparation for having per-device work
queues.  The original per-device workqueue code resulted in the following
error message:

<ibdev>: ib_dealloc_pd failed

That error was tracked down to this issue.  With the changes to which
workqueues were flushed when, there were no flushes of the per device
workqueue after the last ah's were freed, resulting in an attempt to
dealloc the pd with outstanding resources still allocated.  This code
puts the explicit flushes in the needed places to avoid that problem.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:17 -04:00
Christoph Hellwig 9ac8928e6a target: simplify the target template registration API
Instead of calling target_fabric_configfs_init() +
target_fabric_configfs_register() / target_fabric_configfs_deregister()
target_fabric_configfs_free() from every target driver, rewrite the API
so that we have simple register/unregister functions that operate on
a const operations vector.

This patch also fixes a memory leak in several target drivers. Several
target drivers namely called target_fabric_configfs_deregister()
without calling target_fabric_configfs_free().

A large part of this patch is based on earlier changes from
Bart Van Assche <bart.vanassche@sandisk.com>.

(v2: Add a new TF_CIT_SETUP_DRV macro so that the core configfs code
can declare attributes as either core only or for drivers)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-14 12:28:41 -07:00
Sagi Grimberg 9e35eff449 iser-target: Bump version to 1.0
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:57 -07:00
Sagi Grimberg dac6ab305d iser-target: Remove conn_ prefix from struct isert_conn members
These variables are always accessed via struct isert_conn so
no need to have a "conn_" prefix for them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:57 -07:00
Sagi Grimberg 992607e813 iser-target: Remove un-needed rdma_listen backlog
iser target can handle as many connect request as
the fabric sends to it. This backlog should not set as
a back-pressure mechanism (which is not very useful).

isert does need a back-pressure mechanism, but it should
be added in isert by monitoring the number of pending
established connections (will be added in a later stage).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:56 -07:00
Sagi Grimberg 57df81e3b1 iser-target: Remove redundant check on the device
In iser_connect_release there is no chance that
the iser device is set to NULL, if this happens
we have a BUG. So use BUG_ON.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:56 -07:00
Sagi Grimberg c6b8e9180d iser-target: Get rid of redundant max_accept
Not sure what it was used for, but there is
no real need for it now as I see it. Go ahead
and get rid of it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:55 -07:00
Sagi Grimberg ae9ea9ed38 iser-target: Split some logic in isert_connect_request to routines
Move login buffer alloc/free code to dedicated
routines and introduce isert_conn_init which
initializes the connection lists and locks.

Simplifies and cleans up the code a little bit.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:55 -07:00
Sagi Grimberg cf8ae95823 iser-target: Rename device find/release routines
isert_device_find_by_ib_dev and isert_device_try_release
can have a better, more common name like isert_device_[get|put].

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:54 -07:00
Sagi Grimberg 7748681bb8 iser-target: Rename rend/recv completion routines
Make receive/send completion handling routines symmetrical.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:54 -07:00
Sagi Grimberg fd8205e883 iser-target: Remove redundant assignment to local variable
No need to keep a local ib_dev as a device pointer.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:53 -07:00
Sagi Grimberg 172369c570 iser-target: Introduce isert_[alloc|free]_comps
Move the code for completion context handling to dedicated
routines. This simplifies the code and removes code duplication.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:52 -07:00
Sagi Grimberg 40fc069ad8 iser-target: Split isert_setup_qp
Simplify iser QP creation by splitting some unrelated
logic bulks to routines.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:52 -07:00
Sagi Grimberg 6700425eb0 iser-target: Remove redundant casting on void pointers
No need to cast void pointers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:51 -07:00
Sagi Grimberg fb14027141 iser-target: Remove redundant local variable
No need for this assignment.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:51 -07:00
Sagi Grimberg b859203473 iser-target: Remove dead code
unmap_list is unused.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:50 -07:00
Sagi Grimberg e26e6ef703 iser-target: Remove redundant check on recv completion
We have a switch default for this.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:50 -07:00
Sagi Grimberg 67cb394925 iser-target: Use a single DMA MR and PD per device
This is to favor the HCA cache hit rate using less MRs
and PDs. This commit partially reverts commit:
"eb6ab13 IB/isert: separate connection protection domains and dma MRs"

At the time I thought this would be needed.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:49 -07:00
Sagi Grimberg 4a579da258 iser-target: Fix possible deadlock in RDMA_CM connection error
Before we reach to connection established we may get an
error event. In this case the core won't teardown this
connection (never established it), so we take care of freeing
it ourselves.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:49 -07:00
Sagi Grimberg 364189f0ad iser-target: Fix session hang in case of an rdma read DIF error
This hang was a result of a missing command put when
a DIF error occurred during a rdma read (and we sent
an CHECK_CONDITION error without passing it to the
backend).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-04-07 23:27:48 -07:00
Nicolas Dichtel 5aa7add8f1 infiniband/ipoib: implement ndo_get_iflink
Don't use dev->iflink anymore.

CC: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 14:05:01 -04:00
Linus Torvalds e20d3ef540 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

   - Update vhost-scsi to support F_ANY_LAYOUT using mm/iov_iter.c
     logic, and signal VERSION_1 support (MST + Viro + nab)

   - Fix iscsi/iser-target to remove problematic active_ts_set usage
     (Gavin Guo)

   - Update iscsi/iser-target to support multi-sequence sendtargets
     (Sagi)

   - Fix original PR_APTPL_BUF_LEN 8k size limitation (Martin Svec)

   - Add missing WRITE_SAME end-of-device sanity check (Bart)

   - Check for LBA + sectors wrap-around in sbc_parse_cdb() (nab)

   - Other various minor SPC/SBC compliance fixes based upon Ronnie
     Sahlberg test suite (nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (32 commits)
  target: Set LBPWS10 bit in Logical Block Provisioning EVPD
  target: Fail UNMAP when emulate_tpu=0
  target: Fail WRITE_SAME w/ UNMAP=1 when emulate_tpws=0
  target: Add sanity checks for DPO/FUA bit usage
  target: Perform PROTECT sanity checks for WRITE_SAME
  target: Fail I/O with PROTECT bit when protection is unsupported
  target: Check for LBA + sectors wrap-around in sbc_parse_cdb
  target: Add missing WRITE_SAME end-of-device sanity check
  iscsi-target: Avoid IN_LOGOUT failure case for iser-target
  target: Fix PR_APTPL_BUF_LEN buffer size limitation
  iscsi-target: Drop problematic active_ts_list usage
  iscsi/iser-target: Support multi-sequence sendtargets text response
  iser-target: Remove duplicate function names
  vhost/scsi: potential memory corruption
  vhost/scsi: Global tcm_vhost -> vhost_scsi rename
  vhost/scsi: Drop left-over scsi_tcq.h include
  vhost/scsi: Set VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 feature bits
  vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq
  vhost/scsi: Add ANY_LAYOUT iov -> sgl mapping prerequisites
  vhost/scsi: Change vhost_scsi_map_to_sgl to accept iov ptr + len
  ...
2015-02-21 13:21:19 -08:00
Ariel Nahum 9a3119e4b7 IB/iser: Release the iscsi endpoint if ep_disconnect wasn't called
In some cases, we might reach the iser connection termination without
ep_disconnect being invoked (for example if user-space daemon doesn't
exists. In this case, we need to free the iscsi endpoint when we
remove the iser connection.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17 12:33:14 -08:00
Sagi Grimberg 6606e6a2ff IB/iser: Fix memory regions possible leak
When teardown process starts during live IO, we need to keep the
memory regions pool (frmr/fmr) until all in-flight tasks are properly
released, since each task may return a memory region to the pool. In
order to do this, we pass a destroy flag to iser_free_ib_conn_res to
indicate we can destroy the device and the memory regions
pool. iser_conn_release will pass it as true and also DEVICE_REMOVAL
event (we need to let the device to properly remove).

Also, Since we conditionally call iser_free_rx_descriptors,
remove the extra check on iser_conn->rx_descs.

Fixes: 5426b1711f ("IB/iser: Collapse cleanup and disconnect handlers")
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17 12:32:46 -08:00
Roi Dayan c6c95ef4ce IB/iser: Use correct dma direction when unmapping SGs
We always unmap SGs with the same direction instead of unmapping
with the direction the mapping was done, fix that.

Fixes: 9a8b08fad2 ("IB/iser: Generalize iser_unmap_task_data and [...]")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-13 11:27:31 -08:00
Sagi Grimberg e4f4e8016e iscsi/iser-target: Support multi-sequence sendtargets text response
In case sendtargets response is larger than initiator MRDSL, we
send a partial sendtargets response (setting F=0, C=1, TTT!=0xffffffff),
accept a consecutive empty text message and send the rest of the payload.
In case we are done, we set F=1, C=0, TTT=0xffffffff.
We do that by storing the sendtargets response bytes done under
the session.

This patch also makes iscsit_find_cmd_from_itt public for isert.

(Re-add cmd->maxcmdsn_inc and clear in iscsit_build_text_rsp - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-12 11:24:29 -08:00
Rasmus Villemoes 11378cdbb6 iser-target: Remove duplicate function names
The macro isert_dbg already ensures that __func__ is part of the
output, so there's no reason to duplicate the function name in the
format string itself.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-05 22:45:41 -08:00
Sagi Grimberg b44a2b6790 iser-target: Fix wrong allocation in the case of an empty text message
if text message dlength is 0, don't allocate a buffer for it, pass
NULL to iscsit_process_text_cmd.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-04 10:55:34 -08:00
Sagi Grimberg 631af55062 iser-target: Use WQ_UNBOUND for completion workqueue
Bound workqueues might be too restrictive since they allow
only a single core per session for processing completions.
WQ_UNBOUND will allow bouncing to another CPU if the running
CPU is currently busy. Luckily, our workqueues are NUMA aware
and will first try to bounce within the same NUMA socket.
My measurements with NULL backend devices show that there is
no (noticeable) additional latency as a result of the change.
I'd expect even to gain performance when working with fast
devices that also allocate MSIX interrupt vectors.

While we're at it, make it WQ_HIGHPRI since processing
completions is really a high priority for performance.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Moussa Ba <moussaba@micron.com>
Signed-off-by: Moussa Ba <moussaba@micron.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-02-04 10:55:03 -08:00
Roland Dreier 4143a9515d Revert "IPoIB: Consolidate rtnl_lock tasks in workqueue"
This reverts commit afe1de664e.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:36 -08:00
Roland Dreier c6a7ec7a0f Revert "IPoIB: Make the carrier_on_task race aware"
This reverts commit 67d7209e1f.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:29 -08:00
Roland Dreier e7a623d2df Revert "IPoIB: fix MCAST_FLAG_BUSY usage"
This reverts commit 016d9fb25c.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:20 -08:00
Roland Dreier 962121b4fc Revert "IPoIB: fix mcast_dev_flush/mcast_restart_task race"
This reverts commit e5d1dcf1b0.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:11 -08:00
Roland Dreier bb75963414 Revert "IPoIB: change init sequence ordering"
This reverts commit 3bcce487fd.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:39:02 -08:00
Roland Dreier 0306eda226 Revert "IPoIB: Use dedicated workqueues per interface"
This reverts commit 5141861cd5.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:38:55 -08:00
Roland Dreier 4e0ab200fa Revert "IPoIB: Make ipoib_mcast_stop_thread flush the workqueue"
This reverts commit bb42a6dd02.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:38:46 -08:00
Roland Dreier a84544a4fe Revert "IPoIB: No longer use flush as a parameter"
This reverts commit ce347ab90e.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-01-30 15:38:35 -08:00
Sagi Grimberg f64d2792dd iser-target: Fix typo in isert_put_text_rsp
We are sending text response and not reject.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-01-30 13:06:22 -08:00
Sagi Grimberg 45678b6b78 iser-target: Fix sparse warning
isert_debug_level should be static, hence no need
to initialize it.

Reported-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-01-30 13:06:22 -08:00
Nicholas Mc Guire ecc3f3edbf ib_srpt: wait_for_completion_timeout does not return negative status
This patch changes srpt_close_session() to properly use an unsigned long
for wait_for_completion_timeout()'s return value, and to update WARN_ON()
to only trigger for a zero return.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-01-20 15:48:17 -08:00
Linus Torvalds cdce6ac277 SCSI for-linus on 20141220
This is a much shorter set of patches that were on the go but didn't make it
 in to the early pull request for the merge window.  It's really a set of bug
 fixes plus some final cleanup work on the new tag queue API.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJUlaYEAAoJEDeqqVYsXL0MmXAH/2UUcE11p0KBHMR4cAn76xrG
 9093ZT9VZ4LH/Z7PbgwIWC4YHDqVpwA1+Trj1mh8PxiZz2SopWe27O2lQMRS5VUc
 MN28kbmK3L0jQj+OUez10Da6k0hU/KL8TlWT765MxFDKCaAuPZ4u541tyZEIGmLL
 olOQrn/fSlu+18QqqZ+D2rMaK7kGH6ZgbOadnRfYGkLjU4YeAMEC9L7UgnDxHiaN
 gZozoARkGeAnDJERVETRTtAiOXGRH8sGCpue0yYlhZXpAQ9cFUkS/hMqDWnaVC2S
 0x0w34RvbxSqO0gPT0K5XLoMiFyg04vnZ2xBVFBsANQTSEjQJO8USNAa4r74hf8=
 =D3eN
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI update from James Bottomley:
 "This is a much shorter set of patches that were on the go but didn't
  make it in to the early pull request for the merge window.  It's
  really a set of bug fixes plus some final cleanup work on the new tag
  queue API"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  storvsc: ring buffer failures may result in I/O freeze
  ipr: set scsi_level correctly for disk arrays
  ipr: add support for async scanning to speed up boot
  scsi_debug: fix missing "break;" in SDEBUG_UA_CAPACITY_CHANGED case
  scsi_debug: take sdebug_host_list_lock when changing capacity
  scsi_debug: improve driver description in Kconfig
  scsi_debug: fix compare and write errors
  qla2xxx: fix race in handling rport deletion during recovery causes panic
  scsi: blacklist RSOC for Microsoft iSCSI target devices
  scsi: fix random memory corruption with scsi-mq + T10 PI
  Revert "[SCSI] mpt3sas: Remove phys on topology change"
  Revert "[SCSI] mpt2sas: Remove phys on topology change."
  esas2r: Correct typos of "validate" in a comment
  fc: FCP_PTA_SIMPLE is 0
  ibmvfc: remove unused tag variable
  scsi: remove MSG_*_TAG defines
  scsi: remove scsi_set_tag_type
  scsi: remove scsi_get_tag_type
  scsi: never drop to untagged mode during queue ramp down
  scsi: remove ->change_queue_type method
2014-12-20 13:42:57 -08:00
Linus Torvalds ed55635e2e Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "The highlights this merge window include:

   - Allow target fabric drivers to function as built-in.  (Roland)
   - Fix tcm_loop multi-TPG endpoint nexus bug.  (Hannes)
   - Move per device config_item_type into se_subsystem_api, allowing
     configfs attributes to be defined at module_init time.  (Jerome +
     nab)
   - Convert existing IBLOCK/FILEIO/RAMDISK/PSCSI/TCMU drivers to use
     external configfs attributes.  (nab)
   - A number of iser-target fixes related to active session + network
     portal shutdown stability during extended stress testing.  (Sagi +
     Slava)
   - Dynamic allocation of T10-PI contexts for iser-target, fixing a
     potentially bogus iscsi_np->tpg_np pointer reference in >= v3.14
     code.  (Sagi)
   - iser-target performance + scalability improvements.  (Sagi)
   - Fixes for SPC-4 Persistent Reservation AllRegistrants spec
     compliance.  (Ilias + James + nab)
   - Avoid potential short kern_sendmsg() in iscsi-target for now until
     Al's conversion to use msghdr iteration is merged post -rc1.
     (Viro)

  Also, Sagi has requested a number of iser-target patches (9) that
  address stability issues he's encountered during extended stress
  testing be considered for v3.10.y + v3.14.y code.  Given the amount of
  LOC involved, it will certainly require extra backporting effort.

  Apologies in advance to Greg-KH & Co on this.  Sagi and I will be
  working post-merge to ensure they each get applied correctly"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (53 commits)
  target: Allow AllRegistrants to re-RESERVE existing reservation
  uapi/linux/target_core_user.h: fix headers_install.sh badness
  iscsi-target: Fail connection on short sendmsg writes
  iscsi-target: nullify session in failed login sequence
  target: Avoid dropping AllRegistrants reservation during unregister
  target: Fix R_HOLDER bit usage for AllRegistrants
  iscsi-target: Drop left-over bogus iscsi_np->tpg_np
  iser-target: Fix wc->wr_id cast warning
  iser-target: Remove code duplication
  iser-target: Adjust log levels and prettify some prints
  iser-target: Use debug_level parameter to control logging level
  iser-target: Fix logout sequence
  iser-target: Don't wait for session commands from completion context
  iser-target: Reduce CQ lock contention by batch polling
  iser-target: Introduce isert_poll_budget
  iser-target: Remove an atomic operation from the IO path
  iser-target: Remove redundant call to isert_conn_terminate
  iser-target: Use single CQ for TX and RX
  iser-target: Centralize completion elements to a context
  iser-target: Cast wr_id with uintptr_t instead of unsinged long
  ...
2014-12-19 18:02:22 -08:00
James Bottomley e617457691 Merge remote-tracking branch 'scsi-queue/drivers-for-3.19' into for-linus 2014-12-18 05:56:29 -08:00
Roland Dreier a7cfef21e3 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next 2014-12-15 18:19:20 -08:00
Sagi Grimberg 7dcf9c193b IB/srp: Allow newline separator for connection string
In case the last argument of the connection string is processed as a
string (destination GID for example).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:15:23 -08:00
Or Gerlitz 056da88f2e IB/iser: Bump version to 1.5
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:47 -08:00
Sagi Grimberg 5bb6e543d2 IB/iser: DIX update
Following few recent Block integrity updates, we align the iSER data
integrity offload settings with:

- Deprecate pi_guard module param
- Expose support for DIX type 0.
- Use scsi_transfer_length for the transfer length
- Get pi_interval, ref_tag, ref_remap, bg_type and
  check_mask setting from scsi_cmnd

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:46 -08:00
Sagi Grimberg 06c7fb6776 IB/iser: Micro-optimize iser_handle_wc
Use likely() for wc.status == IB_WC_SUCCESS

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:46 -08:00
Sagi Grimberg 60e20908c5 IB/iser: Micro-optimize iser logging
And fix a checkpatch warning.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:46 -08:00
Sagi Grimberg da64bdb25b IB/iser: Use more completion queues
No reason to settle with four, can use the min between device max comp
vectors and number of cores.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:45 -08:00
Sagi Grimberg 7e1fd4d1e3 IB/iser: Remove redundant is_mr indicator
It is enough to check mem_h pointer assignment, mem_h == NULL will
indicate that buffer is not registered using mr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:45 -08:00
Sagi Grimberg a11b3e6935 IB/iser: Centralize memory region invalidation to a function
Eliminates code duplication.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:45 -08:00
Sagi Grimberg f0caef6d40 IB/iser: Terminate connection before cleaning inflight tasks
When closing the connection, we should first terminate the connection
(in case it was not previously terminated) to guarantee the QP is in
error state and we are done with servicing IO. Only then go ahead with
tasks cleanup via iscsi_conn_stop.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 7414dde0a6 IB/iser: Fix race between iser connection teardown and scsi TMFs
In certain scenarios (target kill with live IO) scsi TMFs may race
with iser RDMA teardown, which might cause NULL dereference on iser IB
device handle (which might have been freed). In this case we take a
conditional lock for TMFs and check the connection state (avoid
introducing lock contention in the IO path). This is indeed best
effort approach, but sufficient to survive multi targets sudden death
while heavy IO is inflight.

While we are on it, add a nice kernel-doc style documentation.

Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Ariel Nahum 3f562a0b8f IB/iser: Fix possible NULL derefernce ib_conn->device in session_create
If rdma_cm error event comes after ep_poll but before conn_bind, we
should protect against dereferncing the device (which may have been
terminated) in session_create and conn_create (already protected)
callbacks.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 49df2781b1 IB/iser: Fix sparse warnings
Use uintptr_t to handle wr_id casting, which was found by Kbuild test
robot and smatch.  Also remove an internal definition of variable which
potentially shadows an external one (and make sparse happy).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Max Gurtovoy 6ec9d4d231 IB/iser: Fix possible SQ overflow
Fix a regression was introduced in commit 6df5a128f0 ("IB/iser:
Suppress scsi command send completions").

The sig_count was wrongly set to be static variable, thus it is
possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0
condition (due to races) and the send queue will be overflowed.

Instead keep sig_count per connection. We don't need it to be atomic
as we are safe under the iscsi session frwd_lock taken by libiscsi on
the queuecommand path.

Fixes: 6df5a128f0 ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 93acb7bbc7 IB/iser: Decrement CQ's active QPs accounting when QP creation fails
When creating a connection QP we choose the least used CQ and inc the
number of active QPs on that. If we fail to create the QP, we need to
decrement the active QPs counter.

Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Ariel Nahum 5426b1711f IB/iser: Collapse cleanup and disconnect handlers
No real need to wait for TIMEWAIT_EXIT before we destroy the RDMA
resources (also TIMEAWAIT_EXIT is not guarenteed to always arrive).  As
for the cma_id, only destroy it if the state is not DOWN where in this
case, conn_release is already running and we don't want to compete.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Sagi Grimberg 16df2a26fb IB/iser: Fix catastrophic error flow hang
In case of the HCA going into catasrophic error flow, the
beacon post_send is likely to fail, so surely there will
be no completion for it.

In this case, use a best effort approach and don't wait for beacon
completion if we failed to post the send.

Reported-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Minh Tran f4641ef701 IB/iser: Re-adjust CQ and QP send ring sizes to HW limits
Re-adjust max CQEs per CQ and max send_wr per QP according
to the resource limits supported by underlying hardware.

Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:44 -08:00
Doug Ledford ce347ab90e IPoIB: No longer use flush as a parameter
Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue.  Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford bb42a6dd02 IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
We used to pass a flush variable to mcast_stop_thread to indicate if
we should flush the workqueue or not.  This was due to some code
trying to flush a workqueue that it was currently running on which is
a no-no.  Now that we have per-device work queues, and now that
ipoib_mcast_restart_task has taken the fact that it is queued on a
single thread workqueue with all of the ipoib_mcast_join_task's and
therefore has no need to stop the join task while it runs, we can do
away with the flush parameter and unilaterally flush always.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford 5141861cd5 IPoIB: Use dedicated workqueues per interface
During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue, we can
have one device in the middle of a down and holding the rtnl lock and
another totally unrelated device needing to run mcast_restart_task,
which wants the rtnl lock and will loop trying to take it unless is
sees its own FLAG_ADMIN_UP flag go away.  Because the unrelated
interface will never see its own ADMIN_UP flag drop, the interface
going down will deadlock trying to flush the queue.  There are several
possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our
own task back up as a delayed work and return and avoid dropping work
on the floor that way.  But I'm not 100% convinced that we won't cause
other problems.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford 3bcce487fd IPoIB: change init sequence ordering
In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:15 -08:00
Doug Ledford e5d1dcf1b0 IPoIB: fix mcast_dev_flush/mcast_restart_task race
Our mcast_dev_flush routine and our mcast_restart_task can race
against each other.  In particular, they both hold the priv->lock
while manipulating the rbtree and while removing mcast entries from
the multicast_list and while adding entries to the remove_list, but
they also both drop their locks prior to doing the actual removes.
The mcast_dev_flush routine is run entirely under the rtnl lock and so
has at least some locking.  The actual race condition is like this:

Thread 1                                Thread 2
ifconfig ib0 up
  start multicast join for broadcast
  multicast join completes for broadcast
  start to add more multicast joins
    call mcast_restart_task to add new entries
                                        ifconfig ib0 down
					  mcast_dev_flush
					    mcast_leave(mcast A)
    mcast_leave(mcast A)

As mcast_leave calls ib_sa_multicast_leave, and as member in
core/multicast.c is ref counted, we run into an unbalanced refcount
issue.  To avoid stomping on each others removes, take the rtnl lock
specifically when we are deleting the entries from the remove list.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Doug Ledford 016d9fb25c IPoIB: fix MCAST_FLAG_BUSY usage
Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
in how it was used.  We didn't always initialize the completion struct
before we set the flag, and we didn't always call complete on the
completion struct from all paths that complete it.  This made it less
than totally effective, and certainly made its use confusing.  And in
the flush function we would use the presence of this flag to signal
that we should wait on the completion struct, but we never cleared
this flag, ever.  This is further muddied by the fact that we overload
the MCAST_FLAG_BUSY flag to mean two different things: we have a join
in flight, and we have succeeded in getting an ib_sa_join_multicast.

In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.

 1) Remove the MCAST_JOIN_STARTED flag entirely
 2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight
 3) Test on mcast->mc directly to see if we have completed
    ib_sa_join_multicast (using IS_ERR_OR_NULL)
 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
    the mcast->done completion struct
 5) Make sure that before calling complete(&mcast->done), we always clear
    the MCAST_FLAG_BUSY bit
 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
    take the mutex in our join callback.  This forces
    ib_sa_multicast_join to return and set mcast->mc before we process
    the callback.  This way, our callback can safely clear mcast->mc
    if there is an error on the join and we will do the right thing as
    a result in mcast_dev_flush.
 7) Because we need the mutex to synchronize mcast->mc, we can no
    longer call mcast_sendonly_join directly from mcast_send and
    instead must add sendonly join processing to the mcast_join_task

A number of different races are resolved with these changes.  These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.

One race looks something like this:

Thread 1                             Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
  alloc member
  call callback
                                     ifconfig ib0 down
				     wait_for_completion
    callback call completes
                                     wait_for_completion in
				     mcast_dev_flush completes
				       mcast->mc is PTR_ERR_OR_NULL
				       so we skip ib_sa_leave_multicast
    return from callback
  return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast

We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c

Another like this:

Thread 1                   Thread 2         Thread 3
ib_sa_multicast_join
                                            ifconfig ib0 down
					    priv->broadcast = NULL
                           join_complete
			                    wait_for_completion
			   mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
			   complete
			   return -EAGAIN (making mcast->mc invalid)
			   		    call ib_sa_multicast_leave
					    on invalid mcast->mc, hang
					    forever

By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we
run the callback.  This allows us to clear mcast->mc if there is an
error and the join is going to fail.  We do this before we complete
the mcast.  In this way, mcast_dev_flush always sees consistent state
in regards to mcast->mc membership at the time that the
wait_for_completion() returns.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Doug Ledford 67d7209e1f IPoIB: Make the carrier_on_task race aware
We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface.  Unfortunately, that's not
the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task,
the flush_workqueue waits on this thread too.  But this thread is
deadlocked on the rtnl lock.  The better thing here is to use trylock
and loop on that until we either get the lock or we see that
FLAG_ADMIN_UP has been cleared, in which case we don't need to do
anything anyway and we just return.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Doug Ledford afe1de664e IPoIB: Consolidate rtnl_lock tasks in workqueue
Setting the MTU can safely be moved to the carrier_on_task, which keeps
us from needing to take the rtnl lock in the join_finish section.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:11:14 -08:00
Nicholas Bellinger ed4520ae9b iser-target: Fix wc->wr_id cast warning
CC [M]  drivers/infiniband/ulp/isert/ib_isert.o
drivers/infiniband/ulp/isert/ib_isert.c: In function ‘isert_cq_comp_err’:
drivers/infiniband/ulp/isert/ib_isert.c:1979:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:35 -08:00
Sagi Grimberg 10633c37bf iser-target: Remove code duplication
- Fall-through in switch case instead in do_control_comp.
- Move rkey invalidation to a function.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:35 -08:00
Sagi Grimberg 4c22e07ffd iser-target: Adjust log levels and prettify some prints
debug_level 1 (warn): Include warning messages.
debug_level 2 (info): Include relevant info for control plane.
debug_level 3 (debug): Include relevant info in the IO path.

Also, added/removed some logging messages.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:34 -08:00
Sagi Grimberg 24f412dd38 iser-target: Use debug_level parameter to control logging level
Personal preference, easier control of the log level with
a single modparam which can be changed dynamically. Allows
better saparation of control and IO plains.

Replaced throughout ib_isert.c:
s/pr_debug/isert_dbg/g
s/pr_info/isert_info/g
s/pr_warn/isert_warn/g
s/pr_err/isert_err/g

Plus nit checkpatch warning change.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:34 -08:00
Sagi Grimberg 991bb7640d iser-target: Fix logout sequence
We don't want to wait for conn_logout_comp from isert_comp_wq
context as this blocks further completions from being processed.
Instead we wait for it conditionally (if logout response was
actually posted) in wait_conn. This wait should normally happen
immediately as it occurs after we consumed all the completions
(including flush errors) and conn_logout_comp should have been
completed.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:33 -08:00
Sagi Grimberg c7e160ee09 iser-target: Don't wait for session commands from completion context
Might result in a deadlock where completion context waits for
session commands release where the later might need a final
completion for it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:32 -08:00
Sagi Grimberg 36ea63b523 iser-target: Reduce CQ lock contention by batch polling
In order to reduce the contention on CQ locking (present
in some LLDDs) we poll in batches of 16 work completion items.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:32 -08:00
Sagi Grimberg 37d9fe80a3 iser-target: Introduce isert_poll_budget
In case the CQ is packed with completions, we can't just
hog the CPU forever. Poll until a sufficient budget (currently
hard-coded to 64k completions) and if budget is exhausted, bailout
and give a chance to other threads.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:31 -08:00
Sagi Grimberg bdf20e7254 iser-target: Remove an atomic operation from the IO path
In order to know that we consumed all the connection completions
we maintain atomic post_send_buf_count for each IO post send. But
we can know that if we post a "beacon" (zero length RECV work request)
after we move the QP into error state and the target does not serve
any new IO. When we consume it, we know we finished all the connection
completion and we can go ahead and destroy stuff.

In error completion handler we now just need to check for ISERT_BEACON_WRID
to arrive and then wait for session commands to cleanup and complete
conn_wait_comp_err.

We reserve another CQ and QP entries to fit the zero length post recv.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:31 -08:00
Sagi Grimberg 377d8f5c3e iser-target: Remove redundant call to isert_conn_terminate
We are calling session reinstatement, wait_conn will start
connection termination.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:30 -08:00
Sagi Grimberg 6f0fae3d77 iser-target: Use single CQ for TX and RX
Using TX and RX CQs attached to the same vector might
create a throttling effect coming from the serial processing
of a work-queue. Use one CQ instead, it will do better in interrupt
processing and it provides a simpler code. Also, We get rid of
redundant isert_rx_wq.

Next we can remove the atomic post_send_buf_count from the IO path.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:30 -08:00
Sagi Grimberg 4a295bae7e iser-target: Centralize completion elements to a context
A pre-step before going to a single CQ.
Also this makes the code a little more simple to
read.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:29 -08:00
Sagi Grimberg b0a191e703 iser-target: Cast wr_id with uintptr_t instead of unsinged long
Nit, uintptr_t is designed for pointer casting, use it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:28 -08:00
Sagi Grimberg df43debdfd iser-target: Unite error completion handler for RX and TX
As a pre-step to a single CQ, we unite the error completion
handlers to a single handler.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:28 -08:00
Sagi Grimberg 68a86dee8a iser-target: Remove interrupt coalescing
It is disabled at the moment, we will get that back
in once the target is more stable.

This reverts commit 95b60f0
"Add support for completion interrupt coalescing"

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:27 -08:00
Sagi Grimberg 268e6811f5 iser-target: Work-around live target stack shutdown resource cleanup
Currently we have no way to tell that the target stack is in shutdown
sequence. In case we have open connections, the initiator immediately
attempts to reconnect in a DDOS attack style, so we may end up
terminating the iser enabled network portal while it's np_accept_list
still have pending connections.

The workaround is simply release all the connections in the list.
A proper fix will be to start shutdown sequence by shutting the
network portal to avoid initiator immediate reconnect attempts.

But the temporary work around seems to work at this point, so I think
we can do this for now...

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:27 -08:00
Sagi Grimberg 23a548ee65 iscsi,iser-target: Expose supported protection ops according to t10_pi
iSER will report supported protection operations based on
the tpg attribute t10_pi settings and HCA PI offload capabilities.
If the HCA does not support PI offload or tpg attribute t10_pi is
not set, we fall to SW PI mode.

In order to do that, we move iscsit_get_sup_prot_ops after connection
tpg assignment.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:26 -08:00
Sagi Grimberg 302cc7c3ca iser-target: Fix NULL dereference in SW mode DIF
Fallback to software mode DIF if HCA does not support
PI (without crashing obviously). It is still possible to
run with backend protection and an unprotected frontend,
so looking at the command prot_op is not enough. Check
device PI capability on a per-IO basis (isert_prot_cmd
inline static) to determine if we need to handle protection
information.

Trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert]
Call Trace:
 [<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130
 [<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert]
 [<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0
 [<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert]
 [<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod]
 [<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod]
 [<ffffffff8106ece2>] process_one_work+0x182/0x3b0
 [<ffffffff8106fda0>] worker_thread+0x120/0x3c0
 [<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190
 [<ffffffff8107594e>] kthread+0xce/0xf0
 [<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:32:26 -08:00
Sagi Grimberg 570db170f3 iser-target: Allocate PI contexts dynamically
This patch converts to allocate PI contexts dynamically in order
avoid a potentially bogus np->tpg_np and associated NULL pointer
dereference in isert_connect_request() during iser-target endpoint
shutdown with multiple network portals.

Also, there is really no need to allocate these at connection
establishment since it is not guaranteed that all the IOs on
that connection will be to a PI formatted device.

We can do it in a lazy fashion so the initial burst will have a
transient slow down, but very fast all IOs will allocate a PI
context.

Squashed:

iser-target: Centralize PI context handling code

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:31:40 -08:00
Sagi Grimberg b02efbfc9a iser-target: Fix implicit termination of connections
In situations such as bond failover, The new session establishment
implicitly invokes the termination of the old connection.

So, we don't want to wait for the old connection wait_conn to completely
terminate before we accept the new connection and post a login response.

The solution is to deffer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).

We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:18:09 -08:00
Sagi Grimberg ca6c1d82d1 iser-target: Handle ADDR_CHANGE event for listener cm_id
The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.

Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
routine.

Squashed:

iser-target: Move cma_id setup to a function

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:18:09 -08:00
Sagi Grimberg 19e2090fb2 iser-target: Fix connected_handler + teardown flow race
Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.

In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (and make the
dereference safe). session stablishment/teardown sequences
can happen in parallel, we should take into account that
connected_handler might race with connection teardown flow.

Also, protect isert_conn->conn_device->active_qps decrement
within the error patch during QP creation failure and the
normal teardown path in isert_connect_release().

Squashed:

iser-target: Decrement completion context active_qps in error flow

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 23:17:28 -08:00
Sagi Grimberg 2371e5da8c iser-target: Parallelize CM connection establishment
There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately, this will also cause the
CM connection to reach connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let iscsi layer pick it up when it
gets there.

This reduces the parallel login sequence by a factor of
more then 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.

In order to support multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion insead
of letting isert_get_login_rx call it.

Squashed:

iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 22:58:52 -08:00
Sagi Grimberg 128e9cc845 iser-target: Fix flush + disconnect completion handling
ISER_CONN_UP state is not sufficient to know if
we should wait for completion of flush errors and
disconnected_handler event.

Instead, split it to 2 states:
- ISER_CONN_UP: Got to CM connected phase, This state
indicates that we need to wait for a CM disconnect
event before going to teardown.

- ISER_CONN_FULL_FEATURE: Got to full feature phase
after we posted login response, This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.

Also avoid deffering disconnected handler to a work,
and handle it within disconnected handler.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).

Squashed:

iser-target: Don't deffer disconnected handler to a work

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 22:57:28 -08:00
Sagi Grimberg 954f23722b iscsi,iser-target: Initiate termination only once
Since commit 0fc4ea701f ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.

Introduce call to isert_conn_terminate (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect. If the state is already teminating, just bail
out back (temination started).

Also, make sure to destroy the connection when getting a connect
error event if didn't get to connected (state UP). Same for the
handling of REJECTED and UNREACHABLE cma events.

Squashed:

iscsi-target: Add call to wait_conn in establishment error flow

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12 22:55:22 -08:00
James Bottomley 096cbc35ea Merge remote-tracking branch 'scsi-queue/drivers-for-3.19' into for-linus
Conflicts:
	drivers/scsi/scsi_debug.c

Agreed and tested resolution to a merge problem between a fix in scsi_debug
and a driver update

Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-12-08 07:42:25 -08:00
James Bottomley dc843ef00e Merge remote-tracking branch 'scsi-queue/core-for-3.19' into for-linus 2014-12-08 07:40:20 -08:00
Christoph Hellwig 68d81f4004 scsi: remove MSG_*_TAG defines
For SPI drivers use the message definitions from scsi.h, and for target
drivers introduce a new TCM_*_TAG namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com
2014-12-04 09:58:33 +01:00
Christoph Hellwig efc3c1df5f scsi: remove ->change_queue_type method
Since we got rid of ordered tag support in 2010 the prime use case of
switching on and off ordered tags has been obsolete.  The other function
of enabling/disabling tagging entirely has only been correctly implemented
by the 53c700 driver and isn't generally useful.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-12-04 09:55:45 +01:00
Christoph Hellwig db5ed4dfd5 scsi: drop reason argument from ->change_queue_depth
Drop the now unused reason argument from the ->change_queue_depth method.
Also add a return value to scsi_adjust_queue_depth, and rename it to
scsi_change_queue_depth now that it can be used as the default
->change_queue_depth implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 14:45:27 +01:00
Christoph Hellwig 1e6f241604 scsi: don't allow setting of queue_depth bigger than can_queue
We won't ever queue more commands than the host allows.  Instead of
letting drivers either reject or ignore this case handle it in
common code.  Note that various driver use internal constant or
variables that are assigned to both shost->can_queue and checked
in ->change_queue_depth - I did remove those checks as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 14:45:26 +01:00
Christoph Hellwig c40ecc12cf scsi: avoid ->change_queue_depth indirection for queue full tracking
All drivers use the implementation for ramping the queue up and down, so
instead of overloading the change_queue_depth method call the
implementation diretly if the driver opts into it by setting the
track_queue_depth flag in the host template.

Note that a few drivers validated the new queue depth in their
change_queue_depth method, but as we never go over the queue depth
set during slave_configure or the sysfs file this isn't nessecary
and can safely be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
2014-11-24 14:45:12 +01:00
Chris Moore b1a5ad006b IB/isert: Adjust CQ size to HW limits
isert has an issue of trying to create a CQ with more CQEs than are
supported by the hardware, that currently results in failures during
isert_device creation during first session login.

This is the isert version of the patch that Minh Tran submitted for
iser, and is simple a workaround required to function with existing
ocrdma hardware.

Signed-off-by: Chris Moore <chris.moore@emulex.com>
Reviewied-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-11-19 22:28:42 -08:00
Bart Van Assche 7dad6b2e44 IB/srp: Fix a race condition triggered by destroying a queue pair
At least LID reassignment can trigger a race condition in the SRP
initiator driver, namely the receive completion handler trying to
post a request on a QP during or after QP destruction and before
the CQ's have been destroyed. Avoid this race by modifying a QP
into the error state and by waiting until all receive completions
have been processed before destroying a QP.

Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 12:05:24 +01:00
Bart Van Assche d92c0da71a IB/srp: Add multichannel support
Improve performance by using multiple RDMA/RC channels per SCSI
host for communication with an SRP target. About the
implementation:
- Introduce a loop over all channels in the code that uses
  target->ch.
- Set the SRP_MULTICHAN_MULTI flag during login for the creation
  of the second and subsequent channels.
- RDMA completion vectors are chosen such that RDMA completion
  interrupts are handled by the CPU socket that submitted the I/O
  request. As one can see in this patch it has been assumed if a
  system contains n CPU sockets and m RDMA completion vectors
  have been assigned to an RDMA HCA that IRQ affinity has been
  configured such that completion vectors [i*m/n..(i+1)*m/n) are
  bound to CPU socket i with 0 <= i < n.
- Modify srp_free_ch_ib() and srp_free_req_data() such that it
  becomes safe to invoke these functions after the corresponding
  allocation function failed.
- Add a ch_count sysfs attribute per target port.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 12:05:24 +01:00
Bart Van Assche 77f2c1a40e IB/srp: Use block layer tags
Since the block layer already contains functionality to assign
a tag to each request, use that functionality instead of
reimplementing that functionality in the SRP initiator driver.
This change makes the free_reqs list superfluous. Hence remove
that list.

[hch: updated to use .use_blk_tags instead scsi_activate_tcq]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 12:05:23 +01:00
Bart Van Assche 509c07bc18 IB/srp: Separate target and channel variables
Changes in this patch:
- Move channel variables into a new structure (struct srp_rdma_ch).
- Add an srp_target_port pointer, 'lock' and 'comp_vector' members
  in struct srp_rdma_ch.
- Add code to initialize these three new member variables.
- Many boring "target->" into "ch->" changes.
- The cm_id and completion handler context pointers are now of type
  srp_rdma_ch * instead of srp_target_port *.
- Three kzalloc(a * b, f) calls have been changed into kcalloc(a, b, f)
  to avoid that this patch would trigger a checkpatch warning.
- Two casts from u64 into unsigned long long have been left out
  because these are superfluous. Since considerable time u64 is
  defined as unsigned long long for all architectures supported by
  the Linux kernel.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:32:03 +01:00
Bart Van Assche 747fe000ef IB/srp: Introduce two new srp_target_port member variables
Introduce the srp_target_port member variables 'sgid' and 'pkey'.
Change the type of 'orig_dgid' from __be16[8] into union ib_gid.
This patch does not change any functionality but makes the
"Separate target and channel variables" patch easier to verify.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:32:03 +01:00
Bart Van Assche 34aa654ecb IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning
If a cable is pulled during LUN scanning it can happen that the
SRP rport and the SCSI host have been created but no LUNs have been
added to the SCSI host. Since multipathd only sends SCSI commands
to a SCSI target if one or more SCSI devices are present and since
there is no keepalive mechanism for IB queue pairs this means that
after a LUN scan failed and after a reconnect has succeeded no
data will be sent over the QP and hence that a subsequent cable
pull will not be detected. Avoid this by not creating an rport or
SCSI host if a cable is pulled during a SCSI LUN scan.

Note: so far the above behavior has only been observed with the
kernel module parameter ch_count set to a value >= 2.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:32:02 +01:00
Bart Van Assche 205619f2f8 IB/srp: Remove stale connection retry mechanism
Attempting to connect three times may be insufficient after an
initiator system tries to relogin, especially if the relogin
attempt occurs before the SRP target service ID has been
registered. Since the srp_daemon retries a failed login attempt
anyway, remove the stale connection retry mechanism.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:32:02 +01:00