This will solve a possible condition where we might miss TX completion
(flush error) during session teardown. Since we are using a single
CQ, we don't need to actively drain the TX CQ, instead just wait for
flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors().
This patch might introduce a minor performance regression on its own,
but the next patches will enhance performance using a single CQ for RX
and TX.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need a way to guarentee that we don't stay in soft-IRQ context for
too long. We might starve other pending CQ tasklets or worse lock
against application trying to issue IO on the running CPU.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce iser_comp which centralizes all iser completion related
items and is referenced by iser_device and each ib_conn.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case iscsid was violently killed (SIGKILL) during its error
recovery stage, we may never get a connection teardown sequence for
some of the old connections. No harm done, but when we try to unload
the module we will need to cleanup all these connections. So we
actually may end-up here - so it's not a BUG_ON(), just give a relaxed
warning that this happened and continue with normal unload. BUG_ON()
will cause segfault on module_exit and we don't want that.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we notified iscsi layer about the connection layer when
we consumed all of our flush errors. This was racy as there
was no guarentee that iscsi_conn wasn't terminated by then (which ends
up in an invalid memory access). In case we got a non FLUSH error
completion, we are guarenteed that iscsi_conn is still alive. We should
notify iSCSI layer with iscsi_conn_failure to initiate error handling.
While we are at it, add a nice kernel-doc style documentation.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We no longer rely on iscsi connection teardown sequence, so no need to
give a grace period and continue cleanup if it expired. Have
iser_conn_release wait for full completion before freeing iser_conn.
ib_completion:
Guaranteed to come when:
- Got DISCONNECTED/ADDR_CHANGE event or
- iSCSI called ep_disconnect/conn_stop
Guaranteed to finish when:
- Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
- All Flush errors are consumed
- IB related resources are destroyed
stop_completion:
Guaranteed to come when:
- iSCSI calls conn_stop
Guaranteed to finish when:
- All inflight tasks were cleaned up
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iscsi daemon is in user-space, thus we can't rely on it to be invoked
at connection teardown (if not running or does not receive CPU time).
This patch addresses the issue by re-structuring iSER connection
teardown logic and CM events handling.
The CM events will dictate the RDMA resources destruction (ib_conn)
and iser_conn is kept around as long as iscsi_conn is left around
allowing iscsi/iser callbacks to continue after RDMA transport was
destroyed.
This patch introduces a separation in logic when handling CM events:
- DISCONNECTED_HANDLER, ADDR_CHANGED
This events indicate the start of teardown process.
Actions:
1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
2. Notify iSCSI of connection failure
3. Change state to TERMINATING
4. Poll for all flush errors to be consumed
- TIMEWAIT_EXIT, DEVICE_REMOVAL
These events indicate the final stage of termination process and
we can free RDMA related resources.
Actions:
1. Call disconnected handler (we are not guaranteed that DISCONNECTED
event was invoked in the past)
2. Cleanup RDMA related resources
3. For DEVICE_REMOVAL return non-zero rc from cma_handler to
implicitly destroy the cm_id (Can't rely on user-space, make sure
we have forward progress)
We replace flush_completion (indicate all flushes were consumed) with
ib_completion (rdma resources were cleaned up).
The iser_conn_release_work will wait for teardown completions:
- conn_stop was completed (tasks were cleaned-up) - stop_completion
- RDMA resources were destroyed - ib_completion
And then will continue to free iser connection representation (iser_conn).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Put all connection IB related resources release in this routine. One
exception is the cm_id which cannot be destroyed as the routine is
protected by the state mutex. Also move its position to avoid forward
declaration. While at it fix qp NULL assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Structure that describes the RDMA relates connection objects. Static
member of iser_conn.
This patch does not change any functionality
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Two reasons why we choose to do this:
1. No point today calling struct iser_conn by another name ib_conn
2. In the next patches we will restructure iser control plane representation
- struct iser_conn: connection logical representation
- struct ib_conn: connection RDMA layout representation
This patch does not change any functionality.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When failing to allocate TX CQ we already allocated RX CQ, so we need to make
sure we release it. Also, when failing to register notification to the RX CQ
we currently leak both RX and TX CQs of the current index, fix that too.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is to prevent someone from thinking that this code section is
redundant.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Instead of waiting for events and condition changes of the iser
connection state, we wait for explicit completion of connection
establishment and teardown.
Separate connection establishment wait object from the teardown object
to avoid a situation where racing connection establishment and
teardown may concurrently wakeup each other.
ep_poll will wait for up_completion invoked by
iser_connected_handler() and iser release worker will wait for
flush_completion before releasing the connection.
Bound the completion wait with a 30 seconds timeout for cases where
iscsid (the user space iscsi daemon) is too slow or gone.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iser connection state lookups and transitions are not fully protected.
Some transitions are protected with a spinlock, and in some cases the
state is accessed unprotected due to specific assumptions of the flow.
Introduce a new mutex to protect the connection state access. We use a
mutex since we need to also include a scheduling operations executed
under the state lock.
Each state transition/condition and its corresponding action will be
protected with the state mutex.
The rdma_cm events handler acquires the mutex when handling connection
events. Since iser connection state can transition to DOWN
concurrently during connection establishment, we bailout from
addr/route resolution events when the state is not PENDING.
This addresses a scenario where ep_poll retries expire during CMA
connection establishment. In this case ep_disconnect is invoked while
CMA events keep coming (address/route resolution, connected, etc...).
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Make it void.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iser connection needs asynchronous cleanup completions which are
triggered in ep_disconnect. As a result we are keeping the
corresponding iscsi_endpoint structure hanging for no good reason. In
order to avoid that, we seperate iser_conn from iscsi_endpoint storage
space to have their destruction being independent.
iscsi_endpoint will be destroyed at ep_disconnect stage, while the
iser connection will wait for asynchronous completions to be released
in an orderly fashion.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iser initiator is the RDMA responder so it should publish to the
target the max inflight rdma read requests its local HCA can handle in
responder_resources (max_qp_rd_atom).
The iser target should take the min of that and its local HCA max
inflight oustanding rdma read requests (max_qp_init_rd_atom).
We keep initiator_depth set to 1 in order to compat with old targets.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case the DISCONNECTED event is not delivered after rdma_disconnect
is called, the CM waits TIMEWAIT seconds and delivers the
TIMEWAIT_EXIT local event. We use this as the notification needed to
continue in the teardown and release sequence.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Replace struct sockaddr_in with struct sockaddr which supports both
IPv4 and IPv6, and print using the %pIS format directive.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In some circumstances (multiple targets), RDMA_CM ESTABLISHED event
and ep_disconnect may race. In this case, the iser connection state
may transition to UP (after ep_disconnect transitioned it to
TERMINATING), while the connection is being torn down.
Upon RDMA_CM event ESTABLISHED we allow iser connection state to
transition to UP only from PENDING. We also make sure to protect this
state change (done under the connection lock).
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iSER relies on refcounting to manage iser connections establishment
and teardown.
Following commit 39ff05dbbb ("IB/iser: Enhance disconnection logic
for multi-pathing"), iser connection maintain 3 references:
- iscsi_endpoint (at creation stage)
- cma_id (at connection request stage)
- iscsi_conn (at bind stage)
We can avoid taking explicit refcounts by correctly serializing iser
teardown flows (graceful and non-graceful).
Our approach is to trigger a scheduled work to handle ordered teardown
by gracefully waiting for 2 cleanup stages to complete:
1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion
2. Flush errors processing
Each completed stage will notify a waiting worker thread when it is
done to allow teardwon continuation.
Since iSCSI connection establishment may trigger endpoint disconnect
without a successful endpoint connect, we rely on the iscsi <-> iser
binding (.conn_bind) to learn about the teardown policy we should take
wrt cleanup stages.
Since all cleanup worker threads are scheduled (release_wq) in
.ep_disconnect it is safe to assume that when module_exit is called,
all cleanup workers are already scheduled. Thus proper module unload
shall flush all scheduled works before allowing safe exit, to
guarantee no resources got left behind.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Update Mellanox copyrights for 2014 on the iser initiator driver.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add an iser info print with the local/remote QP information carried
out when the connection is established. While here, fix a little
leftover from the T10 work and set a debug print to be carried in
debug and not info level.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iscsi stack has existing mechanisms to link back and forth between
the iscsi connection and the iscsi transport (e.g iser/tcp) connection.
This is done through a dd_data pointer field in struct iscsi_conn
which can be set to point to the transport connection, etc.
The iscsi_iser_conn structure was used to get this linking done in
another way, which is uneeded and adds extra complication to the iser
code, so we just remove it.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iser disconnection flow isn't done before all the inflight
recv/send buffers posted to the QP are either flushed or normally
completed to the CQ that serves this connection. The condition check
is done in iser_handle_comp_error().
Currently, it's possible for the send buffer completion that makes the
posted send buffers counter reach zero to be polled in the drain tx
call, which is after the rx cq is fully drained. Since this
completion might be not an error one (for example, it might be a
completion of the logout request iSCSI PDU) we will skip
iser_handle_comp_error(). So the connection will never terminate from
the iscsi stack point of view, and we hang.
To resolve this race, do the draining of the tx cq before the loop on
the rx cq.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix pr_err (printk) format warning:
drivers/infiniband/ulp/iser/iser_verbs.c:1181:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'sector_t' [-Wformat]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Once the iSCSI transaction is completed we must implement
check_protection in order to notify on DIF errors that may have
occured.
The routine boils down to calling ib_check_mr_status to get the
signature status of the transaction.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
During connection establishment we also initialize T10-PI resources
(QP, PI contexts) in order to support SCSI's protection operations.
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use modparams to activate protection information support.
pi_enable bool: Based on this parameter iSER will know if it should
support T10-PI. We don't want to do this by default as it requires to
allocate and initialize extra resources. In case pi_enable=N, iSER
won't publish to SCSI midlayer any DIF capabilities.
pi_guard int: Based on this parameter iSER will publish DIX guard type
support to SCSI midlayer. 0 means CRC is allowed to be passed in DIX
buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX
buffers. Note that over the wire, only CRC is allowed.
In the next phase, it is worth considering passing these parameters
from iscsid via nlmsg. This will allow these parameters to be
connection based rather than global.
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In T10-PI support we will have memory keys for protection buffers and
signature transactions. We prefer to compact indicators rather than
keeping multiple bools.
This commit does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For T10-PI offload support, we will need to know the device signature
offload capability upon every connection establishment.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
fastreg descriptor will include protection information context. In
order to place the logic in one place we introduce iser_create_fr_desc
function.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
FRWR stands for "fast registration work request". We want to avoid
calling the fastreg pool with that name, instead we name it fastreg
which stands for "fast registration".
This pool will include more elements in the future, so it is a good
idea to generalize the name.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case iSER uses fast registration method, it should not request for
successful completions on fast registration nor local invalidate
requests. We color wr_id with ISER_FRWR_LI_WRID in order to correctly
consume error completions.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix a possible NULL pointer dereference in disconnection flow. This
can happen if the target disconnected/rejected the connection request,
e.g before the binding stage between iscsi connection to the transport
connection.
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix leak where desc is not being freed in error flows.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Newer HCAs and Virtual functions may not support FMRs but rather a fast
registration model, which we call FRWR - "Fast Registration Work Requests".
This model was introduced in 00f7ec36c ("RDMA/core: Add memory management
extensions support") and works when the IB device supports the
IB_DEVICE_MEM_MGT_EXTENSIONS capability.
Upon creating the iser device iser will test whether the HCA supports
FMRs. If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS
is supported and assign function pointers that handle fast
registration and allocation of appropriate resources (fast_reg
descriptors).
Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is preparation step for other memory registration methods to be
added. In addition, change reg/unreg routines signature to indicate
they use FMRs.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently the driver uses FMRs as the only means to register the
memory pointed by SG provided by the SCSI mid-layer with the RDMA
device.
As preparation step for adding more methods for fast path memory
registration, make the alloc/free and reg/unreg calls function
pointers, which are for now just set to the existing FMR ones.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use cmds_max passed from user space to be the number of PDUs to be
supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX.
This allow controlling the max number of SCSI commands for the session.
Also don't ignore the qdepth passed from user space.
Derive from session->cmds_max the actual number of RX buffers and FMR
pool size to allocate during the connection bind phase.
Since the iser transport connection is established before the iscsi
session/connection are created and bound, we still use one hard-coded
quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of
work-requests to be supported by the RC QP used for the connection.
The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN
(16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is a preparation step to a patch that accepts the number of max
SCSI commands to be supported a session from user space iSCSI tools.
Move the allocation of the login buffer, FMR pool and its associated
page vector from iser_create_ib_conn_res() (which is called prior when
we actually know how many commands should be supported) to
iser_alloc_rx_descriptors() (which is called during the iscsi
connection bind step where this quantity is known).
Also do small refactoring around the deallocation to make that path
similar to the allocation one.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add Mellanox copyright to the iser initiator source code which I maintain.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Change the code to destroy the "last opened" rdma_cm id after making
sure we released all other objects (QP, CQs, PD, etc) associated with
the IB device.
Since iser accesses the IB device using the rdma_cm id, we need to
free any objects that are related to the device that is associated
with the rdma_cm id prior to destroying that id. When this isn't
done, the low level driver that created this device can be unloaded
before iser has a chance to free all the objects and a such a call may
invoke code segment which isn't valid any more and crash.
Cc: Sean Hefty <sean.hefty@intel.com
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Annex A12 of the IBTA spec defines additional information that needs
to be provided through the CM exchange relating to usage of ZBVA (Zero
Based VAs) and Send With Invalidate over an iSER connection.
Currently, the initiator sets both to not supported, but does provide
the header so that existing iSER targets can be patched to start
looking on the private data carried by the CM.
This is a preparation step to enable iSER with HW drivers for which
FMRs are not supported, such as mlx4 VF instances or new HW devices
which might support only FRWR (Fast Registration Work-Requests) along
the details of the IB_DEVICE_MEM_MGT_EXTENSIONS device capability.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce iser_info() and move informational messages that were
printed as errors to use that macro. Also, cleanup printk leftovers to
use the existing macros.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
[ Use pr_warn(... instead of printk(KERN_WARNING .... - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
RX/TX CQs will now be selected from a per HCA pool. For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector. This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.
QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The current error flow code was releasing the IB connection object and
calling iscsi_destroy_endpoint() directly without going through the
reference counting mechanism introduced in commit 39ff05d ("IB/iser:
Enhance disconnection logic for multi-pathing"). This resulted in a
double free of the iscsi endpoint object, which causes a kernel NULL
pointer dereference. Fix that by plugging into the IB conn reference
counting correctly.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We allocate the login dma buffers in iser_verbs.c as part of
alloc_ib_conn_resources(), however we are freeing them in
iser_initiator.c as part of iser_free_rx_descriptors(). This is
needlessly confusing. We have an alloc_rx_descriptors() and it
doesn't alloc something that the free_rx_descriptors() frees, and we
have an alloc_ib_conn_resources() that allocs something not freed by
free_ib_conn_resources(). Clean that up.
Signed-off-by: Doug Ledford <dledford@redhat.com>
[ Fix build error in iser_free_ib_conn_res(). - Or ]
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver counted on the transactional nature of iSCSI login/text
flows and used the same buffer for both the request and the response.
We also went further and did DMA mapping only once, with
DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by
using different buffers, one for requests and one for responses, and
use the correct DMA mapping direction for each.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The RDMA CM currently infers the QP type from the port space selected
by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type. For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.
Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We shouldn't free things here because we free them later.
The call tree looks like this:
iser_connect() ==> initiating the connection establishment
and later
iser_cma_handler() => iser_route_handler() => iser_create_ib_conn_res()
if we fail here, eventually iser_conn_release() is called, resulting
in a double free.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The iser connection teardown flow isn't over until the underlying
Connection Manager (e.g the IB CM) delivers a disconnected or timeout
event through the RDMA-CM. When the remote (target) side isn't
reachable, e.g when some HW e.g port/hca/switch isn't functioning or
taken down administratively, the CM timeout flow is used and the event
may be generated only after relatively long time -- on the order of
tens of seconds.
The current iser code exposes this possibly long delay to higher
layers, specifically to the iscsid daemon and iscsi kernel stack. As a
result, the iscsi stack doesn't respond well: this low-level CM delay
is added to the fail-over time under HA schemes such as the one
provided by DM multipath through the multipathd(8) service.
This patch enhances the reference counting scheme on iser's IB
connections so that the disconnect flow initiated by iscsid from user
space (ep_disconnect) doesn't wait for the CM to deliver the
disconnect/timeout event. (The connection teardown isn't done from
iser's view point until the event is delivered)
The iser ib (rdma) connection object is destroyed when its reference
count reaches zero. When this happens on the RDMA-CM callback
context, extra care is taken so that the RDMA-CM does the actual
destroying of the associated ID, since doing it in the callback is
prohibited.
The reference count of iser ib connection normally reaches three,
where the <ref, deref> relations are
1. conn <init, terminate>
2. conn <bind, stop/destroy>
3. cma id <create, disconnect/error/timeout callbacks>
With this patch, multipath fail-over time is about 30 seconds, while
without this patch, multipath fail-over time is about 130 seconds.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The iscsi connection object life cycle includes binding and unbinding
(conn_stop) to/from the iscsi transport connection object. Since
iscsi connection objects are recycled, at the time the transport
connection (e.g iser's IB connection) is released, it is not valid to
touch the iscsi connection tied to the transport back-pointer since it
may already point to a different transport connection.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add handler to handle events such as port up and down. This is useful
when testing high-availability schemes such as multi-pathing.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Remove unnecessary checks for the IB connection state and for QP
overflow, as conn state changes are reported by iSER to libiscsi and
handled there. QP overflow is theoretically possible only when
unsolicited data-outs are used; anyway it's being checked and handled
by HW drivers.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Simplify and shrink the logic/code used for the send descriptors.
Changes include removing struct iser_dto (an unnecessary abstraction),
using struct iser_regd_buf only for handling SCSI commands, using
dma_sync instead of dma_map/unmap, etc.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use a different CQ for send completions, where send completions are
polled by the interrupt-driven receive completion handler. Therefore,
interrupts aren't used for the send CQ.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that both the posting and reaping of receive buffers is done in
the completion path, the counter of outstanding buffers not be atomic.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, the recv buffer posting logic is based on the transactional
nature of iSER which allows for posting a buffer before sending a PDU.
Change this to post only when the number of outstanding recv buffers
is below a water mark and in a batched manner, thus simplifying and
optimizing the data path. Use a pre-allocated ring of recv buffers
instead of allocating from kmem cache. A special treatment is given
to the login response buffer whose size must be 8K unlike the size of
buffers used for any other purpose which is 128 bytes.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We will make a major change in the recv buffer posting logic, after
which the problem commit bba7ebb "avoid recv buffer exhaustion caused
by unexpected PDUs" comes to solve doesn't exist any more, so revert it.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
fix some typos and punctuation in comments
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Remove hard setting of the IB MTU used by iSER's RC queue-pair to 1K,
as this was done due to inter-op issues with an old iser target which
is not used any more.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
iSCSI/iSER targets may send PDUs without a prior request from the
initiator. RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs
with itt=RESERVED and Asynchronous Message PDUs occupy this category.
The amount of active "unexpected" PDU's an iSER target may have at any
time is governed by the MaxOutstandingUnexpectedPDUs key, which is not
yet supported.
Currently when an iSER target sends an "unexpected" PDU, the
initiators recv buffer consumed by the PDU is not replaced. If over
initial_post_recv_bufs_num "unexpected" PDUs are received then the
receive queue will run out of receive work requests entirely.
This patch ensures recv buffers consumed by "unexpected" PDUs are
replaced in the next iser_post_receive_control() call.
Signed-off-by: David Disseldorp <ddiss@sgi.com>
Signed-off-by: Ken Sandars <ksandars@sgi.com>
Acked-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch lets the files using linux/version.h match the files that
#include it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enhance iser to act upon notification on network stack changes that
make its RDMA connection unaligned with the link used by the stack for
the <src,dst> IPs used to establish the connection.
When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the
connection, assuming that the user space iscsid daemon will reconnect,
and the new connection will be aligned with the IP stack.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
[SCSI] scsi_dh: fix kconfig related build errors
[SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
[SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
[SCSI] make struct scsi_{host,target}_type static
[SCSI] fix locking in host use of blk_plug_device()
[SCSI] zfcp: Cleanup external header file
[SCSI] zfcp: Cleanup code in zfcp_erp.c
[SCSI] zfcp: zfcp_fsf cleanup.
[SCSI] zfcp: consolidate sysfs things into one file.
[SCSI] zfcp: Cleanup of code in zfcp_aux.c
[SCSI] zfcp: Cleanup of code in zfcp_scsi.c
[SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
[SCSI] zfcp: Small QDIO cleanups
[SCSI] zfcp: Adapter reopen for large number of unsolicited status
[SCSI] zfcp: Fix error checking for ELS ADISC requests
[SCSI] zfcp: wait until adapter is finished with ERP during auto-port
[SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
[SCSI] sg: Add target reset support
[SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
[SCSI] sd: Move scsi_disk() accessor function to sd.h
...
This hooks iser into the iscsi endpoint code. Previously it handled the
lookup and allocation. This has been made generic so bnx2i and iser can
share it. It also allows us to pass iser the leading conn's ep, so we
know the ib_deivce being used and can set it as the scsi_host's parent.
And that allows scsi-ml to set the dma_mask based on those values.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
After the stop_conn callback has returned the LLD should not
touch the scsi cmds. iscsi_tcp and libiscsi use the
conn->recv_lock and suspend_rx field to halt recv path
processing, but iser does not have any protection.
This patch modifies iser so that userspace can just
call the ep_disconnect callback, which will halt
all recv IO, before calling the stop_conn callback so
we do not have to worry about the conn->recv_lock and
suspend rx field. iser just needs to stop the send side
from accessing the ib conn.
Fixup to handle when the ep poll fails and ep disconnect
is called from Erez.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should
release the connection resources.
This is necessary when the IB HCA module is unloaded while open-iscsi
is still running. Currently, iSER just BUG()s.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER. Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some RDMA CM events are not supported or not handled in iSER.
This patch adds some info (printk) for the user about them.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
<asm/scatterlist.h> is not needed because everyplace it appears,
<linux/scatterlist.h> also appears. <asm/io.h> is not needed because
nothing seems to be using device IO anyway.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make iser_conn_release() and iser_start_rdma_unaligned_sg() static,
since they are only used in the .c file where they are defined. In
addition to being a cleanup, this even shrinks the generated code by
allowing the single call of iser_start_rdma_unaligned_sg() to be
inlined into its callsite. On x86_64:
add/remove: 0/1 grow/shrink: 1/0 up/down: 466/-533 (-67)
function old new delta
iser_reg_rdma_mem 1518 1984 +466
iser_start_rdma_unaligned_sg 533 - -533
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch allows us to set can_queue and cmds_per_lun from userspace
when we create the session/host. From there we can set it on a per
target basis. The patch fully converts iscsi_tcp, but only hooks
up ib_iser for cmd_per_lun since it currently has a lots of preallocations
based on can_queue.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API. Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value. Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.
We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity. This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a connection is terminated asynchronously from the iSCSI layer's
perspective, iSER needs to notify the iSCSI layer that the connection
has failed. This is done using a workqueue (switched to from the iSER
tasklet context). Meanwhile, the connection object (that holds the
work struct) is released. If the workqueue function wasn't called
yet, it will be called later with a NULL pointer, which will crash the
kernel.
The context switch (tasklet to workqueue) is not required, and
everything can be done from the iSER tasklet. This eliminates the NULL
work struct bug (and simplifies the code).
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
iSER uses a data transaction object (struct iser_dto) as part
of its IB data descriptors (struct iser_desc) management.
It also uses a hierarchy of connection structures pointing to
each other. A DTO may exist even after the iscsi_iser connection
pointed by it is destroyed (eg one that is bound to a post
receive buffer which was flushed by the IB HW). Hence DTOs need
point to the lowest connection, which is struct iser_conn.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fast Memory Registration (fmr) is used to register for rdma an sg whose
elements are not linearly sequential after dma mapping.
The IB verbs layer provides an "all dma memory MR (memory region)" which
can be used for RDMA-ing a dma linearly sequential buffer.
Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As iser is able to use at most one rdma operation for the
execution of a scsi command, and registration of the sg
associated with scsi command has its restrictions, the code
checks if an sg is "aligned for rdma".
Alignment for rdma is measured in "fmr page" units whose
possible resolutions are different between HCAs and can be
smaller, equal or bigger to the system page size.
When the system page size is bigger than 4KB (eg the default
with ia64 kernels) there a bigger chance that an sg would be
aligned for rdma if the fmr page size is 4KB.
Change the code to create FMR whose pages are of size 4KB
and to take that into account when processing the sg.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_fmr_pool_map_phys gets the virtual address by pointer but never writes
there, and users (e.g. srp) seem to assume this and ignore the value
returned. This patch cleans up the API to get the VA by value, and updates
all users.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file contains the low level interaction with the RDMA CM
and the IB verbs, where iSER is consumer of both.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>