Use round_jiffies() to align ehca's 1-second timer with other timers
and potentially save power by sleeping cores for longer.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
By default, the responder_resources parameter is set to that received
in a connection request. The passive side may override this value
when accepting the connection. Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request. Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.
For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The original QHT7040 had significant performance issues so there was an
additional check in the driver for a newer serial number. Support for
the small quantities of that board shipped has been dropped, so this
patch removes the special checks to simplify the code.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Different chips have different width interrupt status registers, so add
a flag and accessor function to decide which width register read to use.
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The 6110 had a bug that caused some registers to be swapped; it was
fixed for the 7220 (and didn't affect the 6120 because it had fewer
registers). This adds a flag and related code to handle that, and
includes some minor cleanups in the same area.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
The number of configured ports for the 7220 changes the number of eager
TIDs available per port, for all but port 0 (kernel port) which remains
constant, so add a field to give port0 count separate from the portdata
structure.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
User registers have different alignments on different chips (4KB on
older, 64KB on 7220). Allow mapping the user registers on kernels with
page sizes up to 64K.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Various hardware counters are exported via the ipath file system (since
it is binary data). The old file format was very dependent on the HW
offsets for these registers. Newer HCA chips can have different
counters at different offsets. This patch adds a level of indirection
to make the file format consistent across HCAs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for QLogic HCAs which have hardware performance sampling
registers for PortSamplesControl and PortSamplesResult MADs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports. So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.
This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.
I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining. ipath looks like it should be OK, but I have no
way to test it.
Signed-off-by: David Dillow <dillowda@ornl.gov>
[ Tested on ipath. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current SRP initiator will send requests even if it has no credits
available. The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s. Other devices may just drop
the requests.
This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining. The mid-layer will retry
the queued command once an outstanding command completes.
The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.
The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch moves some arrays that were defined per-device to be
variables defined in the per context data structure, thus avoiding extra
kzalloc() calls.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In preparation for upcoming chips that have different values for
INFINIPATH_R_PORTENABLE_SHIFT, INFINIPATH_R_INTRAVAIL_SHIFT,
INFINIPATH_R_TAILUPD_SHIFT, and portcfg_shift, remove the shared
#defines and use device-specific variables instead.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
kreceive is now portdata * instead of devdata * and other kreceive
related cleanups....
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove an unused parameter and fix up the comment.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a couple of minor problems with RNR NAK handling:
- The insertion sort was causing extra delay when inserting ahead
vs. behind an existing entry on the list.
- A resend of a first packet of a message which is still not ready,
needs another RNR NAK (i.e., it was suppressed when it shouldn't).
- Also, the resend tasklet doesn't need to be woken up unless the
ACK/NAK actually indicates progress has been made.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch allows ehca to forward event client-reregister-required to
registered clients. One such event is generated by a switch eg. after
its reboot.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a
field from it, byte-swap it once into a temporary variable. This
results in smaller, better code -- eg, on 32-bit x86:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function old new delta
mlx4_ib_poll_cq 1188 1183 -5
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove MSI support from the mthca driver, as scheduled. There is no
reason to use MSI instead of MSI-X, since MSI-X performs better. No
one has spoken up since MSI support was deprecated in commit f6be6fbe
("IB/mthca: Schedule MSI support for removal"), so apparently the MSI
support is unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is based on user feedback from Doug Ledford at RedHat:
Events that occur on an rdma_cm_id are reported to userspace through an
event channel. Connection request events are reported on the event
channel associated with the listen. When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel. This is suboptimal where the user only wants listen events on
that channel.
Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.
Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable conn_id remove on the passive side after connection
establishment. This corrects an issue where the IB driver can't be
unloaded after running applications over RDS. The 'dev_remove' counter
does not reach 0 for established connections on the passive side.
This problem is limited to device removal, and only occurs on the
passive side if there are established connections.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing. However, the structures on these two
lists are not the same. The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private. Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private. This leads to a system
crash when requests are moved from the local_list to the cancel_list.
Fix this by leaving local_list alone. All requests on the local_list
have completed are just awaiting processing by a queued worker thread.
Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add performance/debug counters to track sent/received messages, retries,
and duplicates. Counters are tracked per CM message type, per port.
The counters are always enabled, so intrusive state tracking is not done.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.
For RMPP mads, report the total number of times that the any portion
(send window) of the send operation was retried.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
P_key changes can invalidate multicast groups. Report errors on all
multicast groups affected by a pkey change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add the work completion error code to the QP error debug output.
This makes it easier to determine the cause of the error.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
An internal code review found the comment here lacking -- update it with
more specifics of how and why the rmb() is there.
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During a code review, someone noticed the comments didn't match the code.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The gen2_basic tests check for the errno value when a CQ is resized
smaller than the number of outstanding completions queue on the CQ.
This patch changes ib_ipath to return EINVAL which is what ib_mthca
returns and what gen2_basic expects.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Code review pointed out that the locking around uses of ipath_sendctrl
and kr_sendctrl were, in several places, incorrect and/or inconsistent.
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
At one point in time there was code to allow a user process to
wait for a send buffer if none were available. This feature was
never used and most of the code was removed. This removes
some missed unused code.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The 5.0 firmware now supports translating sgls in recv work requests,
so remove the host driver logic currently doing the translation.
Note: this change requires 5.0 firmware.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently the call into cxgb3 to get the driver info is not serialized.
The iw_cxgb3 module needs to hold the rtnl_lock around the ethtool ops
call like dev_ioctl() does.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Acked-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Tested-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The local loopback of an outgoing DR SMP response is limited to those
that originate at the driver specific SMA implementation during the
driver specific process_mad() function. This patch enables a
returning DR SMP originating in userspace (or elsewhere) to be
delivered to the local managment stack. In this specific case the
driver process_mad() function does not consume or process the MAD, so
a reponse mad has not be created and the original MAD must manually be
copied to the MAD buffer that is to be handed off to the local agent.
Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch is in response to reviewing a patch to the core MAD
processing which fixes loopback of directed route packets to/from user
level MAD agents. This change enables the core code to work for
ib_ipath by fixing the return code from the ipath process_mad method.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In ib_mad_recv_done_handler(), the response pointer is checked for
NULL after allocating it. It is then checked again in the local
process_mad() path but there is no possibility of it changing in
between.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Improve interrupt handler cache footprint by noinline'ing error
functions that are rarely called.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
receive queues). The current IPoIB connected mode support only works
on devices that support SRQs.
Fix this by adding support for using the receive queue of each
connected mode receive QP. The disadvantage of this compared to using
an SRQ is that it means a full queue of receives must be posted for
each remote connected mode peer, which means that total memory usage
is potentially much higher than when using SRQs. To manage this, add
a new module parameter "max_nonsrq_conn_qp" that limits the number of
connections allowed per interface.
The rest of the changes are fairly straightforward: we use a table of
struct ipoib_cm_rx to hold all the active connections, and put the
table index of the connection in the high bits of receive WR IDs.
This is needed because we cannot rely on the struct ib_wc.qp field for
non-SRQ receive completions. Most of the rest of the changes just
test whether or not an SRQ is available, and post receives or find
received packets in the right place depending on the answer.
Cleaning up dead connections actually becomes simpler, because we do
not have to do the "last WQE reached" dance that is required to
destroy QPs attached to an SRQ. We just move the QP to the error
state and wait for all pending receives to be flushed.
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
[ Completely rewritten and split up, based on Pradeep's work. Several
bugs fixed and no doubt several bugs introduced. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Factor out the code for going through the rx_reap list of struct
ipoib_cm_rx and freeing each one. This consolidates the code
duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
reduces the risk of error when adding additional accounting.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Factor out the code to create an SRQ and allocate the receive ring in
ipoib_cm_dev_init() into a new function ipoib_cm_create_srq(). This
will make the code neater when support for devices that don't implement
SRQs is added.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Factor out the code to unmap/free skbs and free the receive ring in
ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
This function will be called from a couple of other places when
support for devices that don't implement SRQs is added.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 23b9c1ab ("Infiniband: make ipath driver use default driver
groups.") introduced a bug in the ipath driver where
ipath_device_create_group() fell through into the error path, even on
success, which meant that the sysfs groups it created would always get
removed right away. This made ipath_device_remove_group() hit the
BUG_ON() in sysfs_remove_group() when it tried to remove those groups a
second time.
Correct the return path so that the groups stick around until they are
supposed to be cleaned up.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make the ipath driver use the new driver functions so that it does not
touch the sysfs portion of the driver structure.
We also remove the redundant symlink from the device back to the driver,
as it is already in the sysfs tree. Any userspace tools should be using
the standard symlink, not some driver specific one.
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: Arthur Jones <arthur.jones@qlogic.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
eh_device_reset_handler was already added to scsi_host_template
in iscsi_tcp, and is now added also for iscsi_iser.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This fixes a small bug in ipath_ud_rcv()'s handling of UD messages
with immediate data. We need to test whether immediate data is
present and update the header size accordingly *before* testing the
packet size from the header against the actual received length.
Otherwise the wrong header size will be used and all messages with
immediate data will be dropped.
This bug keeps MVAPICH-UD and HP MPI from working at all on ipath devices.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Convert xmit to iscsi chunks.
from michaelc@cs.wisc.edu:
Bug fixes, more digest integration, sg chaining conversion and other
sg wrapper changes, coding style sync up, and removal of io fields,
like pdu_sent, that are not needed.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
During root boot and shutdown the target could send us nops.
At this time iscsid cannot be running, so the target will drop
the session and the boot or shutdown will hang.
To handle this and allow us to better control when to check the network
this patch moves the nop handling to the kernel.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There is not need to block the session during logout. Since
we are going to fail the commands that were blocked just fail them
immediately instead.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- The default initialization of hdr_max is the minimum -
sizeof(struct iscsi_cmd) - Once this patch goes into iser the default
initialization at libiscsi can be removed.
- This is not yet full support for AHSs at iser end. But it should be easy.
Just allocate more space at iser_desc right after iscsi_hdr. Than
at transmission time use ctask->hdr_len to retrieve the total
size of all iscsi pdu headers. See previous patch at iscsi_tcp.[ch]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch adds logical unit reset support. This should work for ib_iser,
but I have not finished testing that driver so it is not hooked in yet.
This patch also temporarily reverts the iscsi_tcp r2t write out patch.
That code is completely rewritten in this patchset.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The documented call sequence for removing a host is to call the
transport xxx_remove_host() prior to scsi_remove_host(). The SRP
transport used to crash when that order was followed, but as it is now
fixed, use the documented order.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the value of pkey_index in completions to get a valid value for
GSI QPs. Without this fix, incoming GSI packets on port 2 get an
invalid P_Key index in the completion, which prevents the MAD layer
from sending back a response, which can make the second port of
ConnectX HCAs completely useless.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a missing call to srp_remove_host() in srp_remove_one() so that we
don't leak SRP transport class list entries.
Tested-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <tomof@acm.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Several pSeries firmware versions share a rare locking issue in the
HCA-related hCalls. Check for a feature flag that indicates the issue
being fixed and serialize all HCA hCalls if not.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Firmware would round up the number of SGEs to four, because the WQE
structure holds four SGEs. For SRQ, only three are supported, so return
a fixed value instead.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The formula would yield -1 if the path is faster than the link, which
is wrong in a bad way (max throttling). Clamp to 0, which is the
correct value.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes
the mcasts (clearing priv->broadcast) and clearing the path record
cache. If ipoib_start_xmit() is then invoked (before the broadcast
group is rejoined), a kernel oops results from attempting to access
priv->broadcast, which is still unset.
Returning NULL from path_rec_create() if priv->broadcast is NULL is a
harmless way of bypassing the problem -- the offending packet is
simply discarded "without prejudice."
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
While adding sg chaining support to iSER, a "for" loop was replaced
with a "for_each_sg" loop. The "for" loop included the incrementation
of 2 variables. Only one of them is incremented in the current
"for_each_sg" loop. This caused iSER to think that all data is
unaligned, and all data was copied to aligned buffers.
This patch increments the missing counter inside the "for_each_sg"
loop whenever necessary.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Wrong choice of port number caused modify_qp() to fail -- fixed.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The error codes for ib_post_send(), ib_post_recv(), and ib_post_srq_recv()
were inconsistent. Use EINVAL for too many SGEs and ENOMEM for too many
WRs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The wrong offset was being returned to libipathverbs so that when
ibv_modify_srq() calls mmap(), it always fails.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes the code which frees the partially allocated QP
resources if there was an error while creating the QP. In particular,
the QPN wasn't deallocated and the QP wasn't removed from the hash
table.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The wrong offset was being returned to libipathverbs so that when
ibv_resize_cq() calls mmap(), it always fails.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The device attribute max_qp_init_rd_atom is not getting set in cxgb3's
query_device method. Version 1.0.4 of librdmacm now validates the
user's requested initiator and responder resources against the max
supported by the device. Since iw_cxgb3 wasn't setting this attribute
(and it defaulted to 0), all rdma_connect()s fail if there are
initiator resources requested by the app. Fix this by setting the
correct value in iwch_query_device().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IPD (inter-packet delay) formula was a little off and assumed a
fixed physical link rate; fix the formula and query the actual
physical link rate, now that we can get it. Also, refactor the
calculation into a common function ehca_calc_ipd() and use that
instead of duplicating code.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Newer firmware versions return physical port information to the
partition, so hand that information to the consumer if it's present.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When an ACK is received, the QP is removed from the timeout list and
then if there are still pending send WQEs, the QP is put back on the
timeout list. It is possible that another post send has put the QP on
the timeout list thus, a check needs to be made before trying to do it
again or the list is corrupted.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/fmr_pool: Stop ib_fmr threads from contributing to load average
IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument)
IB/ipath: Limit length checksummed in eeprom
IB/ipath: Fix a race where s_last is updated without lock held
IB/mlx4: Lock SQ lock in mlx4_ib_post_send()
IPoIB/cm: Fix receive QP cleanup
I noticed my machine was at a constant load average of 1. This was
because ib_create_fmr_pool calls kthread_create but does not
immediately wake the thread up.
Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set
TASK_INTERRUPTIBLE, then go to sleep.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Inside a function declared as
void foo(char bar[512])
the value of sizeof bar is the size of a pointer, not 512. So avoid
constructions like this by passing the size explicitly.
Also reduce the size of the buffer to 128 bytes (512 was overly generous).
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The small eeprom that holds the GUID etc. contains a data-length, but if
the actual eeprom is new or has been erased, that byte will be 0xFF,
which is greater than the maximum physical length of the eeprom, and
more importantly greater than the length of the buffer we vmalloc'd.
Sanity-check the length to avoid the possbility of reading past end of
buffer.
Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a small window where a send work queue entry could be
overwritten by ib_post_send() because s_last is updated before the
entry is read.
This patch closes the window by acquiring the lock and updating
the last send work queue entry index after reading the wr_id.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Because of a typo, mlx4_ib_post_send() takes the same lock rq.lock as
mlx4_ib_post_recv(). Correct the code so the intended sq.lock is
taken when posting a send.
Noticed by Yossi Leybovitch and pointed out by Jack Morgenstein from
Mellanox.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions")
changed how the high-order bits of work request IDs were used, which
had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a
connected mode receive completion. This leads to the messages
ib1: cm send completion event with wrid 1073741823 (> 64)
ib1: RX drain timing out
when an interface with connected mode QPs is brought down. Fix this
by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in
IPOIB_CM_RX_DRAIN_WRID.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.
Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
IPoIB/cm: Use common CQ for CM send completions
IB/uverbs: Fix checking of userspace object ownership
IB/mlx4: Sanity check userspace send queue sizes
IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
IB/ehca: Enable large page MRs by default
IB/ehca: Change meaning of hca_cap_mr_pgsize
IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
IB/ehca: Fix masking error in {,re}reg_phys_mr()
IB/ehca: Supply QP token for SRQ base QPs
IPoIB: Use round_jiffies() for ah_reap_task
RDMA/cma: Fix deadlock destroying listen requests
RDMA/cma: Add locking around QP accesses
IB/mthca: Avoid alignment traps when writing doorbells
mlx4_core: Kill mlx4_write64_raw()
More fallout from sg_page changes:
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user1':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1779: error: 'struct scatterlist' has no member named 'page'
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_check_kpages_per_ate':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1835: error: 'struct scatterlist' has no member named 'page'
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user2':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1870: error: 'struct scatterlist' has no member named 'page'
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use the same CQ for CM send completions as for all other IPoIB
completions. This means all completions are processed via the same
NAPI polling routine. This should help reduce the number of
interrupts for bi-directional traffic (such as TCP) and fixes "driver
is hogging interrupts" errors reported for IPoIB send side, e.g.
<https://bugs.openfabrics.org/show_bug.cgi?id=508>
To do this, keep a per-interface counter of outstanding send WRs, and
stop the interface when this counter reaches the send queue size to
avoid CQ overruns.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().
Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.
Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a justifying patch for Stephen's patches. Stephen's patches
disallows using a port range of one single port and brakes the meaning
of the 'remaining' variable, in some places it has different meaning.
My patch gives back the sense of 'remaining' variable. It should mean
how many ports are remaining and nothing else. Also my patch allows
using a single port.
I sure we must be able to use mentioned port range, this does not
restricted by documentation and does not brake current behavior.
usefull links:
Patches posted by Stephen Hemminger
http://marc.info/?l=linux-netdev&m=119206106218187&w=2http://marc.info/?l=linux-netdev&m=119206109918235&w=2
Andrew Morton's comment
http://marc.info/?l=linux-kernel&m=119248225007737&w=2
1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.
Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add sanity checks to send queue sizes passed in from userspace. The
minimum sq stride value below is taken from the MT25408 PRM (section
11.10, Table 306, log_sq_stride definition).
Without this check, userspace can submit arbitrarily large/small
values for the number of WQEs and the stride, which can crash the
kernel.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>