Commit Graph

2247 Commits

Author SHA1 Message Date
Roland Dreier 7e2c232854 Merge branches 'ipoib', 'mlx4' and 'nes' into for-linus 2008-09-16 11:57:52 -07:00
Yossi Etigin e8224e4b80 IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop().  We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.  This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes.  This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-16 11:57:45 -07:00
Faisal Latif d7ffd5076d RDMA/nes: Fix client side QP destroy
Fix QP not being destroyed properly on the client, which leads to
userspace programs hanging on exit.  This is a missing chunk from the
connection management rewrite in commit 6492cdf3 ("RDMA/nes: CM
connection setup/teardown rework").

Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-16 11:56:26 -07:00
Vladimir Sokolovsky 29bdc88384 IB/mlx4: Fix up fast register page list format
Byte swap the addresses in the page list for fast register work requests
to big endian to match what the HCA expectx.  Also, the addresses must
have the "present" bit set so that the HCA knows it can access them.
Otherwise the HCA will fault the first time it accesses the memory
region.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-15 14:25:23 -07:00
Vladimir Sokolovsky 4c246edd25 IB/mlx4: Actually return L_Key and R_Key for fast register MRs
Initialize the L_Key and R_Key for memory regions returned from
mlx4_ib_alloc_fast_reg_mr().  Otherwise callers just get garbage for
the memory keys and can't do anything useful with these MRs.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-27 14:40:38 -07:00
Adrian Bunk 7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Roland Dreier 45dd75d83c Merge branches 'ipath' and 'ipoib' into for-linus 2008-08-19 15:01:45 -07:00
Roland Dreier a77a57a1a2 IPoIB: Fix deadlock on RTNL in ipoib_stop()
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop().  It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-19 15:01:32 -07:00
Dave Olson 24babadec0 IB/ipath: Fix incorrect check for max physical address in TID
The check for max physical address was incorrect, thus limiting the
range of allowed physical addresses.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-15 11:25:20 -07:00
Ralph Campbell 7ec01ff950 IB/ipath: Fix lost UD send work request
If a UD QP has some work requests queued to be sent by the DMA engine
followed by a local loopback work request, we have to wait for the
previous work requests to finish or the completion for the local
loopback work request would be generated out of order.  The problem
was that the work request queue pointer was already updated so that
the request would not be processed when the DMA queue drained.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-15 11:23:47 -07:00
Roland Dreier 3a3eae0d66 Merge branches 'ehca' and 'ipoib' into for-linus 2008-08-12 13:52:54 -07:00
Alexander Schmidt 6773f079b7 IB/ehca: Discard double CQE for one WR
Under rare circumstances, the ehca hardware might erroneously generate
two CQEs for the same WQE, which is not compliant to the IB spec and
will cause unpredictable errors like memory being freed twice. To
avoid this problem, the driver needs to detect the second CQE and
discard it.

For this purpose, introduce an array holding as many elements as the
SQ of the QP, called sq_map. Each sq_map entry stores a "reported"
flag for one WQE in the SQ. When a work request is posted to the SQ,
the respective "reported" flag is set to zero. After the arrival of a
CQE, the flag is set to 1, which allows to detect the occurence of a
second CQE.

The mapping between WQE / CQE and the corresponding sq_map element is
implemented by replacing the lowest 16 Bits of the wr_id with the
index in the queue map. The original 16 Bits are stored in the sq_map
entry and are restored when the CQE is passed to the application.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-12 11:34:59 -07:00
Alexander Schmidt 129a10fb81 IB/ehca: Check idr_find() return value
The idr_find() function may fail when trying to get the QP that is
associated with a CQE, e.g. when a QP has been destroyed between the
generation of a CQE and the poll request for it.  In consequence, the
return value of idr_find() must be checked and the CQE must be
discarded when the QP cannot be found.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-12 11:34:59 -07:00
Alexander Schmidt 17c2b53adb IB/ehca: Repoll CQ on invalid opcode
When the ehca driver detects an invalid opcode in a CQE, it currently
passes the CQE to the application and returns with success. This patch
changes the CQE handling to discard CQEs with invalid opcodes and to
continue reading the next CQE from the CQ.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-12 11:34:58 -07:00
Alexander Schmidt 6c02eed930 IB/ehca: Rename goto label in ehca_poll_cq_one()
Rename the "poll_cq_one_read_cqe" goto label to what it actually does,
namely "repoll".

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-12 11:34:58 -07:00
Alexander Schmidt 51ad241af4 IB/ehca: Update qp_state on cached modify_qp()
Since the introduction of the port auto-detect mode for ehca, calls to
modify_qp() may be cached in the device driver when the ports are not
activated yet. When a modify_qp() call is cached, the qp state remains
untouched until the port is activated, which will leave the qp in the
reset state. In the reset state, however, it is not allowed to post SQ
WQEs, which confuses applications like ib_mad.

The solution for this problem is to immediately set the qp state as
requested by modify_qp(), even when the call is cached.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-12 11:34:58 -07:00
David J. Wilder b1404069f6 IPoIB/cm: Use vmalloc() to allocate rx_rings
There are users that are running UDP applications that require a large
receive queue size in order to get good performance.  To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
alocate rx_rings.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-08 15:51:29 -07:00
Linus Torvalds 273b257839 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL
  IB/mlx4: Allow 4K messages for UD QPs
  mlx4_core: Add ethernet fields to CQE struct
  IB/ipath: Fix printk format warnings
  RDMA/cxgb3: Fix deadlock initializing iw_cxgb3 device
  RDMA/cxgb3: Fix up MW access rights
  RDMA/cxgb3: Fix QP capabilities
  RDMA/cma: Remove padding arrays by using struct sockaddr_storage
  IB/ipath: Use unsigned long for irq flags
  IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()
2008-08-07 18:14:07 -07:00
Roland Dreier 06a91a02e9 Merge branches 'cma', 'cxgb3', 'ipath', 'ipoib', 'mad' and 'mlx4' into for-linus 2008-08-07 14:12:03 -07:00
Julien Brunel cd55ef5a10 IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL
In case of error, the function ib_create_send_mad() returns an ERR
pointer, but never returns a NULL pointer.  So testing the return
value for error should be done with IS_ERR, not by comparing with
NULL.

A simplified version of the semantic patch that makes this change is
as follows:

(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = ib_create_send_mad(...)
<... when != x = E
if (
(
- x@p2 != NULL
+ ! IS_ERR ( x )
|
- x@p2 == NULL
+ IS_ERR( x )
)
 )
S1
else S2
...>
? x = E;
// </smpl>

Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-07 14:11:56 -07:00
Alex Naslednikov 6e0d733d92 IB/mlx4: Allow 4K messages for UD QPs
Current code limits the max message size to 2K for UD QPs, while MTU
might be as big as 4K.  This patch sets the maximum message size to
4K, which is needed for UD to work correctly on fabrics with a 4K MTU.

Signed-off-by: Alex Naslednikov <xalex@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-07 14:06:50 -07:00
Yevgeny Petrilin f780a9f119 mlx4_core: Add ethernet fields to CQE struct
Add ethernet-related fields to struct mlx4_cqe so that the mlx4_en
ethernet NIC driver can share the same definition.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-06 20:14:06 -07:00
Alexander Beregalov 70117b9e86 IB/ipath: Fix printk format warnings
ipath_driver.c:1260: warning: format '%Lx' expects type 'long long unsigned int', but argument 6 has type 'long unsigned int'
    ipath_driver.c:1459: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    ipath_intr.c:358: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_intr.c:358: warning: format '%Lu' expects type 'long long unsigned int', but argument 6 has type 'u64'
    ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 5 has type 'u64'
    ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_intr.c:1123: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_intr.c:1130: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    ipath_iba7220.c:1032: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    ipath_iba7220.c:1045: warning: format '%llX' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_iba7220.c:2506: warning: format '%Lu' expects type 'long long unsigned int', but argument 4 has type 'u64'

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:12:18 -07:00
Steve Wise be43324d8b RDMA/cxgb3: Fix deadlock initializing iw_cxgb3 device
Running 'ifconfig up' on the cxgb3 interface with iw_cxgb3 loaded
causes a deadlock.  The rtnl lock is already held in this path.  The
function fw_supports_fastreg() was introduced in 2.6.27 to
conditionally set the IB_DEVICE_MEM_MGT_EXTENSIONS bit iff the
firmware was at 7.0 or greater, and this function also acquires the
rtnl lock and which thus causes a deadlock.  Further, if iw_cxgb3 is
loaded _after_ the nic interface is brought up, then the deadlock does
not occur and therefore fw_supports_fastreg() does need to grab the
rtnl lock in that path.

It turns out this code is all useless anyway.  The low level driver
will NOT allow the open if the firmware isn't 7.0, so iw_cxgb3 can
always set the MEM_MGT_EXTENSIONS bit.  Simplify...

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:08:37 -07:00
Steve Wise 1c355a6e80 RDMA/cxgb3: Fix up MW access rights
- MWs don't have local read/write permissions.
- Set the MW_BIND enabled bit if a MR has MW_BIND access.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:05:43 -07:00
Steve Wise 5f0f66b022 RDMA/cxgb3: Fix QP capabilities
- Set the stag0 and fastreg capability bits only for kernel qps.
- QP_PRIV flag is no longer used, so don't set it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:04:42 -07:00
Roland Dreier 3f44675439 RDMA/cma: Remove padding arrays by using struct sockaddr_storage
There are a few places where the RDMA CM code handles IPv6 by doing

	struct sockaddr		addr;
	u8			pad[sizeof(struct sockaddr_in6) -
				    sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

	struct sockaddr_storage	addr;

[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
  switch to struct sockaddr_storage and get rid of padding arrays in
  struct rdma_addr. ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:02:14 -07:00
Stephen Rothwell b8b572e101 powerpc: Move include files to arch/powerpc/include/asm
from include/asm-powerpc.  This is the result of a

mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm

Followed by a few documentation/comment fixups and a couple of places
where <asm-powepc/...> was being used explicitly.  Of the latter only
one was outside the arch code and it is a driver only built for powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 12:02:00 +10:00
Niels de Vos 61a2d07d3f Remove newline from the description of module parameters
Some module parameters with only one line have the '\n' at the end of the
description.  This is not needed nor wanted as after the description the
type (i.e.  int) is followed by a newline.

Some modules contain a multi-line description, these are not affected
by this patch.

Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Ed L. Cashin <ecashin@coraid.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Roland Dreier <rolandd@cisco.com>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-01 12:46:41 -07:00
Vegard Nossum 52fd8ca6ad IB/ipath: Use unsigned long for irq flags
A few functions in the ipath driver incorrectly use unsigned int to
hold irq flags for spin_lock_irqsave().

This patch was generated using the Coccinelle framework with the
following semantic patch:

The semantic patch I used was this:

@@
expression lock;
identifier flags;
expression subclass;
@@

- unsigned int flags;
+ unsigned long flags;

...

<+...

(
 spin_lock_irqsave(lock, flags)
|
 _spin_lock_irqsave(lock)
|
 spin_unlock_irqrestore(lock, flags)
|
 _spin_unlock_irqrestore(lock, flags)
|
 read_lock_irqsave(lock, flags)
|
 _read_lock_irqsave(lock)
|
 read_unlock_irqrestore(lock, flags)
|
 _read_unlock_irqrestore(lock, flags)
|
 write_lock_irqsave(lock, flags)
|
 _write_lock_irqsave(lock)
|
 write_unlock_irqrestore(lock, flags)
|
 _write_unlock_irqrestore(lock, flags)
|
 spin_lock_irqsave_nested(lock, flags, subclass)
|
 _spin_lock_irqsave_nested(lock, subclass)
|
 spin_unlock_irqrestore(lock, flags)
|
 _spin_unlock_irqrestore(lock, flags)
|
 _raw_spin_lock_flags(lock, flags)
|
 __raw_spin_lock_flags(lock, flags)
)

...+>

Cc: Ralph Campbell <ralph.campbell@qlogic.com>
Cc: Julia Lawall <julia@diku.dk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-30 09:29:06 -07:00
Roland Dreier e08198169e IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()
wr->sg_list should be set to the sge pointer passed in, not
priv->cm.rx_sge.

Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-30 07:21:46 -07:00
Linus Torvalds 8be1a6d6c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
  mlx4_core: Add VLAN tag field to WQE control segment struct
  RDMA/nes: CM connection setup/teardown rework
  IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
  IPoIB/cm: Connected mode is no longer EXPERIMENTAL
  RDMA/ucm: BKL is not needed for ib_ucm_open()
  RDMA/ucma: BKL is not needed for ucma_open()
2008-07-26 20:40:36 -07:00
Roland Dreier cc9969c967 Merge branches 'bkl-removal', 'ipoib', 'mlx4' and 'nes' into for-linus 2008-07-26 13:59:47 -07:00
FUJITA Tomonori 8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Jack Morgenstein 51a379d0c8 mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
Update existing Mellanox copyright lines to 2008, and add such lines
to files where they are missing.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-25 10:32:52 -07:00
Faisal Latif 6492cdf3a2 RDMA/nes: CM connection setup/teardown rework
Major rework of CM connection setup/teardown.  We had a number of issues
with MPI applications not starting/terminating properly over time.
With these changes we were able to run longer on larger clusters.

* Remove memory allocation from nes_connect() and nes_cm_connect().
* Fix mini_cm_dec_refcnt_listen() when destroying listener.
* Remove unnecessary code from schedule_nes_timer() and nes_cm_timer_tick().
* Functionalize mini_cm_recv_pkt() and process_packet().
* Clean up cm_node->ref_count usage.
* Reuse skbs if available.

Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:50:45 -07:00
Roland Dreier 9905922446 IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
which no longer exists.  Correct this to talk about the files under
debugfs that are really created.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:37:25 -07:00
Roland Dreier 99c3a5a9e3 IPoIB/cm: Connected mode is no longer EXPERIMENTAL
Connected mode is now tested and used by lots of people.  No need to
hide it under CONFIG_EXPERIMENTAL.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:37:25 -07:00
Roland Dreier 5ba18b186c RDMA/ucm: BKL is not needed for ib_ucm_open()
Remove explicit cycle_kernel_lock() call and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:36:59 -07:00
Roland Dreier f7a6117ee5 RDMA/ucma: BKL is not needed for ucma_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:36:59 -07:00
Linus Torvalds 5c402355ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  MAINTAINERS: Remove Glenn Streiff from NetEffect entry
  mlx4_core: Improve error message when not enough UAR pages are available
  IB/mlx4: Add support for memory management extensions and local DMA L_Key
  IB/mthca: Keep free count for MTT buddy allocator
  mlx4_core: Keep free count for MTT buddy allocator
  mlx4_code: Add missing FW status return code
  IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
  mlx4_core: Add module parameter to enable QoS support
  RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
  IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
  IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
  IB/ehca: Release mutex in error path of alloc_small_queue_page()
  IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
  IB/ehca: Filter PATH_MIG events if QP was never armed
  IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
  RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
  RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
2008-07-24 12:56:07 -07:00
Roland Dreier 2cc177364e Merge branches 'bkl-removal', 'cma', 'ehca', 'for-2.6.27', 'mlx4', 'mthca' and 'nes' into for-linus 2008-07-24 08:38:47 -07:00
Linus Torvalds 26dcce0fab Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
  cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
  NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
  NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
  cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
  cpumask: Provide a generic set of CPUMASK_ALLOC macros
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
  cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
  cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
  cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
  Revert "cpumask: introduce new APIs"
  cpumask: make for_each_cpu_mask a bit smaller
  net: Pass reference to cpumask variable in net/sunrpc/svc.c
  ...

Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23 18:37:44 -07:00
Roland Dreier 95d04f0735 IB/mlx4: Add support for memory management extensions and local DMA L_Key
Add support for the following operations to mlx4 when device firmware
supports them:

 - Send with invalidate and local invalidate send queue work requests;
 - Allocate/free fast register MRs;
 - Allocate/free fast register MR page lists;
 - Fast register MR send queue work requests;
 - Local DMA L_Key.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-23 08:12:26 -07:00
Roland Dreier e8bb4beb2b IB/mthca: Keep free count for MTT buddy allocator
MTT entries are allocated with a buddy allocator, which just keeps
bitmaps for each level of the buddy table.  However, all free space
starts out at the highest order, and small allocations start scanning
from the lowest order.  When the lowest order tables have no free
space, this can lead to scanning potentially millions of bits before
finding a free entry at a higher order.

We can avoid this by just keeping a count of how many free entries
each order has, and skipping the bitmap scan when an order is
completely empty.  This provides a nice performance boost for a
negligible increase in memory usage.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:20:05 -07:00
Roland Dreier 47b374752a IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
Make the struct name consistent with other WQE segment struct types
defined in <linux/mlx4/qp.h>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:19:39 -07:00
Dotan Barak 1ca8d15619 RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
Remove IB_ACCESS_LOCAL_WRITE from qp.qp_access_flags because this
attribute is only used to set remote permissions.

Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
Or Gerlitz 01b3fc8b15 IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
Print the return code of ib_sa_path_rec_get() if it fails to help
debug errors.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
Ralph Campbell 64b784b583 IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
If update_sm_ah() fails, it leaves the port's sm_ah as NULL.  Then if
the device or module is removed, ib_sa_remove_one() will dereference a
NULL pointer when it calls kref_put().  Fix this by testing if sm_ah
is NULL before dropping the reference.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:33 -07:00
Julia Lawall 1a867c33bb IB/ehca: Release mutex in error path of alloc_small_queue_page()
The pd->lock mutex is released on a successful return, so it should be
released on an error return as well.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression l;
@@

mutex_lock(l);
... when != mutex_unlock(l)
    when any
    when strict
(
if (...) { ... when != mutex_unlock(l)
+   mutex_unlock(l);
    return ...;
}
|
mutex_unlock(l);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:10 -07:00
Joachim Fenkes 593e4d4a05 IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
Some firmware versions report a Local CA ACK Delay of 0.  In that
case, return a more sensible default value of 12 (-> 16 msec) instead.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:08 -07:00
Joachim Fenkes 5b673b71c8 IB/ehca: Filter PATH_MIG events if QP was never armed
Certain firmware versions sometimes cause spurious PATH_MIG events to
occur during QP creation.  Filter these events by making sure PATH_MIG
events are only handed down when they actually make sense (i.e. when
the QP has been armed at least once).

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:07 -07:00
Or Gerlitz 2f5de15128 IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
Enhance iser to act upon notification on network stack changes that
make its RDMA connection unaligned with the link used by the stack for
the <src,dst> IPs used to establish the connection.

When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the
connection, assuming that the user space iscsid daemon will reconnect,
and the new connection will be aligned with the IP stack.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:16:21 -07:00
Amir Vadai 38ca83a588 RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state.  Report the timewait
event through the rdma_cm.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:23 -07:00
Or Gerlitz dd5bdff83b RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
Add an RDMA_CM_EVENT_ADDR_CHANGE event can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does.  In the current code, this
does not happen when bonding is used and fail-over happened but the IB
link used by an already existing session is operating fine.

Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID.  The consumer can act on the
event or just ignore it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:22 -07:00
Greg Kroah-Hartman 110cf374a8 infiniband: make cm_device use a struct device and not a kobject.
This object really should be a struct device, or at least contain a
pointer to a struct device, as it is trying to create a separate device
tree outside of the main device tree.  This patch fixes this problem.

It is needed for the class core rework that is being done in the driver
core.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:49 -07:00
Greg Kroah-Hartman d4c4196f24 infiniband: rename "device" to "ib_device" in cm_device
This pointer really is a struct ib_device, not a struct device, so name
it properly to help prevent confusion.

This makes the followon patch in this series much smaller and easier to
understand as well.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:49 -07:00
Greg Kroah-Hartman c76d3d28c3 device create: infiniband: convert device_create to device_create_drvdata
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:43 -07:00
Ingo Molnar eb6a12c242 Merge branch 'linus' into cpus4096-for-linus
Conflicts:

	net/sunrpc/svc.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-21 17:19:50 +02:00
Ingo Molnar bb2c018b09 Merge branch 'linus' into cpus4096
Conflicts:

	drivers/acpi/processor_throttling.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 22:00:54 +02:00
David S. Miller 49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
Linus Torvalds 89a93f2f48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
  [SCSI] scsi_dh: fix kconfig related build errors
  [SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
  [SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
  [SCSI] make struct scsi_{host,target}_type static
  [SCSI] fix locking in host use of blk_plug_device()
  [SCSI] zfcp: Cleanup external header file
  [SCSI] zfcp: Cleanup code in zfcp_erp.c
  [SCSI] zfcp: zfcp_fsf cleanup.
  [SCSI] zfcp: consolidate sysfs things into one file.
  [SCSI] zfcp: Cleanup of code in zfcp_aux.c
  [SCSI] zfcp: Cleanup of code in zfcp_scsi.c
  [SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
  [SCSI] zfcp: Small QDIO cleanups
  [SCSI] zfcp: Adapter reopen for large number of unsolicited status
  [SCSI] zfcp: Fix error checking for ELS ADISC requests
  [SCSI] zfcp: wait until adapter is finished with ERP during auto-port
  [SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
  [SCSI] sg: Add target reset support
  [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
  [SCSI] sd: Move scsi_disk() accessor function to sd.h
  ...
2008-07-15 18:58:04 -07:00
Ingo Molnar 82638844d9 Merge branch 'linus' into cpus4096
Conflicts:

	arch/x86/xen/smp.c
	kernel/sched_rt.c
	net/iucv/iucv.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 00:29:07 +02:00
Ingo Molnar 6c9fcaf2ee Merge branch 'core/rcu' into core/rcu-for-linus 2008-07-15 21:10:12 +02:00
David S. Miller b9e4085768 netdev: Do not use TX lock to protect address lists.
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:15:08 -07:00
David S. Miller e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
Eli Cohen f507d28bff IB/mlx4: Use kzalloc() for new QPs so flags are initialized to 0
Current code uses kmalloc() and then just does a bitwise OR operation on
qp->flags in create_qp_common(), which means that qp->flags may
potentially have some unintended bits set.  This patch uses kzalloc()
and avoids further explicit clearing of structure members, which also
shrinks the code:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-65 (-65)
function                                     old     new   delta
create_qp_common                            2024    1959     -65

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Or Gerlitz de910bd921 RDMA/cma: Simplify locking needed for serialization of callbacks
The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.

This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable.  This means that cma_disable_remove()
now is more properly named to cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Or Gerlitz 64c5e613b9 RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Steve Wise 4ab928f692 RDMA/cxgb3: Fixes for zero STag
Handling the zero STag in receive work request requires some extra
logic in the driver:

 - Only set the QP_PRIV bit for kernel mode QPs.

- Add a zero STag build function for recv wrs. The uP needs a PBL
  allocated and passed down in the recv WR so it can construct a HW
  PBL for the zero STag S/G entries.  Note: we need to place a few
  restrictions on zero STag usage because of this:

  1) all SGEs in a recv WR must either be zero STag or not.  No mixing.

  2) an individual SGE length cannot exceed 128MB for a zero-stag SGE.
     This should be OK since it's not really practical to allocate
     such a large chunk of pinned contiguous DMA mapped memory.

- Add an optimized non-zero-STag recv wr format for kernel users.
  This is needed to optimize both zero and non-zero STag cracking in
  the recv path for kernel users.

 - Remove the iwch_ prefix from the static build functions.

 - Bump required FW version.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2008-07-14 23:48:53 -07:00
Steve Wise 96f15c0353 RDMA/core: Add local DMA L_Key support
- Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
  IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
  STag support and IB HCAs to indicate reserved L_Key support.

- Add a u32 local_dma_lkey member to struct ib_device.  Drivers fill
  this in with the appropriate local DMA L_Key (if they support it).

- Fix up the drivers using this flag.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Roland Dreier aed012279d IB/mthca: Fix check of max_send_sge for special QPs
The MLX transport requires two extra gather entries for sends (one for
the header and one for the checksum at the end, as the comment says).
However the code checked that max_recv_sge was not too big, instead of
checking max_send_sge as it should have.  Fix the code to check the
correct condition.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Roland Dreier c036925ac0 IB/mthca: Use round_jiffies() for catastrophic error polling timer
Exactly when the catastrophic error polling timer function runs is not
important, so use round_jiffies() to save unnecessary wakeups.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Roland Dreier 4522e08ced IB/mthca: Remove "stop" flag for catastrophic error polling timer
Since we use del_timer_sync() anyway, there's no need for an
additional flag to tell the timer not to rearm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen bc3a290b51 IPoIB: Double default RX/TX ring sizes
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks.  With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput.  Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen e112373fd6 IPoIB/cm: Reduce connected mode TX object size
Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array.  Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Ralph Campbell df8666198d IB/ipath: Use IEEE OUI for vendor_id reported by ibv_query_device()
The IB spe. for SubnGet(NodeInfo) and query HCA says that the vendor
ID field should be the IEEE OUI assigned to the vendor.  The ipath
driver was returning the PCI vendor ID instead.  This will affect
applications which call ibv_query_device().  The old value was
0x001fc1 or 0x001077, the new value is 0x001175.

The vendor ID doesn't appear to be exported via /sys so that should
reduce possible compatibility issues.  I'm only aware of Open MPI as a
major application which depends on this change, and they have made
necessary adjustments.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen bd3606715e IPoIB: Use dev_set_mtu() to change mtu
When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct netdevice.  Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Eli Cohen c8c2afe360 IPoIB: Use rtnl lock/unlock when changing device flags
Use of this lock is required to synchronize changes to the netdvice's
data structs.  Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Roland Dreier 9eae554c17 IPoIB: Get rid of ipoib_mcast_detach() wrapper
ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function                                     old     new   delta
ipoib_mcast_leave                            357     319     -38
ipoib_mcast_detach                            67       -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen d0de13622d IPoIB: Only set Q_Key once: after joining broadcast group
The current code will set the Q_Key for any join of a non-sendonly
multicast group.  The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group.  Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen 5892eff91a IPoIB: Remove priv->mcast_mutex
No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast
since these operations are synchronized at the HW driver layer.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen c03d4731b5 IPoIB: Remove unused IPOIB_MCAST_STARTED code
The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b
("IPoIB: Don't drop multicast sends when they can be queued"), so
remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Steve Wise 70fe1796a5 RDMA/cxgb3: Set rkey field for new memory windows in iwch_alloc_mw()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Roland Dreier 8294f29767 RDMA/nes: Get rid of ring_doorbell parameter of nes_post_cqp_request()
Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL,
so just remove that parameter and always ring the doorbell.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
2008-07-14 23:48:49 -07:00
Jon Mason 52c8084b74 RDMA/cxgb3: Propagate HW page size capabilities
cxgb3 does not currently report the page size capabilities, and
incorrectly reports them internally.

This version changes the bit-shifting to a static value (per Steve's
request).

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Roland Dreier 1ff66e8c1f RDMA/nes: Encapsulate logic nes_put_cqp_request()
The iw_nes driver repeats the logic

	if (atomic_dec_and_test(&cqp_request->refcount)) {
		if (cqp_request->dynamic) {
			kfree(cqp_request);
		} else {
			spin_lock_irqsave(&nesdev->cqp.lock, flags);
			list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs);
			spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
		}
	}

over and over.  Wrap this up in functions nes_free_cqp_request() and
nes_put_cqp_request() to simplify such code.

In addition to making the source smaller and more readable, this shrinks
the compiled code quite a bit:

add/remove: 2/0 grow/shrink: 0/13 up/down: 164/-1692 (-1528)
function                                     old     new   delta
nes_free_cqp_request                           -     147    +147
nes_put_cqp_request                            -      17     +17
nes_modify_qp                               2316    2293     -23
nes_hw_modify_qp                             737     657     -80
nes_dereg_mr                                 945     860     -85
flush_wqes                                   501     416     -85
nes_manage_apbvt                             648     560     -88
nes_reg_mr                                  1117    1026     -91
nes_cqp_ce_handler                           927     769    -158
nes_alloc_mw                                1052     884    -168
nes_create_qp                               5314    5141    -173
nes_alloc_fmr                               2212    2035    -177
nes_destroy_cq                              1097     918    -179
nes_create_cq                               2787    2598    -189
nes_dealloc_mw                               762     566    -196

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
2008-07-14 23:48:49 -07:00
Moni Shoua ee1e2c82c2 IPoIB: Refresh paths instead of flushing them on SM change events
The patch tries to solve the problem of device going down and paths being
flushed on an SM change event. The method is to mark the paths as candidates for
refresh (by setting the new valid flag to 0), and wait for an ARP
probe a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level.  In most cases, SM
failover doesn't cause LID change so traffic won't stop.  In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state.  In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Joachim Fenkes 038919f296 IB/ehca: Make device table externally visible
This gives ehca an autogenerated modalias and therefore enables automatic loading.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Vladimir Sokolovsky af40da894e IPoIB: add LRO support
Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated.  Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne 1240673405 IPoIB: Use multicast loopback blocking if available
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device.  This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvment of 12% in cpu usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne 521e575b9a IB/mlx4: Add support for blocking multicast loopback packets
Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK
flag by using the per-multicast group loopback blocking feature of
mlx4 hardware.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Steve Wise 14cc180f7b RDMA/cxgb3: Add support for protocol statistics
- Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low
  level driver to obtain the protocol mib from the rnic hardware.

- Add new iw_cxgb3 provider method to get the MIB from the low level
  driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Steve Wise 7f624d023b RDMA/core: Add iWARP protocol statistics attributes in sysfs
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device.  Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available.  These stats are per-device
and more importantly -not- per port.

Details:

- Add union rdma_protocol_stats in ib_verbs.h.  This union allows
  defining transport-specific stats.  Currently only iwarp stats are
  defined.

- Add struct iw_protocol_stats to define the current set of iwarp
  protocol stats.

- Add new ib_device method called get_proto_stats() to return protocol
  statistics.

- Add logic in core/sysfs.c to create iwarp protocol stats attributes
  if the device is an RNIC and has a get_proto_stats() method.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Roland Dreier a7d834c4bc IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Roland Dreier 468f2239bc RDMA/cma: Add missing newlines to printk()s
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-07-14 23:48:47 -07:00
Roland Dreier eec8845d29 RDMA/cxgb3: Remove write-only iwch_rnic_attributes fields
The members struct iwch_rnic_attributes.vendor_id and .vendor_part_id
are write-only, so we might as well get rid of them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-07-14 23:48:47 -07:00
Steve Wise 97d1cc8055 RDMA/cxgb3: Fix up some ib_device_attr fields
- set fw_ver
- set hw_ver
- set max_qp_wr to something reasonable
- set max_cqe to something reasonable

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Stefan Roscher 6f7bc01a73 IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts
During corner case testing, we noticed that some versions of ehca do
not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI, if
EQEs are pending.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Joachim Fenkes 3e255eac56 IB/ehca: Reject receive work requests if QP is in RESET state
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Roland Dreier 7c27f35820 IB/mlx4: Remove extra code for RESET->ERR QP state transition
Commit 65adfa91 ("IB/mlx4: Fix RESET to RESET and RESET to ERROR
transitions") added some extra code to handle a QP state transition
from RESET to ERROR.  However, the latest 1.2.1 version of the IB spec
has clarified that this transition is actually not allowed, so we can
remove this extra code again.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Roland Dreier d3809ad097 IB/mthca: Remove extra code for RESET->ERR QP state transition
Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some
extra code to handle a QP state transition from RESET to ERROR.
However, the latest 1.2.1 version of the IB spec has clarified that
this transition is actually not allowed, so we can remove this extra
code again.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Ralph Campbell e5a5e7d59a IB/core: Reset to error QP state transition is not allowed
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed.  This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.

Fix up the qp_state_table[] to make RESET->ERR not allowed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Eli Cohen 6578cf3398 IB/mlx4: Pass congestion management class MADs to the HCA
ConnectX HCAs support the IB_MGMT_CLASS_CONG_MGMT management class, so
process MADs of this class through the MAD_IFC firmware command.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Eli Cohen d1f2cd895f IB/mlx4: Configure QPs' max message size based on real device capability
ConnectX returns the max message size it supports through the
QUERY_DEV_CAP firmware command.  When modifying a QP to RTR, the max
message size for the QP must be specified.  This value must not exceed
the value declared through QUERY_DEV_CAP.  The current code ignores
the max allowed size and unconditionally sets the value to 2^31.  This
patch sets all QPs to the max value allowed as returned from firmware.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Steve Wise e7e5582999 RDMA/cxgb3: MEM_MGT_EXTENSIONS support
- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it.
- set max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for
  fastreg possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add local_inv work request support.
- add send_with_inv and send_with_se_inv work request support.
- removed useless duplicate enums/defines for TPT/MW/MR stuff.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Steve Wise 00f7ec36c9 RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Eli Cohen f89271da32 IPoIB: Copy small received SKBs in connected mode
The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow.  It usually
allocates an SKB with as big as was used in the currently received SKB
and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU.  This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB.  The newly allocated SKB is passed to the stack and the
old SKB is reposted.

When running netperf, UDP small messages, without this pach I get:

    UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    14.4.3.178 (14.4.3.178) port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    114688     128   10.00     5142034      0     526.31
    114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced.  As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NACKs.  So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted.  Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets.  This can be verified
when looking at the port counters -- the output of ifconfig and the
oputput of netperf (this is for the case without the patch):

    tx packets
    ==========
    port counter:   1,543,996
    ifconfig:       1,581,426
    netperf:        5,142,034

    rx packets
    ==========
    netperf         1,1304,089

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-07-14 23:48:44 -07:00
Roland Dreier f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Robert P. J. Day fd91b1bf1b IB/ipath: Simplify code using ARRAY_SIZE() macro
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Eli Cohen 9670e55391 IB/mlx4: Optimize QP stamping
The idea is that for QPs with fixed size work requests (eg selective
signaling QPs), before stamping the WQE, we read the value of the DS
field, which gives the effective size of the descriptor as used in the
previous post.  Then we stamp only that area, since the rest of the
descriptor is already stamped.

When initializing the send queue buffer, make sure the DS field is
initialized to the max descriptor size so that the subsequent stamping
will be done on the entire descriptor area.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Moni Shoua 164ba0893c IB/sa: Fail requests made while creating new SM AH
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH).  When SM AH
becomes invalid and needs an update it is handled by the global
workqueue.  On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins.  Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.

This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.

The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.

For consumers, the patch doesn't make things worse.  Before the patch,
MADs are sent to the wrong SM so the request gets lost.  Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Sean Hefty a947491709 RDMA: Fix license text
The license text for several files references a third software license
that was inadvertently copied in.  Update the license to what was
intended.  This update was based on a request from HP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Christophe Jaillet 929555a2ba RDMA/nes: Remove unnecessary memset()
Remove an explicit memset(..., 0, ...) of a 'listener' structure
allocated with kzalloc().

Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Acked-by: Faisal Latif <faisal@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Roland Dreier 969a60f9db IB/srp: Remove use of cached P_Key/GID queries
The SRP initiator is currently using ib_find_cached_pkey() and
ib_get_cached_gid() in situations where the uncached ib_find_pkey()
and ib_query_gid() functions serve just as well: sleeping is allowed
and performance is not an issue.  Since we want to eliminate the
cached operations in the long term, convert SRP to use the uncached
variants.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Jonathan Corbet 2fceef397f Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
Mike Christie 8e9a20cee4 [SCSI] libiscsi, iscsi_tcp, ib_iser: fix setting of can_queue with old tools.
This patch fixes two bugs that are related.

1. Old tools did not set can_queue/cmds_max. This patch modifies
libiscsi so that when we add the host we catch this and set it
to the default.

2. iscsi_tcp thought that the scsi command that was passed to
the eh functions needed a iscsi_cmd_task allocated for it. It
only needed a mgmt task, and now it does not matter since it
all comes from the same pool and libiscsi handles this for the
drivers. ib_iser had copied iscsi_tcp's code and set can_queue
to its max - 1 to handle this. So this patch removes the max -1,
and just sets it to the max.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:29 -05:00
Mike Christie 913e5bf435 [SCSI] libiscsi, iser, tcp: remove recv_lock
The recv lock was defined so the iscsi layer could block
the recv path from processing IO during recovery. It
turns out iser just set a lock to that pointer which was pointless.

We now disconnect the transport connection before doing recovery
so we do not need the recv lock. For iscsi_tcp we still stop
the recv path incase older tools are being used.

This patch also has iscsi_itt_to_ctask user grab the session lock
and has the caller access the task with the lock or get a ref
to it in case the target is broken and sends a tmf success response
then sends data or a response for the command that was supposed to
be affected bty the tmf.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:22 -05:00
Mike Christie 88dfd340b9 [SCSI] iscsi class: Add session initiatorname and ifacename sysfs attrs.
This adds two new attrs used for creating initiator ports and
binding sessions to hardware.

The session level initiatorname:

Since bnx2i does a scsi_host per host device, we need to add the
iface initiator port settings on the session, so we can create
multiple initiator ports (each with different inames) per device/scsi_host.

The current iname reflects that qla4xxx can have one iname per hba, and we are
allocating a host per session for software. The iname on the host will
remain so we can export and set the hba level qla4xxx setting.

The ifacename attr:

To bind a session to a some peice of hardware in userspace we maintain
some mappings, but during boot or iscsid restart (iscsid contains the user
space part of the driver) we need to be able to figure out which of those
host mappings abstractions maps to certain sessions. This patch adds
a ifacename attr, which userspace can set to id the host side of the
endpoint across pivot_roots and iscsid restarts.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:21 -05:00
Mike Christie 412eeafa0a [SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup
This hooks iser into the iscsi endpoint code. Previously it handled the
lookup and allocation. This has been made generic so bnx2i and iser can
share it. It also allows us to pass iser the leading conn's ep, so we
know the ib_deivce being used and can set it as the scsi_host's parent.
And that allows scsi-ml to set the dma_mask based on those values.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:21 -05:00
Mike Christie 7970634b81 [SCSI] iscsi class: user device_for_each_child instead of duplicating session list
Currently we duplicate the list of sessions, because we were using the
test for if a session was on the host list to indicate if the session
was bound or unbound. We can instead use the target_id and fix up
the class so that drivers like bnx2i do not have to manage the target id
space.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie 2261ec3d68 [SCSI] iser: handle iscsi_cmd_task rename
This handles the iscsi_cmd_task rename and renames
the iser cmd task to iser task.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie 2747fdb257 [SCSI] iser: convert ib_iser to support merged tasks
Convert ib_iser to support merged tasks.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:19 -05:00
Mike Christie 0af967f5d4 [SCSI] libiscsi, iscsi_tcp, iser: add session cmds array accessor
Currently to get a ctask from the session cmd array, you have to
know to use the itt modifier. To make this easier on LLDs and
so in the future we can easilly kill the session array and use
the host shared map instead, this patch adds a nice wrapper
to strip the itt into a session->cmds index and return a ctask.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:18 -05:00
Mike Christie b40977d95f [SCSI] iser: fix handling of scsi cmnds during recovery.
After the stop_conn callback has returned the LLD should not
touch the scsi cmds. iscsi_tcp and libiscsi use the
conn->recv_lock and suspend_rx field to halt recv path
processing, but iser does not have any protection.

This patch modifies iser so that userspace can just
call the ep_disconnect callback, which will halt
all recv IO, before calling the stop_conn callback so
we do not have to worry about the conn->recv_lock and
suspend rx field. iser just needs to stop the send side
from accessing the ib conn.

Fixup to handle when the ep poll fails and ep disconnect
is called from Erez.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:17 -05:00
Mike Christie 5d91e209fb [SCSI] iscsi: remove session/conn_data_size from iscsi_transport
This removes the session and conn data_size fields from the iscsi_transport.
Just pass in the value like with host allocation. This patch also makes
it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie a4804cd6eb [SCSI] iscsi: add iscsi host helpers
This finishes the host/session unbinding, by adding some helpers
to add and remove hosts and the session they manage.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie 756135215e [SCSI] iscsi: remove session and host binding in libiscsi
bnx2i allocates a host per netdevice but will use libiscsi,
so this unbinds the session from the host in that code.

This will also be useful for the iser parent device dma settings
fixes.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie d3826721b1 [SCSI] iscsi class, iscsi drivers: remove unused iscsi_transport attrs
max_cmd_len and max_conn are not really used. max_cmd_len is
always 16 and can be set by the LLD. max_conn is always one
since we do not support MCS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Mike Christie 40753caa36 [SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation
iscsi offload (bnx2i and qla4xx) allocate a scsi host per hba,
so the session creation path needs a shost/host_no argument.
Software iscsi/iser will follow the same behabior as before
where it allcoates a host per session, but in the future iser
will probably look more like bnx2i where the host's parent is
the hardware (rnic for iser and for bnx2i it is the nic), because
it does not use a socket layer like how iscsi_tcp does.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Roland Dreier feae1ef116 IB/umad: BKL is not needed for ib_umad_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-11 16:40:58 -06:00
Ingo Molnar 0c81b2a144 Merge branch 'linus' into core/rcu
Conflicts:

	include/linux/rculist.h
	kernel/rcupreempt.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:46:50 +02:00
Steve Wise 5e19cf663b RDMA/cxgb3: Fix regression caused by class_device -> device conversion
The change to iwch_provider.c in commit f4e91eb4 ("IB: convert struct
class_device to struct device") undid the fix done in commit 7f049f2f
("RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call").  It
removed the calls to rtnl_lock() that serialized the iw_cxgb3 ethtool
ops calls into the cxgb3 driver.  This locking is needed to avoid
messing up the internal state of the cxgb3 driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-08 14:40:05 -07:00
Ingo Molnar 68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
Roland Dreier 5b2d281acb IB/uverbs: BKL is not needed for ib_uverbs_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-04 10:32:28 -06:00
Ingo Molnar 9a13150109 Merge commit 'v2.6.26-rc8' into core/rcu 2008-06-26 09:24:23 +02:00
Eli Cohen 87afd448b1 IB/mthca: Clear ICM pages before handing to FW
Current memfree FW has a bug which in some cases, assumes that ICM
pages passed to it are cleared.  This patch uses __GFP_ZERO to
allocate all ICM pages passed to the FW.  Once firmware with a fix is
released, we can make the workaround conditional on firmware version.

This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here:
http://lists.openfabrics.org/pipermail/general/2008-May/050026.html

Cc: <stable@kernel.org>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing
  ICM memory and memset()ing it to 0. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-23 09:29:58 -07:00
Ingo Molnar 1e74f9cbbb Merge branch 'linus' into core/rcu 2008-06-23 11:29:11 +02:00
Arnd Bergmann 6b0ee363b2 infiniband-ucma: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-06-20 14:05:57 -06:00
Jonathan Corbet f2b9857eee Add a bunch of cycle_kernel_lock() calls
All of the open() functions which don't need the BKL on their face may
still depend on its acquisition to serialize opens against driver
initialization.  So make those functions acquire then release the BKL to be
on the safe side.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:53 -06:00
Jonathan Corbet 057e7c7ff9 infiniband: more BKL pushdown
Be extra-cautious and protect the remaining open() functions.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:51 -06:00
Jonathan Corbet d21c95c569 Add "no BKL needed" comments to several drivers
This documents the fact that somebody looked at the relevant open()
functions and concluded that, due to their trivial nature, no locking was
needed.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:50 -06:00
Jack Morgenstein fb77bcef9f IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-18 15:36:38 -07:00
Ingo Molnar 766d02786e Merge branch 'linus' into core/rcu 2008-06-16 11:23:36 +02:00
Roland Dreier 24797a3442 RDMA/nes: Fix off-by-one in nes_reg_user_mr() error path
nes_reg_user_mr() should fail if page_count becomes >= 1024 * 512
rather than just testing for strict >, because page_count is
essentially used as an index into an array with 1024 * 512 entries, so
allowing the loop to continue with page_count == 1024 * 512 means that
memory after the end of the array is corrupted.  This leads to a crash
triggerable by a userspace application that requests registration of a
too-big region.

Also get rid of the call to pci_free_consistent() here to avoid
corrupting state with a double free, since the same memory will be
freed in the code jumped to at reg_user_mr_err.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-10 12:29:49 -07:00
Roland Dreier 4c0283fc56 IB/core: Remove IB_DEVICE_SEND_W_INV capability flag
In 2.6.26, we added some support for send with invalidate work
requests, including a device capability flag to indicate whether a
device supports such requests.  However, the support was incomplete:
the completion structure was not extended with a field for the key
contained in incoming send with invalidate requests.

Full support for memory management extensions (send with invalidate,
local invalidate, fast register through a send queue, etc) is planned
for 2.6.27.  Since send with invalidate is not very useful by itself,
just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final
release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27,
which makes things simpler for applications, since they will not have
quite as confusing an array of fine-grained bits to check.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-09 09:58:42 -07:00
Roland Dreier 8079ffa0e1 IB/umem: Avoid sign problems when demoting npages to integer
On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely

	min_t(int, npages, PAGE_SIZE / sizeof (struct page *))

will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()").  This leads to an infinite loop in ib_umem_get(),
since the code boils down to:

	while (npages) {
		ret = get_user_pages(...);
		npages -= ret;
	}

Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.

The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief.  But it does
let buggy userspace code cause a kernel lock-up; for example I hit
this with code that passes a negative value into a memory registartion
function where it is promoted to a huge u64 value.

Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 21:38:37 -07:00
Ralph Campbell 27676a3e16 IB/ipath: Fix SM trap forwarding
SM/SMA traps received by the ipath driver should be forwarded to the
SM if it is running on the host.  The ib_ipath driver was incorrectly
replying with "bad method."

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 11:23:29 -07:00
Joachim Fenkes 088af1543c IB/ehca: Reject send WRs only for RESET, INIT and RTR state
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 11:21:33 -07:00
Ralph Campbell 03031f71c7 IB/ipath: Fix device capability flags
The driver supports a few features (RNR NAK, port active event, SRQ
resize) that were not reported in the device capability flags.  This
patch fixes that.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-26 15:22:17 -07:00
Roland Dreier e8ffef73c8 IB/ipath: Avoid test_bit() on u64 SDMA status value
Gabriel C <nix.or.die@googlemail.com> pointed out that when the x86
bitops are updated to operate on unsigned long, the code in
sdma_abort_task() will produce warnings:

    drivers/infiniband/hw/ipath/ipath_sdma.c: In function 'sdma_abort_task':
    drivers/infiniband/hw/ipath/ipath_sdma.c:267: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type

and so on, because it uses test_bit() to operation on a u64 value
(returned by ipath_read_kref64() for a hardware register).

Fix up these warnings by converting the test_bit() operations to &ing
with appropriate symbolic defines of the bits within the hardware
register.  This has the benign side-effect of making the code more
self-documenting as well.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-26 15:20:34 -07:00
Linus Torvalds c2448278e3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
  IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
  MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
  IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
  IB/mthca: Fix max_sge value returned by query_device
  RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
  IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
  IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
  IB/ipath: Fix printk format for ipath_sdma_status
2008-05-23 11:11:44 -07:00
Dave Olson 5a4f2b6752 IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly.  The fix is to
kfree the local data and break, rather than falling through.  This was
observed with the ipath driver, but could happen with any driver.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-23 10:52:59 -07:00
Mike Travis 5d7bfd0c4d infiniband: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate

Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 18:39:06 +02:00
Jack Morgenstein e1d50dce5a IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
We saw a kernel oops in our regression testing when a multicast "join
finish" occurred just after the interface was -- this is
<https://bugs.openfabrics.org/show_bug.cgi?id=1040>.  The test
randomly causes the HCA physical port to go down then up.

The cause of this is that ipoib_mcast_join_finish() processing happen
just after ipoib_mcast_dev_flush() was invoked (in which case the
broadcast pointer is NULL).  This patch tests for and handles the case
where priv->broadcast is NULL.

Cc: <stable@kernel.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-20 15:41:09 -07:00
Roland Dreier cd155c1c7c IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
When creating a kernel QP where the consumer asked for a send queue
with lots of scatter/gater entries, set_kernel_sq_size() incorrectly
returned an error if the send queue stride is larger than the
hardware's maximum send work request descriptor size.  This is not a
problem; the only issue is to make sure that the actual descriptors
used do not overflow the maximum descriptor size, so check this instead.

Clamp the returned max_send_sge value to be no bigger than what
query_device returns for the max_sge to avoid confusing hapless users,
even if the hardware is capable of handling a few more s/g entries.

This bug caused NFS/RDMA mounts to fail when the server adapter used
the mlx4 driver.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-20 14:00:02 -07:00
Greg Kroah-Hartman 6c06aec248 IB: fix race in device_create
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.

This patch fixes the problem by using the new function,
device_create_drvdata().

Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-20 13:31:55 -07:00
Franck Bui-Huu 82524746c2 rcu: split list.h and move rcu-protected lists into rculist.h
Move rcu-protected lists from list.h into a new header file rculist.h.

This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.

For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h.  It actually compiles because users of
rcu_dereference() are macros.  Others RCU functions could be used too but
aren't probably because of this.

Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-19 10:01:37 +02:00
Roland Dreier 12103dca52 IB/mthca: Fix max_sge value returned by query_device
The mthca driver returns the maximum number of scatter/gather entries
returned by the firmware as the max_sge value when device properties
are queried.  However, the firmware also reports a limit on the
maximum descriptor size allowed, and because mthca takes into account
the worst case send request overhead when checking whether to allow a
QP to be created, the largest number of scatter/gather entries that
can be used with mthca may be limited by the maximum descriptor size
rather than just by the actual s/g entry limit.

This means that applications cannot actually create QPs with
max_send_sge equal to the limit returned by ib_query_device().  Fix
this by checking if the maximum descriptor size imposes a lower limit
and if so returning that lower limit.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-16 14:58:44 -07:00
Roland Dreier 21609ae3ef RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
drivers/infiniband/hw/cxgb3/iwch_qp.c: In function 'iwch_post_send':
    drivers/infiniband/hw/cxgb3/iwch_qp.c:232: warning: 't3_wr_flit_cnt' may be used uninitialized in this function

This is what akpm describes as "the dopey
gcc-doesn't-know-that-foo(&var)-writes-to-var problem."

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-05-16 14:58:40 -07:00
Andrew Morton a3d8e1591d IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
drivers/infiniband/hw/mlx4/qp.c: In function 'mlx4_ib_post_send':
    drivers/infiniband/hw/mlx4/qp.c:1460: warning: 'seglen' may be used uninitialized in this function

This is the dopey gcc-doesn't-know-that-foo(&var)-writes-to-var problem.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-16 14:28:30 -07:00
Ralph Campbell df3f0da8db IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
When I fixed the RC receive completion opcode in 2bfc8e9e ("IB/ipath:
Return the correct opcode for RDMA WRITE with immediate"), I forgot to
fix UC, which had the same problem for RDMA write with immediate
returning the wrong opcode.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-15 16:37:25 -07:00
Roland Dreier cd80ec6f81 IB/ipath: Fix printk format for ipath_sdma_status
Commit f018c7e1 ("IB/ipath: Change ipath_devdata.ipath_sdma_status to be
unsigned long") changed ipath_sdma_status to be unsigned long, but left
a few debug messages that printed it out with a %016llx format, which
generates the warnings

    drivers/infiniband/hw/ipath/ipath_sdma.c:348: warning: format '%016llx' expects type 'long long unsigned int', but argument  3 has type 'long unsigned int'
    drivers/infiniband/hw/ipath/ipath_sdma.c:618: warning: format '%016llx' expects type 'long long unsigned int', but argument  3 has type 'long unsigned int'

Fix this by changing the format used to print out the value to %08lx
(8 hex digits are now sufficient, because the highest bit used is 31).

Warnings reported by Randy Dunlap <randy.dunlap@oracle.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-15 15:28:55 -07:00
Steve Wise a58e58fafd RDMA/cxgb3: Wrap the software send queue pointer as needed on flush
cxio_flush_sq() was failing to wrap around the software send queue
causing garbage completion entries on a flush operation.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:52:55 -07:00
Roland Dreier f018c7e177 IB/ipath: Change ipath_devdata.ipath_sdma_status to be unsigned long
Andrew Morton <akpm@linux-foundation.org> pointed out that bitops
should take an unsigned long * arg.  However, the ipath driver was
doing bitops on struct ipath_devdata.ipath_sdma_status, which is u64.
Change this member to unsigned long to avoid tons of warnings when x86
fixes the bitops to take unsigned long * instead of void *.

Also, change the IPATH_SDMA_RUNNING and IPATH_SDMA_SHUTDOWN bit
numbers to 30 and 31 (instead of 62 and 63) so that we're not setting
another booby trap for someone who tries to make ipath work on a
32-bit architecture.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:51:23 -07:00
Pavel Emelyanov 40d97692fb IB/ipath: Make ipath_portdata work with struct pid * not pid_t
The official reason is "with the presence of pid namespaces in the
kernel using pid_t-s inside one is no longer safe."

But the reason I fix this right now is the following:

About a month ago (when 2.6.25 was not yet released) there still was a
one last caller of a to-be-deprecated-soon function find_pid() - the
kill_proc() function, which in turn was only used by nfs callback
code.

During the last merge window, this last caller was finally eliminated
by some NFS patch(es) and I was about to finally kill this kill_proc()
and find_pid(), but found, that I was late and the kill_proc is now
called from the ipath driver since commit 58411d1c ("IB/ipath: Head of
Line blocking vs forward progress of user apps").

So here's a patch that fixes this code to use struct pid * and (!)
the kill_pid routine.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:45:32 -07:00
Ralph Campbell 74116f580b IB/ipath: Fix RDMA read response sequence checking
If an out of sequence RDMA read response middle or last packet is
received, we should only resend the RDMA read request on the first
out of sequence packet and drop subsequent out of sequence packets
otherwise, we get "too many retries".

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:42:20 -07:00
Ralph Campbell e509be898d IB/ipath: Fix many locking issues when switching to error state
The send DMA hardware queue voided a number of prior assumptions about
when a send is complete which led to completions being generated out of
order.  There were also a number of locking issues when switching the QP
to the error or reset states, and we implement the IB_QPS_SQD state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:41:29 -07:00
Ralph Campbell 53dc1ca194 IB/ipath: Fix RC and UC error handling
When errors are detected in RC, the QP should transition to the
IB_QPS_ERR state, not the IB_QPS_SQE state. Also, when the error is on
the responder side, the receive work completion error was incorrect
(remote vs. local).

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:40:25 -07:00
Roland Dreier dd37818dbd RDMA/nes: Fix up nes_lro_max_aggr module parameter
Fix some bugs with the max_aggr module parameter added with LRO support:

 - The module parameter value ignored and not actually used to set
   lro_mgr.max_aggr.
 - MODULE_PARM_DESC had a typo "_mro_" instead of "_lro_" so it didn't
   end up describing the actual module parameter.
 - The nes_lro_max_aggr variable was declared as unsigned, but the
   module_param line said "int" instead of "uint" for the type.
 - The default value for the parameter was stuck in the permissions
   field of module_param, which led to nonsensical permissions for the
   file under /sys/module/iw_nes/param.
 - The parameter was used in only one file but defined in another, which
   led to the variable being global for no good reason.  Move everything
   related to the parameter to the file nes_hw.c where it is actually
   used.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:27:25 -07:00
Stefan Roscher 12137c593d IB/ehca: Wait for async events to finish before destroying QP
This is necessary because, in a multicore environment, a race between
uverbs async handler and destroy QP could occur.

Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:35:06 -07:00
John Gregor ab69b3cf12 IB/ipath: Fix SDMA error recovery in absence of link status change
What's fixed:

    in ipath_cancel_sends()

        We need to unconditionally set ABORTING.  So, swap the tests
        so the set_bit() isn't shadowed by the &&.

        If we've disarmed the piobufs, then we need to unconditionally
        set DISARMED.  So, move it out from the overly protective if
        at the bottom.

    in sdma_abort_task()

        Abort_task was written knowing that the SDMA engine would always
        be reset (and restarted) on error.  A recent change broke that
        fundamental assumption by taking the restart portion and making
        it conditional on a link status change.  But, SDMA can go boom
        without a link status change in some conditions.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:01:10 -07:00
Dave Olson e2ab41cae4 IB/ipath: Need to always request and handle PIO avail interrupts
Now that we always use PIO for vl15 on 7220, we could get stuck forever
if we happened to run out of PIO buffers from the verbs code, because
the setup code wouldn't run; the interrupt was also ignored if SDMA was
supported.  We also have to reduce the pio update threshold if we have
fewer kernel buffers than the existing threshold.

Clean up the initialization a bit to get ordering safer and more
sensible, and use the existing ipath_chg_kernavail call to do init,
rather than doing it separately.

Drop unnecessary clearing of pio buffer on pio parity error.

Drop incorrect updating of pioavailshadow when exitting freeze mode
(software state may not match chip state if buffer has been allocated
and not yet written).

If we couldn't get a kernel buffer for a while, make sure we are
in sync with hardware, mainly to handle the exitting freeze case.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:00:15 -07:00
Michael Albaugh 2889d1ef12 IB/ipath: Fix count of packets received by kernel
The loop in ipath_kreceive() that processes packets increments the
loop-index 'i' once too often, because the exit condition does not
depend on it, and is checked after the increment. By adding a check for
!last to the iterator in the for loop, we correct that in a way that is
not so likely to be re-broken by changes in the loop body.

Signed-off-by: Michael Albaugh <micheal.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:59:23 -07:00
Ralph Campbell 2bfc8e9edf IB/ipath: Return the correct opcode for RDMA WRITE with immediate
This patch fixes a bug in the RC responder which generates a completion
entry with the wrong opcode when an RDMA WRITE with immediate is received.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:58:50 -07:00
Dave Olson b4d390d8d2 IB/ipath: Fix bug that can leave sends disabled after freeze recovery
The semantics of cancel_sends changed, but the code using it was missed.
Don't leave sends and pioavail updates disabled, and add a comment as to
why the force update is needed.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:57:48 -07:00
Ralph Campbell 6e87d15007 IB/ipath: Only increment SSN if WQE is put on send queue
If a send work request has immediate errors and is not put on the
send queue, we shouldn't update any of the QP state.

The increment of the SSN wasn't obeying this.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:57:14 -07:00
Michael Albaugh 5f51efc195 IB/ipath: Only warn about prototype chip during init
We warn about prototype chips, but the function that checks for
support is also called as a result of a get_portinfo request, which
can clutter the logs.

Restrict warning to only appear during initialization.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:56:47 -07:00
Roland Dreier 273748cc90 RDMA/cxgb3: Fix severe limit on userspace memory registration size
Currently, iw_cxgb3 is severely limited on the amount of userspace
memory that can be registered in in a single memory region, which
causes big problems for applications that expect to be able to
register 100s of MB.

The problem is that the driver uses a single kmalloc()ed buffer to
hold the physical buffer list (PBL) for the entire memory region
during registration, which means that 8 bytes of contiguous memory are
required for each page of memory being registered.  For example, a 64
MB registration will require 128 KB of contiguous memory with 4 KB
pages, and it unlikely that such an allocation will succeed on a busy
system.

This is purely a driver problem: the temporary page list buffer is not
needed by the hardware, so we can fix this by writing the PBL to the
hardware in page-sized chunks rather than all at once.  We do this by
splitting the memory registration operation up into several steps:

 - Allocate PBL space in adapter memory for the full registration
 - Copy PBL to adapter memory in chunks
 - Allocate STag and enable memory region

This also allows several other cleanups to the __cxio_tpt_op()
interface and related parts of the driver.

This change leaves the reregister memory region and memory window
operations broken, but they already didn't work due to other
longstanding bugs, so fixing them will be left to a later patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:56:22 -07:00
Roland Dreier 0e9913362a RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks
Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB
chunks.  This limits the largest single allocation that can be done to
the same size, which means that with 4 KB pages, each of which takes 8
bytes of PBL memory, the largest memory region that can be allocated
is 1 GB (256K PBL entries * 4 KB/entry).

Remove this limit by adding all the PBL memory in a single gen_pool
chunk, if possible.  Add code that falls back to smaller chunks if
gen_pool_add() fails, which can happen if there is not sufficient
contiguous lowmem for the internal gen_pool bitmap.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:03:38 -07:00
Stefan Roscher cf04690885 IB/ehca: Fix function return types
Also remove duplicate assignment of local_ca_ack_delay and change
min_t check for local_ca_ack_delay to u8 instead of int.

Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-05 15:51:49 -07:00
Steve Wise 77a8d5741f RDMA/cxgb3: Bump up the MPA connection setup timeout.
Testing on large clusters shows its way too short at 10 secs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:57:09 -07:00
Steve Wise c4d49776e8 RDMA/cxgb3: Silently ignore close reply after abort.
Remove bad BUG_ON() that can trigger in correct operation from
close_con_rpl().  It is possible to get a close_rpl message on a dead
connection.  The sequence is:

	- host refs ep for close exchange
	- host posts close_req
	- hw posts PEER_ABORT from incoming RST
	- host marks ep DEAD
	- host posts ABORT_RPL and releases ep resources
	- hw posts CLOSE_RPL
	- host derefs ep and ep freed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:57:09 -07:00
Steve Wise c8286944b8 RDMA/cxgb3: QP flush fixes
- Flush the QP only after the HW disables the connection.  Currently
  we flush the QP when transitioning to CLOSING.  This exposes a race
  condition where the HW can complete a RECV WR, for instance, -and-
  the SW can flush that same WR.

- Only call CQ event handlers on flush IFF we actually flushed something.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:56:57 -07:00
Eli Cohen 57ce41d1d1 IB/ipoib: Fix transmit queue stalling forever
Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up.  The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue.  Then, when the completion event
handler is called, drain the send CQ.  Since it is possible that not
enough send completions are in the CQ, verify that the the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 20:02:45 -07:00
Roland Dreier 3ae15e1623 IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf()
When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I
changed things around so that mlx4_ib_alloc_cq_buf() and
mlx4_ib_free_cq_buf() were used everywhere they could be.  However, I
screwed up the number of entries passed into mlx4_ib_alloc_cq_buf()
in a couple places -- the function bumps the number of entries
internally, so the caller shouldn't add 1 as well.

Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf()
can cause the cleanup to go off the end of an array and corrupt
allocator state in interesting ways.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 19:52:55 -07:00
Glenn Streiff 7495ab6837 RDMA/nes: Formatting cleanup
Various cleanups:
	- Change // to /* .. */
	- Place whitespace around binary operators.
	- Trim down a few long lines.
	- Some minor alignment formatting for better readability.
	- Remove some silly tabs.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Eric Schneider 0e1de5d62e RDMA/nes: Add support for SFP+ PHY
This patch enables the iw_nes module for NetEffect RNICs to support
additional PHYs including SFP+ (referred to as ARGUS in the code).

Signed-off-by: Eric Schneider <eric.schneider@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Faisal Latif 37dab4112d RDMA/nes: Use LRO
Signed-off-by: Faisal Latif <flatif@neteffect.com.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Eli Cohen b4132efa1a IPoIB: Copy child MTU from parent
When creating a child interface, copy the MTU information from the
parent.  Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Roland Dreier baaad380c0 IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute
Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
mthca userspace ABI to provide a way for userspace to indicate which
memory regions need the DMA write barrier attribute.  However, it is
possible to handle this without breaking existing userspace, by having
the mthca kernel driver recognize whether it is talking to old or new
userspace, depending on the size of the register MR structure passed in.

The only potential drawback of this is that is allows old userspace
(which has a bug with DMA ordering on large SGI Altix systems) to
continue to run on new kernels, but the advantage of allowing old
userspace to continue to work on unaffected systems seems to outweigh
this, and we can print a warning to push people to upgrade their
userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Olaf Kirch 0bfe151cc4 IB/mthca: Avoid recycling old FMR R_Keys too soon
When a FMR is unmapped, mthca resets the map count to 0, and clears
the upper part of the R_Key which is used as the sequence counter.

This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation.  RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers.  When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers.  The only way to achieve that is by using unmap and sync the
TPT.

However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.

To prevent this, don't reset the sequence count when unmapping a FMR.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Stefan Roscher d227fa7288 IB/ehca: Allocate event queue size depending on max number of CQs and QPs
If a lot of QPs fall into Error state at once and the EQ of the
respective HCA is too small, it might overrun, causing the eHCA driver
to stop processing completion events and calling the application's
completion handlers, effectively causing traffic to stop.

Fix this by limiting available QPs and CQs to a customizable max
count, and determining EQ size based on these counts and a worst-case
assumption.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Cohen f56bcd8013 IPoIB: Use separate CQ for UD send completions
Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated.  This patch
farther reduces overhead by not calling poll CQ for every posted send
WR -- it does polls only when there 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Dorfman 87528227df IB/iser: Count FMR alignment violations per session
Count FMR alignment violations per session as part of the iscsi
statistics.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Eli Dorfman 6f735e36ba IB/iser: Move high-volume debug output to higher debug level
Add another level for debug.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Hoang-Nam Nguyen 7df109d917 IB/ehca: handle negative return value from ibmebus_request_irq() properly
ehca_create_eq() was assigning a signed return value to an unsiged
local variable and then checking if the variable was < 0, which meant
that errors were always ignored.  Fix this by using one variable for
signed integer return values and another for u64 hcall return values.

Bug originally found by Roel Kluin <12o3l@tiscali.nl>.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Steve Wise f8b0dfd152 RDMA/cxgb3: Support peer-2-peer connection setup
Open MPI, Intel MPI and other applications don't respect the iWARP
requirement that the client (active) side of the connection send the
first RDMA message.  This class of application connection setup is
called peer-to-peer.  Typically once the connection is setup, _both_
sides want to send data.

This patch enables supporting peer-to-peer over the chelsio RNIC by
enforcing this iWARP requirement in the driver itself as part of RDMA
connection setup.

Connection setup is extended, when the peer2peer module option is 1,
such that the MPA initiator will send a 0B Read (the RTR) just after
connection setup.  The MPA responder will suspend SQ processing until
the RTR message is received and reply-to.

In the longer term, this will be handled in a standardized way by
enhancing the MPA negotiation so peers can indicate whether they
want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
should be sent.  This will be done by standardizing a few bits of the
private data in order to negotiate all this.  However this patch
enables peer-to-peer applications now and allows most of the required
firmware and driver changes to be done and tested now.

Design:

 - Add a module option, peer2peer, to enable this mode.

 - New firmware support for peer-to-peer mode:

	- a new bit in the rdma_init WR to tell it to do peer-2-peer
	  and what form of RTR message to send or expect.

	- process _all_ preposted recvs before moving the connection
	  into rdma mode.

	- passive side: defer completing the rdma_init WR until all
	  pre-posted recvs are processed.  Suspend SQ processing until
	  the RTR is received.

	- active side: expect and process the 0B read WR on offload TX
	  queue. Defer completing the rdma_init WR until all
	  pre-posted recvs are processed.  Suspend SQ processing until
	  the 0B read WR is processed from the offload TX queue.

 - If peer2peer is set, driver posts 0B read request on offload TX
   queue just after posting the rdma_init WR to the offload TX queue.

 - Add CQ poll logic to ignore unsolicitied read responses.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Steve Wise ccaf10d0ad RDMA/cxgb3: Set the max_mr_size device attribute correctly
cxgb3 only supports 4GB memory regions.  The lustre RDMA code uses
this attribute and currently has to code around our bad setting.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Steve Wise 989a178069 RDMA/cxgb3: Correctly serialize peer abort path
Open MPI and other stress testing exposed a few bad bugs in handling
aborts in the middle of a normal close.  Fix these by:

 - serializing abort reply and peer abort processing with disconnect
   processing

 - warning (and ignoring) if ep timer is stopped when it wasn't running

 - cleaning up disconnect path to correctly deal with aborting and
   dead endpoints

 - in iwch_modify_qp(), taking a ref on the ep before releasing the qp
   lock if iwch_ep_disconnect() will be called.  The ref is dropped
   after calling disconnect.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:51 -07:00
Yevgeny Petrilin e463c7b197 mlx4_core: Add a way to set the "collapsed" CQ flag
Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag
for the CQ being created.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:50 -07:00
Arthur Kepner cb9fbc5c37 IB: expand ib_umem_get() prototype
Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Roland Dreier 31d1e340f0 RDMA/nes: Remove volatile qualifier from struct nes_hw_cq.cq_vbase
Remove the volatile qualifier from the cq_vbase member of struct
nes_hw_cq, and add an rmb() in the one place where it looks like
access order might make a difference.  As usual, removing a volatile
qualifier in a declaration is actually a bug fix, since a volatile
qualifier is not sufficient to make sure that aggressively
out-of-order CPUs don't reorder things and cause incorrect results.

For example, a CPU might speculatively execute reads of other cqe
fields before the NIC hardware has written those fields and before it
has set the NES_CQE_VALID bit (even though those reads come after the
test of the NES_CQE_VALID bit in program order), but then when the CPU
actually executes the conditional test of the NES_CQE_VALID, the bit
has been set, and the CPU will proceed with the results of the earlier
speculative execution and end up using bogus data.

This also gets rid of the warning:

    drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_destroy_cq':
    drivers/infiniband/hw/nes/nes_verbs.c:1978: warning: passing argument 3 of 'pci_free_consistent' discards qualifiers from pointer target type

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Yevgeny Petrilin 6296883ca4 mlx4_core: Move kernel doorbell management into core
In addition to mlx4_ib, there will be ethernet and FC consumers of
mlx4_core, so move the code for managing kernel doorbells into the
core module to avoid having to duplicate this multiple times.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes 14fb05b349 IB/ehca: Bump version number to 0026
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes 0455e36d81 IB/ehca: Make some module parameters bool, update descriptions
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes a7607c9b11 IB/ehca: Remove mr_largepage parameter
Always enable large page support; didn't seem to cause problems for anyone.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes 4da27d6d5b IB/ehca: Move high-volume debug output to higher debug levels
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes 863fb09fbf IB/ehca: Prevent posting of SQ WQEs if QP not in RTS
...as required by IB Spec, C10-29.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Shirley Ma bc7b3a36ba IPoIB: Handle 4K IB MTU for UD (datagram) mode
This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to thhe HCA IB MTU size.  The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Chien Tung bc5698f3ec RDMA/nes: Fix adapter reset after PXE boot
After PXE boot, the iw_nes driver does a full reset to ensure the card
is in a clean state.  However, it doesn't wait for firmware to
complete its work before issuing a port reset to enable the ports,
which leads to problems bringing up the ports.

The solution is to wait for firmware to complete its work before
proceeding with port reset.

This bug was flagged by Roland Dreier <rolandd@cisco.com>.

Cc: <stable@kernel.org>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Roland Dreier e447703123 RDMA/nes: Print IPv4 addresses in a readable format
Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in
debugging output.

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:43 -07:00
Roland Dreier 2bd01c5d2e RDMA/nes: Use print_mac() to format ethernet addresses for printing
Removing open-coded MAC formats shrinks the source and the generated
code too, eg on x86-64:

add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103)
function                                     old     new   delta
make_cm_node                                 932     912     -20
nes_netdev_set_mac_address                   427     406     -21
nes_netdev_set_multicast_list               1148    1124     -24
nes_probe                                   2349    2311     -38

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:52:18 -07:00
Roland Dreier bc751fe6ff IB/ipath: Correct capitalization "IntX" -> "INTx"
Match what the PCI specification uses.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:15 -07:00
Roland Dreier 44957572cc IB/ipath: Remove tests of PCI_MSI in ipath_iba7220.c
The PCI MSI interface is stubbed out properly so that all the
functions just return failure if PCI_MSI=n, so there's no reason to
have "#ifdef CONFIG_PCI_MSI" blocks in ipath_iba7220.c.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:15 -07:00
Roland Dreier 480f58e614 IB/ipath: Remove dependency on PCI_MSI || HT_IRQ
Before IBA7220 support was added, the ipath driver didn't support any
hardware unless PCI_MSI and/or HT_IRQ was enabled.  However, the
IBA7220 can generate INTx interrupts, so it makes sense to allow the
driver to be build even if PCI_MSI=n and HT_IRQ=n.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:14 -07:00
Roland Dreier 37a6ab5227 IB/ipath: Build IBA7220 code unconditionally
The new IBA7220 code added a call to ipath_init_iba7220_funcs() that
is compiled unconditionally, but only built the IBA7220 code if
PCI_MSI is enabled.  Fix this by building the IBA7220 file
unconditonally.

This fixes build breakage when PCI_MSI=n, HT_IRQ=y and
INFINIBAND_IPATH=y reported by Ingo Molnar <mingo@elte.hu>:

 drivers/built-in.o: In function `ipath_init_one':
 ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs'

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:14 -07:00
Roland Dreier 88a8317bcd IB/ipath: Remove reference to dev->class_dev
Commit 124b4dcb ("IB/ipath: add calls to new 7220 code and enable in                                
build") inadvertently added core to set dev->class_dev.dev back into                                
ib_ipath.  This is completely redundant since commit 1912ffbb ("IB: Set                             
class_dev->dev in core for nice device symlink"), which removed                                     
class_dev setting from low-level drivers, and also will break the build
when class_dev is removed completely from struct ib_device.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:14 -07:00
Paul Bolle 9862874d21 IB/ipath: Fix module parameter description for disable_sma
Describe disable_sma parameter with its name rather than the internal
ib_ipath_disable_sma variable name, so that the description shows up
properly in modinfo.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:13 -07:00
Roland Dreier 6a5546e76c RDMA/nes: Remove unneeded function declarations
Remove redundant static declarations of functions that are defined
before they are used in the source.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:12 -07:00
Linus Torvalds e80ab411e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
  SCSI: convert struct class_device to struct device
  DRM: remove unused dev_class
  IB: rename "dev" to "srp_dev" in srp_host structure
  IB: convert struct class_device to struct device
  memstick: convert struct class_device to struct device
  driver core: replace remaining __FUNCTION__ occurrences
  sysfs: refill attribute buffer when reading from offset 0
  PM: Remove destroy_suspended_device()
  Firmware: add iSCSI iBFT Support
  PM: Remove legacy PM (fix)
  Kobject: Replace list_for_each() with list_for_each_entry().
  SYSFS: Explicitly include required header file slab.h.
  Driver core: make device_is_registered() work for class devices
  PM: Convert wakeup flag accessors to inline functions
  PM: Make wakeup flags available whenever CONFIG_PM is set
  PM: Fix misuse of wakeup flag accessors in serial core
  Driver core: Call device_pm_add() after bus_add_device() in device_add()
  PM: Handle device registrations during suspend/resume
  block: send disk "change" event for rescan_partitions()
  sysdev: detect multiple driver registrations
  ...

Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
2008-04-21 15:49:58 -07:00
Tony Jones ee959b00c3 SCSI: convert struct class_device to struct device
It's big, but there doesn't seem to be a way to split it up smaller...

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:33 -07:00
Greg Kroah-Hartman 0532193746 IB: rename "dev" to "srp_dev" in srp_host structure
This sets us up to be able to convert the srp_host to use a struct
device instead of a class_device.

Based on a original patch from Tony Jones, but split up into this piece
by Greg.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Tony Jones f4e91eb4a8 IB: convert struct class_device to struct device
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Matthew Wilcox 6188e10d38 Convert asm/semaphore.h users to linux/semaphore.h
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:22:54 -04:00
Matthew Wilcox d3135846f6 drivers: Remove unnecessary inclusions of asm/semaphore.h
None of these files use any of the functionality promised by
asm/semaphore.h.  It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:16:32 -04:00
Erez Zilber 0a22ab92f5 IB/iser: Don't change itt endianness
The itt field in struct iscsi_data is not defined with any particular
endianness.  open-iscsi should use it as-is without byte-swapping it.
This fixes sparse warnings coming from doing ntohl(hdr->itt).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Jack Morgenstein 068c4ea1bb IB/mlx4: Update module version and release date
The mlx4_ib driver is stable enough for production use, so bump the
version number to 1.0 to indicate this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Roland Dreier 9fdd5e5bf6 IPoIB: Handle case when P_Key is deleted and re-added at same index
If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Erez Zilber d97c51707d IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event
When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should
release the connection resources.

This is necessary when the IB HCA module is unloaded while open-iscsi
is still running.  Currently, iSER just BUG()s.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Stefan Roscher c83b5b1cb2 IB/ehca: Support all ibv_devinfo values in query_device() and query_port()
Also, introduce a few inline helper functions to make the code more readable.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Roland Dreier 4cd1e5eb3c RDMA/nes: Free IRQ before killing tasklet
Move the free_irq() call in nes_remove() to before the tasklet_kill();
otherwise there is a window after tasklet_kill() where a new interrupt
can be handled and reschedule the tasklet, leading to a use-after-free
crash.

Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Jack Morgenstein 940801b27e IB/mthca: Update module version and release date
The ib_mthca driver has been stable for a while, so bump the version
number to 1.0 to indicate this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Dotan Barak 0df6703095 IB/mlx4: Update QP state if query QP succeeds
If the QP was moved to another state (such as SQE) by the hardware,
then after this change the user won't have to set the IBV_QP_CUR_STATE
mask in order to execute modify QP in order to recover from this state.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Dotan Barak 5121df3ae4 IB/mthca: Update QP state if query QP succeeds
If the QP was moved to another state (such as SQE) by the hardware,
then after this change the user won't have to set the IBV_QP_CUR_STATE
mask in order to execute modify QP in order to recover from this state.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Tom Tucker 9285faa1e7 RDMA/amso1100: Add check for NULL reply_msg in c2_intr()
Fix a place where we might dereference a NULL pointer; this fixes
Coverity CID 1392.  On inspection I also found a place where we could
attempt to kmem_cache_free() a NULL pointer, so fix this too.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Vladimir Sokolovsky bbf8eed1a0 IB/mlx4: Add support for resizing CQs
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen 3fdcb97f0b IB/mlx4: Add support for modifying CQ moderation parameters
Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen 28d52b3cd8 IPoIB: Support modifying IPoIB CQ event moderation
This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput.  Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen 2dd5716227 IB/core: Add support for modify CQ
Add support for modifying CQ parameters for controlling event
generation moderation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen 82c24c18af IPoIB: Add basic ethtool support
Just add the infrastructure so we can add functionality later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Roland Dreier 139b2db795 RDMA/amso1100: Add support for "send with invalidate" work requests
Handle IB_WR_SEND_WITH_INV work requests.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Roland Dreier 0f39cf3d54 IB/core: Add support for "send with invalidate" work requests
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions.  Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated.  Add this new union to struct
ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().

Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.

Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit.  The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing).  Remove the flag from
all existing drivers that set it until we know which ones are OK.

The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Ralph Campbell e7eacd3686 IB/ipath: Update copyright dates for files changed in 2008
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Dave Olson 124b4dcb1d IB/ipath: add calls to new 7220 code and enable in build
This patch adds the initialization calls into the new 7220 HCA files,
changes the Makefile to compile and link the new files, and code to
handle send DMA.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Arthur Jones bb9171448d IB/ipath: Misc changes to prepare for IB7220 introduction
The patch adds a number of minor changes to support newer HCAs:
 - New send buffer control bits
 - New error condition bits
 - Locking and initialization changes
 - More send buffers

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Arthur Jones 8babfa4fb9 IB/ipath: User mode send DMA
A new file which allows the IBA7220 send DMA engine to be used from
userland.  The routines here are not linked in yet, that will happen in
a follow-on patch...

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Arthur Jones 909c0faa8f IB/ipath: User mode send DMA header file
A new header file which allows the IBA7220 send DMA engine to be used
from userland.  The definitions here are not used yet, that will happen
in a follow-on patch...

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
John Gregor f7a60d71af IB/ipath: Add code for IBA7220 send DMA
The IBA7220 HCA has a new feature to DMA data to the on chip send
buffers instead of or in addition to the host CPU doing the data
transfer.  This patch adds code to support the send DMA queue.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Ralph Campbell 2c19643563 IB/ipath: Add IBA7220-specific SERDES initialization data
This patch adds binary data to initialize the IB SERDES.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Michael Albaugh ab0fb2e049 IB/ipath: Support for SerDes portion of IBA7220
The control and initialization of the SerDes blocks of the IBA7220 is
sufficiently complex to merit a separate file.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Ralph Campbell 843e6ab489 IB/ipath: HCA-specific code to support IBA7220
This patch adds the HCA-specific code for the IBA7220 HCA.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Michael Albaugh dd042d59c1 IB/ipath: Isolate 7220-specific content
This patch adds a new ASIC-specific header file for the HCAs using the IBA7220.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Ralph Campbell afce688ba9 IB/ipath: Header file changes to support IBA7220
This is part of a patch series to add support for a new HCA.  This patch
adds new fields to the header files.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Ralph Campbell 6bb68835d3 IB/ipath: Fix up error handling
This patch makes chip reset more robust and reduces lock contention
between user and kernel TID register updates.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Dave Olson 9b436eb4f8 IB/ipath: Fix check for no interrupts to reliably fallback to INTx
Newer HCAs support MSI interrupts and also INTx interrupts.  Fix the
code so that INTx can be reliably enabled if MSI interrupts are not
working.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Dave Olson 1d7c2e529f IB/ipath: Enable reduced PIO update for HCAs that support it.
Newer HCAs have a threshold counter to reduce the number of DMAs the
chip makes to update the PIO buffer availability status bits.  This
patch enables the feature.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Dave Olson 0ab6b2b9ab IB/ipath: Set LID filtering for HCAs that support it.
Whenever the LID is set, notify the HCA specific code so that the
appropriate HW registers can be updated. Also log the info on the
console at low priority.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Dave Olson b3e8f54107 IB/ipath: Add support for IBTA 1.2 Heartbeat
This patch adds code to enable/disable the IBTA 1.2 heartbeat for testing
if the HCA supports it.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Dave Olson 555b203e48 IB/ipath: Make link state transition code ignore (transient) link recovery
The hardware-based recovery doesn't need any intervention, and in a few
cases we can get a bit confused about state and skip steps such as
turning off the link state LED when we consider recovery to be "down".
So ignore this transition, and either we recover in hardware, or we
transition to down, and will handle it then.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Ralph Campbell 9355fb6a06 IB/ipath: Add support for 7220 receive queue changes
Newer HCAs have a HW option to write a sequence number to each receive
queue entry and avoid a separate DMA of the tail register to memory.
This patch adds support for these changes.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Ralph Campbell 2ba3f56eb4 IB/ipath: Fix some white space and code style issues
This patch makes some white space changes and minor non-functional
changes to more closely match the code in OFED-1.3.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Michael Albaugh afd9970f95 IB/ipath: Allow old and new diagnostic packet formats
This patch checks for old and new format writes to send a packet via the
diagnostic interface.

Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Dotan Barak 7ce5eacb45 IB/core: Check optional verbs before using them
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Robert P. J. Day b3b8128fd3 IB/ipath: Fix time comparison to use time_after_eq()
Raw comparison against jiffies will fail if jiffies wraps, although
since ipath currently only supports 64-bit architectures, this is rather
far-fetched.  Still, it's better to use time_after_eq().

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Roland Dreier f438000f7a IB/mlx4: Micro-optimize mlx4_ib_post_send()
Rather than have build_mlx_header() return a negative value on failure
and the length of the segments it builds on success, add a pointer
parameter to return the length and return 0 on success.  This matches
the calling convention used for build_lso_seg() and generates slightly
smaller code -- eg, on 64-bit x86:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-22 (-22)
function                                     old     new   delta
mlx4_ib_post_send                           2023    2001     -22

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Eli Cohen b832be1e40 IB/mlx4: Add IPoIB LSO support
Add TSO support to the mlx4_ib driver.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Eli Cohen 40ca1988e0 IPoIB: Add LSO support
For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Eli Cohen b846f25aa2 IB/core: Add creation flags to struct ib_qp_init_attr
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO.  The create_flags member will also be useful for XRC
and ehca low-latency QP support.

Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Michael Albaugh d84e0b28d3 IB/ipath: EEPROM support for 7220 devices, robustness improvements, cleanup
Add support for reading newer card's EEPROMs while continuing to support
older EEPROMs.

Also, add support for the temperature sensor if present.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Ralph Campbell d98b193776 IB/ipath: Use PIO buffer for RC ACKs
This reduces the latency for RC ACKs when a PIO buffer is available.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Ralph Campbell c4b4d16e09 IB/ipath: Make send buffers available for kernel if not allocated to user
A fixed partitioning of send buffers is determined at driver load time
for user processes and kernel use.  Since send buffers are a scarce
resource, it makes sense to allow the kernel to use the buffers if they
are not in use by a user process.

Also, eliminate code duplication for ipath_force_pio_avail_update().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Michael Albaugh 4330e4dad7 IB/ipath: Prevent link-recovery code from negating admin disable
The link can be put in LINKDOWN_DISABLE state either locally or via a
MAD.  However, the link-recovery code will take it out of that state as
a side-effect of attempts to clear SerDes/XGXS issues.

We add a flag to indicate "link is down on purpose, leave it alone."

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Ralph Campbell 8c641d4b5f IB/ipath: Remove some useless (void) casts
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Ralph Campbell 928e3e4bb9 IB/ipath: Change the module author
Update the module author to the current email address.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Robert P. J. Day 4e96a77440 RDMA/nes: Use more concise list_for_each_entry()
In list iteration code, you normally wouldn't be calling
"container_of()" directly anyway, you'd be invoking "list_entry()".
But you don't even need that here, "list_for_each_entry()" is fine.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Robert P. J. Day 157de22946 IB: Use shorter list_splice_init() for brevity
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Julia Lawall 10f32065a2 RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0
The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
  S else S1
//</smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Roland Dreier 4d43653263 RDMA/nes: Remove session_id from nes_cm stuff
The session_id members of struct nes_cm_listener and struct
nes_cm_node are write-only, so remove them.  This allows the
session_id member of struct nes_cm_core to be removed as well, since
it is only used to write those other session_id values.

This removes the use of current->tgid (which will be deprecated)
pointed out by Pavel Emelyanov <xemul@openvz.org>.

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Roland Dreier 782203884e IB/ipath: Fix PCI config write size used to clear linkctrl error bits
In slave_or_pri_blk(), pci_write_config_byte() is used to write a
16-bit quantity to clear linkctrl CRC error bits.  This is clearly a
bug and also causes the warning

    drivers/infiniband/hw/ipath/ipath_iba6110.c: In function 'slave_or_pri_blk':
    drivers/infiniband/hw/ipath/ipath_iba6110.c:849: warning: overflow in implicit constant conversion

Fix this by using pci_write_config_word() instead.

Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Ralph Campbell 10a8c3cd01 IB/ipath: Fix sanity checks on QP number of WRs and SGEs
The receive queue number of WRs and SGEs shouldn't be checked if a
SRQ is specified.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Ralph Campbell 69bd74c696 IB/ipath: Remove useless comments
Remove useless comment about list removal since locks are held and
the code checks that the QP is on the list before removing it.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Dave Olson 72708a0a2b IB/ipath: HW workaround for case where chip can send but not receive
Workaround a QLE7140 problem that in rare cases causes flow control
problems after link recovery by forcing a link retrain after recovery.
A module parameter is provided to control the behavior in case it causes
problems.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Ralph Campbell a51a2513a8 IB/ipath: Add code to support multiple link speeds and widths
This patch adds code to get/set portinfo to support multiple link speeds
and widths.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:24 -07:00
John Gregor 58411d1c01 IB/ipath: Head of Line blocking vs forward progress of user apps
There's a conflict between our need to quiesce PSM-based applications
to avoid HoL blocking when the IB link goes down and the apps' desire
to remain running so that their quiescence timout mechanism can keep
running.

The compromise is to STOP the processes for a fixed period of time and
then alternate between CONT and STOP until the link is again active.

If there are poor interactions with subnet manager configuration at a
given site, the interval can be adjusted via a module paramter.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:24 -07:00
Ralph Campbell 6be979d71a IB/ipath: Make debug error message match the constraint that is checked for
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:24 -07:00
Ralph Campbell c1702be20f IB/ipath: Don't try to handle freeze mode HW errors if diagnostic mode
Don't try to handle freeze mode HW errors if the driver is in diagnostic
mode since some tests can cause errors that shouldn't be processed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:14 -07:00
Arthur Jones b848882153 IB/ipath: Fix link up LED display
The check for link up was incorrect, thus setting the LED display
inconsistently with the link state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Ralph Campbell 8bae0ff259 IB/ipath: Fix error recovery for send buffer status after chip freeze mode
The error recovery code for updating the driver's cached status information
for which send buffers are busy or free wasn't updated for IBA7220.
It should be similar to the initialization code in enable_chip().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Ralph Campbell 0349d16620 IB/ipath: Fix byte order of pioavail in handle_errors()
Fix byte order of value assigned to pioavailshadow.  This bug was
detected by sparse endianness warnings.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Roland Dreier c263ff65d5 IB/mthca: Avoid integer overflow when allocating huge ICM table
In mthca_alloc_icm_table(), the number of entries to allocate for the
table->icm array is computed by calculating obj_size * nobj and then
dividing by MTHCA_TABLE_CHUNK_SIZE.  If nobj is really large, then
obj_size * nobj may overflow and the division may get the wrong value
(even a negative value).  Fix this by calculating the number of
objects per chunk and then dividing nobj by this value instead.

This patch allows crazy configurations such as loading ib_mthca with
the module parameter num_mtt=33554432 to work properly.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Roland Dreier 19773539d6 IB/mthca: Avoid integer overflow when dealing with profile size
mthca_make_profile() returns the size in bytes of the HCA context
layout it creates, or a negative value if an error occurs.  However,
the return value is declared as u64 and the memfree initialization
path casts this value to int to test if it is negative.  This makes it
think incorrectly than an error has occurred if the context size
happens to be bigger than 2GB, since this turns into a negative int.

Fix this by having mthca_make_profile() return an s64 and testing
for an error by checking whether this 64-bit value itself is negative.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Hoang-Nam Nguyen f4f82994d1 IB/ehca: Remove tgid checking
Pavel Emelyanov <xemul@openvz.org> mentioned in <http://lkml.org/lkml/2008/3/17/131>
that the task_struct->tgid field is about to become deprecated, so the
uses in the ehca driver need to be fixed up.

However, all the uses in ehca are for some object ownership checking
that is not really needed, and anyway is implementing a policy that
should be in common code rather than a low-level driver.  So just
remove all the checks.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
David Dillow 1e89a1946c IB/srp: Enforce protocol limit on srp_sg_tablesize
The current SRP initiator will allow unlimited s/g entries in the
indirect descriptors lists, but the entry count field in the SRP_CMD
request is 8 bits, so setting srp_sg_tablesize too large will open the
possibility of wrapping the count and generating invalid requests.

Clamp srp_sg_tablesize to the protocol limits to prevent surprises.

Reported by Martin W. Schlining III <mschlining@datadirectnet.com>.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Dave Olson 826d801009 IB/ipath: Enable 4KB MTU
Enable use of 4KB MTU.  Since the driver uses more pinned memory for
receive buffers when the 4KB MTU is enabled, whether or not the fabric
supports that MTU, add a "mtu4096" module parameter that can be used to
limit the MTU to 2KB when it is known that 4KB MTUs can't be used
anyway.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Dave Olson 5d1ce03dd3 IB/ipath: Shared context code needs to be sure device is usable
The code was checking if units are present, but not that present units
were usable (link up, etc.)

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Arthur Jones 6ca2abf4c0 IB/ipath: Provide I/O bus speeds for diagnostic purposes
Modern I/O buses like PCIe and HT can be configured for multiple speeds
and widths.  When an ipath HCA seems to have lower than expected
performance, it is very useful to be able to display what the driver
thinks the bus speed is.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Dave Olson f2ceb4929a IB/ipath: Make some constants chip-specific, related cleanup
This patch makes some constants chip-specific, and makes some related
changes to prepare for supporting another HCA.

Signed-off-by: Dave Olson <dave.olson@qlogic.com
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Arthur Jones 3dd59e226e IB/ipath: Misc sparse warning cleanup
Recent sparse versions and kernel cleanups knock down the false positive
rate of the ipath driver code to a point where having it be sparse clean
is worthwhile. Here we fixup the sparse warnings.  Some of these warnings
(and the impetus to run sparse again) are due to work by Roland Dreier.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:11 -07:00
Eli Cohen 680b575f6d IB/mthca: Add IPoIB checksum offload support
Arbel and Sinai devices support checksum generation and verification
of TCP and UDP packets for UD IPoIB messages.  This patch checks if
the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability
flag if it does.  It implements support for handling the IB_SEND_IP_CSUM
send flag and setting the csum_ok field in receive work completions.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:11 -07:00
Eli Cohen 8ff095ec4b IB/mlx4: Add IPoIB checksum offload support
ConnectX devices support checksum generation and verification of TCP
and UDP packets for UD IPoIB messages.  This patch checks if the HCA
supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it
does.  It implements support for handling the IB_SEND_IP_CSUM send
flag and setting the csum_ok field in receive work completions.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Ali Ayub <ali@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Eli Cohen 6046136c74 IPoIB: Use checksum offload support if available
For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Harvey Harrison 3371836383 IB: Replace remaining __FUNCTION__ occurrences with __func__
__FUNCTION__ is gcc-specific, use __func__ instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier e8e91f6b4d IB/ehca: Make symbols used only in a single source file static
Allow the compiler to optimize better and generate smaller code:

add/remove: 0/6 grow/shrink: 2/0 up/down: 1528/-1864 (-336)
function                                     old     new   delta
.ehca_set_pagebuf                           1344    2172    +828
.ehca_probe                                 2312    3012    +700
ehca_set_pagebuf_phys                         24       -     -24
ehca_set_pagebuf_fmr                          24       -     -24
ehca_init_device                              24       -     -24
.ehca_set_pagebuf_fmr                        480       -    -480
.ehca_set_pagebuf_phys                       512       -    -512
.ehca_init_device                            800       -    -800

Also this fixes warnings like:

    drivers/infiniband/hw/ehca/ehca_mrmw.c:2015:5: warning: symbol 'ehca_set_pagebuf_fmr' was not declared. Should it be static?

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier 1a855fbfb6 RDMA/nes: Make symbols used only in a single source file static
Avoid namespace pollution and allow the compiler to optimize better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier 71e0957c62 RDMA/nes: Use proper format and cast to print dma_addr_t
On some platforms, eg sparc64, dma_addr_t is not the same size as a
pointer, so printing dma_addr_t values by casting to void * and using
a %p format generates warnings.  Fix this by casting to unsigned long
and using %lx instead.  This fixes the warnings:

    drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_setup_virt_qp':
    drivers/infiniband/hw/nes/nes_verbs.c:1047: warning: cast to pointer from integer of different size
    drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
    drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
    drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_user_mr':
    drivers/infiniband/hw/nes/nes_verbs.c:2657: warning: cast to pointer from integer of different size

Reported by Andrew Morton <akpm@linux-foundation.org>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier 9d84ab9c7e RDMA/nes: Remove unused nes_netdev_exit() function
nes_netdev_exit() has no callers, so delete it.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier 5bd8341ce2 RDMA/nes: Remove redundant NULL check in nes_unregister_ofa_device()
nes_unregister_ofa_device() dereferences the nesibdev pointer before
testing if it's NULL.  Also, the test is doubly redundant because the
only caller of nes_unregister_ofa_device() is nes_destroy_ofa_device(),
which already tests if nesibdev is NULL.  Remove the unnecessary test.

This was spotted by the Coverity checker (CID 2190).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier a7dab9e887 IB/uverbs: Use alloc_file() instead of get_empty_filp()
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface.  Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 1ae5c187ac IB/uverbs: Don't store struct file * for event files
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not.  The only thing we
ever did with the value was check if it was NULL or not.  Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 37608eea86 mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums
The struct mlx4_interface.event() method was supposed to get an enum
mlx4_dev_event, but the driver code was actually passing in the
hardware enum mlx4_event values.  Fix up the callers of
mlx4_dispatch_event() so that they pass in the right type of value,
and fix up the event method in mlx4_ib so that it can handle the enum
mlx4_dev_event values.

This eliminates the need for the subtype parameter to the event
method, so remove it.

This also fixes the sparse warning

    drivers/net/mlx4/intf.c:127:48: warning: mixing different enum types
    drivers/net/mlx4/intf.c:127:48:     int enum mlx4_event  versus
    drivers/net/mlx4/intf.c:127:48:     int enum mlx4_dev_event

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 26c4fc26d0 RDMA/amso1100: Endian annotate mqsq allocator
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier dc544bc9cb RDMA/amso1100: Start of endianness annotation
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-04-16 21:01:08 -07:00
Roland Dreier d23b9d8ff2 RDMA/nes: Delete unused variables
None of the cqp_reqs_XXX counters were ever used anywhere, and neither
was the nics_per_function variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier b30db1c186 RDMA/nes: Trivial endianness annotations
Fix a couple of htonl() that should really be ntohl().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier 9cda779cc2 RDMA/ucma: Endian annotation
Add __force cast of node_guid to __u64, since we are sticking it into a
structure whose definition is shared with userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier a88f488857 IB/cm: Endianness annotations
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-04-16 21:01:07 -07:00
Roland Dreier d2ae16d576 IB/mlx4: Endianness annotations
Trivial fixes to stamp_send_wqe().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier 6358ae25fd IB/ipath: Fix sparse warning about shadowed symbol
Fix

    drivers/infiniband/hw/ipath/ipath_init_chip.c:526:10: warning: symbol 'val' shadows an earlier one
    drivers/infiniband/hw/ipath/ipath_init_chip.c:473:6: originally declared here

by giving the second val a different name.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Arthur Jones <arthur.jones@qlogic.com>
2008-04-16 21:01:07 -07:00
Arthur Jones 6ef6aee2f0 IB/ipath: Fix sparse warning about pointer signedness
There's no reason for the third parameter of ipath_count_units() to be
a u32 *, so change it to be an int * instead.  This fixes the sparse
warning:

    drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47: warning: incorrect type in argument 3 (different signedness)
    drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47:    expected unsigned int [usertype] *maxportsp
    drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47:    got int *<noident>

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:06 -07:00
Roland Dreier edba846af9 RDMA/cxgb3: IDR IDs are signed
Fix sparse warnings about pointer signedness by using a signed int when
calling idr_get_new_above().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-04-16 21:01:06 -07:00
Roland Dreier 4b29043921 RDMA/amso1100: Don't use 0UL as a NULL pointer
Write tests for NULL pointers as

	if (!ptr)

instead of

	if (ptr == 0UL)

to fix sparse warnings.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-04-16 21:01:06 -07:00
Roland Dreier 5d5e815db9 IB/mlx4: Convert "if(foo)" to "if (foo)"
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:04 -07:00
Roland Dreier b39993936d IB/mthca: Formatting cleanups
Fix a few whitespace and other coding style problems.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:03 -07:00
Al Viro 1b90c137cc trivial endianness annotations: infiniband core
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:20:24 -07:00
Roland Dreier 1f71f50342 RDMA/cxgb3: Program hardware IRD with correct value
Because of a typo in iwch_accept_cr(), the cxgb3 connection handling
code programs the hardware IRD (incoming RDMA read queue depth) with
the value that is passed in for the ORD (outgoing RDMA read queue
depth).  In particular this means that if an application passes in IRD
> 0 and ORD = 0 (which is a completely sane and valid thing to do for
an app that expects only incoming RDMA read requests), then the
hardware will end up programmed with IRD = 0 and the app will fail in
a mysterious way.

Fix this by using "ep->ird" instead of "ep->ord" in the intended place.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28 10:45:32 -07:00
Chien Tung f2b2b59b93 RDMA/nes: Fix MSS calculation on RDMA path
Fix the calculation of the MSS for RDMA connections: we need to
allow space in frames for a VLAN tag too.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-21 13:59:28 -07:00
Roland Dreier 10313cbb92 IPoIB: Allocate priv->tx_ring with vmalloc()
Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries.  This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size.  Fix
this regression by allocating the rings with vmalloc() instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-12 07:51:03 -07:00
Roland Dreier 4200406b8f IPoIB/cm: Set tx_wr.num_sge in connected mode post_send()
Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled.  However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1.  Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 18:35:20 -07:00
Or Gerlitz b3e2749bf3 IPoIB: Don't drop multicast sends when they can be queued
When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send().  These dropped packets are
painful in two cases:

 - bonding fail-over which both calls set_multicast_list() on the new
   active slave and sends Gratuitous ARP through that slave.

 - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
   device and issues IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped.  As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour is renewed the neighbour (which takes a few tens of
seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn on that from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.

Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:12:03 -07:00
Patrick Marchand Latifi 450bb3875f IB/ipath: Reset the retry counter for RDMA_READ_RESPONSE_MIDDLE packets
Reset the retry counter when we get a good RDMA_READ_RESPONSE_MIDDLE
packet.  This fix will prevent the requester from reporting a retry
exceeded error too early.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
2008-03-11 14:04:35 -07:00
Patrick Marchand Latifi 2a049e514b IB/ipath: Fix error completion put on send CQ instead of recv CQ
A work completion entry could be placed on the wrong completion
queue when an RC QP is placed in the error state.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:03:54 -07:00
Patrick Marchand Latifi 4cd5060cf7 IB/ipath: Fix RC QP initialization
This patch fixes the initialization of RC QPs, since we would rely on
the queue pair type (ibqp->qp_type) being set, but this field is only
initialized when we return from ipath_create_qp (it is initialized by
the user-level verbs library).

The fix is to not depend on this field to initialize the send and
the receive state of the RC QP.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:02:32 -07:00
Patrick Marchand Latifi 87d5aed85b IB/ipath: Fix potentially wrong RNR retry counter returned in ipath_query_qp()
There can be a case where the requester's rnr retry counter
(s_rnr_retry) is less than the number of rnr retries allowed per QP
(s_rnr_retry_cnt).  This can happen if the s_rnr_retry counter is being
decremented and an ipath_query_qp call is issued during that time frame.
The fix is to always return the number of rnr retries allowed per QP
instead of the requester's rnr counter.

Found by code review.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:01:14 -07:00
Ralph Campbell 140277e9a7 IB/ipath: Fix IB compliance problems with link state vs physical state
Subnet manager SetPortinfo messages distingush between changing the link
state (DOWN, ARM, ACTIVE) and the link physical state (POLL, SLEEP,
DISABLED).  These are somewhat independent commands and affect when link
width and speed changes take effect.  Without this patch, a link DOWN
physical state NOP command was causing the link width and speed settings
to take effect which should only happen when the link physical state is
goes down (either by a SMP or some link physical error like link errors
exceeding the threshold).

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 13:58:22 -07:00
Steve Wise d7c1fbd660 RDMA/iwcm: Don't access a cm_id after dropping reference
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed.  The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference.  Then if it was set, free the cm_id on this thread.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:22:22 -07:00
Arne Redlich d33ed425c6 IB/iser: Handle iser_device allocation error gracefully
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-03-10 21:17:51 -07:00
Arne Redlich 9a378270c0 IB/iser: Fix list iteration bug
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER.  Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:15:49 -07:00
Jon Mason 4fa45725df RDMA/cxgb3: Fix iwch_create_cq() off-by-one error
The cxbg3 driver is unnecessarily decreasing the number of CQ entries by
one when creating a CQ.  This will cause the CQ not to have as many
entries as requested by the user if the user requests a power of 2 size.

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-09 13:54:12 -07:00
Jon Mason 1bab74e691 RDMA/cxgb3: Return correct max_inline_data when creating a QP
Set cap.max_inline_data to the actual max inline data that the adapter
support, so that userspace apps see the right value returned.

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:53:18 -08:00
Pete Wyckoff 331552925d IB/fmr_pool: Flush all dirty FMRs from ib_fmr_pool_flush()
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.

This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread.  Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:31:48 -08:00
Pete Wyckoff 35fb5340e3 Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"
This reverts commit a3cd7d9070.

The original commit breaks iSER reliably, making it complain:

    iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up.  This commit causes clean but used FMR entries also to be
purged.  During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:29:19 -08:00
Sean Hefty 84ba284cd7 IB/cm: Flush workqueue when removing device
When a CM MAD is received, it is queued to a CM workqueue for
processing.  The queued work item references the port and device on
which the MAD was received.  If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.

To fix this, flush the workqueue after unregistering to receive MAD,
and before the device is be freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:27:52 -08:00
John Lacombe 4b1cc7e7ca RDMA/nes: Fix interrupt moderation low threshold
Interrupt moderation low threshold value was incorrectly triggering,
indicating that the threshold should be lowered.

The impact was the timer was likely to become 40usecs and get stuck
there.  The biggest side effect was too many interrupts and nonoptimal
performance.

Signed-off-by: John Lacombe <jlacombe@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Faisal Latif 30da7cff87 RDMA/nes: Fix CRC endianness for RDMA connection establishment on big-endian
With commit ef19454b ("[LIB] crc32c: Keep intermediate crc state in
cpu order"), the behavior of crc32c changes on big-endian platforms.

Our algorithm expects the previous behavior; otherwise we have RDMA
connection establishment failure on big-endian platforms like powerpc.
Apply cpu_to_le32() to value returned by crc32c() to get the previous
behavior.

Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Faisal Latif a2e9c384ce RDMA/nes: Fix use-after-free in mini_cm_dec_refcnt_listen()
Fix use-after-free spotted by Coverity checker flagged by Adrian Bunk.

Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Glenn Streiff f84fba6f96 RDMA/nes: Fix use-after-free in nes_create_cq()
Just delete the debugging statement so we don't use cqp_request after
freeing it.  Adrian Bunk flagged this use-after-free issue spotted by
the Coverity checker.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Adrian Bunk a4435febd4 RDMA/nes: Fix a check-after-use in nes_probe()
Fix a check-after-use spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Adrian Bunk ed0ba33d64 RDMA/nes: Fix a memory leak in schedule_nes_timer()
Fix a memory leak spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:27 -08:00
Adrian Bunk 65b07ec293 RDMA/nes: Fix off-by-one
Fix an off-by-one spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-25 16:00:30 -08:00
Chien Tung 9300c0c067 RDMA/nes: Resurrect error path dead code
Adrian Bunk pointed out that a Coverity scan found some apparently
dead code in nes_verbs.c that really shouldn't have been dead.

The function nes_create_cq() was missing the assignment

	err = 1;

just prior to an iteration that conditionally set err = 0 if a PBL was
found for a given virtual CQ.  I also noticed we should have been
returning -EFAULT on a couple related error paths.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-25 16:00:30 -08:00
Bryan Rosenburg 82d416fffb RDMA/cxgb3: Fix shift calc in build_phys_page_list() for 1-entry page lists
A single entry (addr 0x10001000, size 0x2000) will get converted to
page address 0x10000000 with a page size of 0x4000.  The code as it
stands doesn't address the single buffer case, but in fact it allows
the subsequent single-buffer special case to be eliminated entirely.
Because the mask now includes the (page adjusted) starting and ending
addresses, the general case works for the single buffer case as well.

Signed-off-by: Bryan Rosenburg <rosnbrg@us.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-25 16:00:29 -08:00
Roland Dreier b7f9c112a5 IB/mthca: Free correct MPT on error exit from mthca_fmr_alloc()
When mthca_fmr_alloc() returns an error, it should free the MPT at the
index key, not mr->ibmr.lkey, since the lkey has been mangled by
hw_index_to_key() and no longer is the real index.  This bug causes
corruption of the MPT table free bitmap when mthca_fmr_alloc() fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:42:50 -08:00
Pradeep Satyanarayana ec229e5e81 IPoIB/cm: Fix ipoib_cm_dev_stop() cleanup when drain times out
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out.  In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.

Fix this by moving everything to the rx_reap_list that will actually
get freed up.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:25:11 -08:00
Roland Dreier 51af33e8e4 RDMA/nes: Fix possible array overrun
In nes_create_qp(), the test

	if (nesqp->mmap_sq_db_index > NES_MAX_USER_WQ_REGIONS) {

is used to error out if the db_index is too large; however, if the
test doesn't trigger, then the index is used as

	nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = nesqp;

and mmap_nesqp is declared as

	struct nes_qp      *mmap_nesqp[NES_MAX_USER_WQ_REGIONS];

which leads to an array overrun if the index is exactly equal to
NES_MAX_USER_WQ_REGIONS.  Fix this by bailing out if the index is
greater than or equal to NES_MAX_USER_WQ_REGIONS.

This was spotted by the Coverity checker (CID 2162).

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-18 10:33:59 -08:00
Chien Tung edd2fd643c RDMA/nes: Fix VLAN support
We need to account for the VLAN header size in nes_netdev_change_mtu()
and nes_netdev_init().  Also, add spin lock/unlock during VLAN RX
registration so only one process can assign VLAN group for a given
interface at a time.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-16 21:16:33 -08:00
Glenn Streiff 11e0704b7e RDMA/nes: Fix MAC interrupt erroneously masked on ifdown
Only mask out MAC interrupt if necessary and re-enable on ifup.  There
could be multiple netdevs going through the same MAC.  MAC interrupts
should not be masked off until the last netdev is downed.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-15 15:05:05 -08:00
Li Zefan c7482b81c8 IB: Fix return value in ib_device_register_sysfs()
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-15 15:05:05 -08:00
Sean Hefty ead595aeb0 RDMA/cma: Do not issue MRA if user rejects connection request
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm.  (The ib_cm will send an MRA if it receives a duplicate
REQ.)  The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id.  The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received.  When the backlog is full, we just
want to drop the REQ so that it is retried later.

Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 15:30:41 -08:00
Jack Morgenstein e6028c0e00 IB/mlx4: mlx4_ib_fmr_alloc() should call mlx4_fmr_enable()
Currently mlx4_ib_fmr_alloc() calls mlx4_mr_enable() instead of
mlx4_fmr_enable().  The two functions are equivalent at the moment, but 
this is not really correct (and the change is needed to fix a bug).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:39:36 -08:00
Eli Cohen a9d1884925 IPoIB: Remove unused struct ipoib_cm_tx.ibwc member
struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-02-14 10:30:50 -08:00
Jack Morgenstein 167c42655c IPoIB: On P_Key change event, reset state properly
In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).

When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc.  If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.

Found by Mellanox QA.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:15:06 -08:00
Marcin Slusarz 5163dc1a64 IB/mthca: Convert to use be16_add_cpu()
replace:

	big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
						expression_in_cpu_byteorder);

with:

	beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);

Generated with a semantic patch.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-13 07:47:47 -08:00
Steve Wise 8704e9a879 RDMA/cxgb3: Fail loopback connections
The cxgb3 HW and driver don't support loopback RDMA connections.  So
fail any connection attempt where the destination address is local.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-13 07:47:42 -08:00
Roland Dreier 7c7a9bccd2 IB/cm: Fix infiniband_cm class kobject ref counting
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device.  So
the reference count ended up too low, which leads to bad things.  Fix up
and simplify the reference counting to avoid this.

(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Roland Dreier ab64b96067 IB/cm: Remove debug printk()s that snuck upstream
Pesky little devils, sneaking around...

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Roland Dreier fe174357eb IB/mthca: Add missing sg_init_table() in mthca_map_user_db()
Usually harmless, since the scatterlist is always hard-coded to a length
of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it.
This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:22 -08:00
Eli Cohen 7143740d26 IPoIB: Add send gather support
This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets.  The
patch does not actaully turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.

We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 14:32:37 -08:00
Eli Cohen eb14032f9e IPoIB: Add high DMA feature flag
All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't.  Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.

This has no effect for no, but is needed when we enable gather/scatter
support and checksum stateless offloads.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 13:39:26 -08:00
Jack Morgenstein ea54b10c77 IB/mlx4: Use multiple WQ blocks to post smaller send WQEs
ConnectX HCA supports shrinking WQEs, so that a single work request
can be made of multiple units of wqe_shift.  This way, WRs can differ
in size, and do not have to be a power of 2 in size, saving memory and
speeding up send WR posting.  Unfortunately, if we do this then the
wqe_index field in CQEs can't be used to look up the WR ID anymore, so
our implementation does this only if selective signaling is off.

Further, on 32-bit platforms, we can't use vmap() to make the QP
buffer virtually contigious. Thus we have to use constant-sized WRs to
make sure a WR is always fully within a single page-sized chunk.

Finally, we use WRs with the NOP opcode to avoid wrapping around the
queue buffer in the middle of posting a WR, and we set the
NoErrorCompletion bit to avoid getting completions with error for NOP
WRs.  However, NEC is only supported starting with firmware 2.2.232,
so we use constant-sized WRs for older firmware.  And, since MLX QPs
only support SEND, we use constant-sized WRs in this case.

When stamping during NOP posting, do stamping following setting of the
NOP WQE valid bit.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 13:30:02 -08:00
Roland Dreier 1c69fc2a90 IB/mlx4: Consolidate code to get an entry from a struct mlx4_buf
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code
to look up an entry is duplicated in get_cqe_from_buf() and the QP and
SRQ versions of get_wqe().  Factor this out into mlx4_buf_offset().

This will also make it easier to switch over to using vmap() for buffers.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-06 21:07:54 -08:00
Glenn Streiff 3c2d774cad RDMA/nes: Add a driver for NetEffect RNICs
Add a standard NIC and RDMA/iWARP driver for NetEffect 1/10Gb ethernet adapters.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:45 -08:00
Olaf Kirch 2c78853472 IB/mthca: Return proper error codes from mthca_fmr_alloc()
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc()
would return 0 (success) no matter what. This leads to crashes a
little down the road, when we try to dereference eg mr->mtt, which was
really ERR_PTR(-Ewhatever).

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Roland Dreier f33afc26dc IB: Avoid marking __devinitdata as const
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Roland Dreier 68f3948dab IB/mlx4: Actually print out the driver version
The string mlx4_ib_version was defined, but never used.  Print out the
version once when the first device is initialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Eli Cohen 1d368c5465 IB/ib_mthca: Pre-link receive WQEs in Tavor mode
We have recently discovered that Tavor mode requires each WQE in a
posted list of receive WQEs to have a valid NDA field at all times.
This requirement holds true for regular QPs as well as for SRQs.  This
patch prelinks the receive queue in a regular QP and keeps the free
list in SRQ always properly linked.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Eli Cohen 1203c42e7b IB/mthca: Remove checks for srq->first_free < 0
The SRQ receive posting functions make sure that srq->first_free never
becomes negative, so we can remove tests of whether it is negative.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Or Gerlitz 1d96354e61 IB/fmr_pool: Allocate page list for pool FMRs only when caching enabled
Allocate memory for the page_list field of struct ib_pool_fmr only
when caching is enabled for the FMR pool, since the field is not used
otherwise.  This can save significant amounts of memory for large
pools with caching turned off.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
David Dillow 9fe4bcf45e IB/srp: Retry stale connections
When a host just goes away (crash, power loss, etc.) without tearing
down its IB connections, it can get stale connection errors when it
tries to reconnect to targets upon rebooting.  Retrying the connection
a few times will prevent sysadmins from playing the "which disk(s)
went missing?" game.

This would have made things slightly quicker when tracking down some
of the recent bugs, but it also helps quite a bit when you've got a
large number of targets hanging off a wedged server.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Jack Morgenstein 893da75956 mlx4_core: Don't read reserved fields in mlx4_QUERY_ADAPTER()
The firmware QUERY_ADAPTER command does not return vendor_id,
device_id, and revision_id; eliminate these fields from the query.

Initialize the rev_id field of the mlx4 device via init_node_data (MAD
IFC query), as is done in the query_device verb implementation.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Jack Morgenstein 6ccef1de2c IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER()
For memfree devices, the firmware QUERY_ADAPTER command does not
return vendor_id, device_id, and revision_id; do not return these
fields in the QUERY_ADAPTER function for memfree devices.

Instead, for memfree devices, initialize the rev_id field of the mthca
device via init_node_data (MAD IFC query), as is done in the
query_device verb implementation.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz 7bc531dd88 IPoIB: Remove a misleading debug print
Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh
structue") left a misleading debug print (n->dev would be a bond
device only if boding is used).  Clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz bafff97417 IPoIB: Handle bonding failover race for connected neighbours too
Move up the code that checks for a situation where the remote GID
stored in the ipoib_neigh is different than the one present in the
neighbour (handle gratuitous ARP) or that a bonding fail over has
happened but the neighbour still has a pointer to an ipoib_neigh
created by a different device than the current slave.  This will cause
the driver to apply the check also for connected mode neighbours.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Roland Dreier 0d89fe2c0c IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr()
In mthca_reg_phys_mr(), we calculate the page size for the HCA
hardware to use to map the buffer list passed in by the consumer.
For example, if the consumer passes in

    [0] addr 0x1000, size 0x1000
    [1] addr 0x2000, size 0x1000

then the algorithm would come up with a page size of 0x2000 and a list
of two pages, at 0x0000 and 0x2000.  Usually, this would work fine
since the memory region would start at an offset of 0x1000 and have a
length of 0x2000.

However, the old code did not take into account the alignment of the
IO virtual address passed in.  For example, if the consumer passed in
a virtual address of 0x6000 for the above, then the offset of 0x1000
would not be used correctly because the page mask of 0x1fff would
result in an offset of 0.

We can fix this quite neatly by making sure that the page shift we use
is no bigger than the first bit where the start of the first buffer
and the IO virtual address differ.  Also, we can further simplify the
code by removing the special case for a single buffer by noticing that
it doesn't matter if we use a page size that is too big.  This allows
the loop to compute the page shift to be replaced with __ffs().

Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the
original bug and suggesting several ways to improve this patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Hoang-Nam Nguyen 2b5e6b120e IB/ehca: Add PMA support
This patch enables ehca to redirect any PMA queries to the
actual PMA QP.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Reviewed-by: Joachim Fenkes <fenkes@de.ibm.com>
Reviewed-by: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Joachim Fenkes 528b03f732 IB/ehca: Update sma_attr also in case of disruptive config change
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Joachim Fenkes 2b7274c392 IB/ehca: Prevent sending UD packets to QP0
The IB spec doesn't allow packets to QP0 sent on any other VL than VL15.
Hardware doesn't filter those packets on the send side, so we need to do
this in the driver and firmware.

As eHCA doesn't support QP0, we can just filter out all traffic going to
QP0, regardless of SL or VL.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Sean Hefty 3971c9f6db IB/cm: Add interim support for routed paths
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets.  Update the subnet local field in the CM REQ based
on the hop_limit value.  In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs.  This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.

This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed.  It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
James Bottomley d3f46f39b7 [SCSI] remove use_sg_chaining
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.

Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-30 13:14:02 -06:00
Linus Torvalds 0ba6c33bcd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
  [IPV6] ADDRLABEL: Fix double free on label deletion.
  [PPP]: Sparse warning fixes.
  [IPV4] fib_trie: remove unneeded NULL check
  [IPV4] fib_trie: More whitespace cleanup.
  [NET_SCHED]: Use nla_policy for attribute validation in ematches
  [NET_SCHED]: Use nla_policy for attribute validation in actions
  [NET_SCHED]: Use nla_policy for attribute validation in classifiers
  [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
  [NET_SCHED]: sch_api: introduce constant for rate table size
  [NET_SCHED]: Use typeful attribute parsing helpers
  [NET_SCHED]: Use typeful attribute construction helpers
  [NET_SCHED]: Use NLA_PUT_STRING for string dumping
  [NET_SCHED]: Use nla_nest_start/nla_nest_end
  [NET_SCHED]: Propagate nla_parse return value
  [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
  [NET_SCHED]: act_api: use nlmsg_parse
  [NET_SCHED]: act_api: fix netlink API conversion bug
  [NET_SCHED]: sch_netem: use nla_parse_nested_compat
  [NET_SCHED]: sch_atm: fix format string warning
  [NETNS]: Add namespace for ICMP replying code.
  ...
2008-01-29 22:54:01 +11:00
Denis V. Lunev f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev f1b050bf7a [NETNS]: Add namespace parameter to ip_route_output_flow.
Needed to propagate it down to the __ip_route_output_key.

Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:06 -08:00
Denis V. Lunev 1ab352768f [NETNS]: Add namespace parameter to ip_dev_find.
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:04 -08:00
Joe Perches 6360a02af1 [IPV4] drivers/infiniband: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:17 -08:00
WANG Cong aff5905778 INFINIBAND: Remove 'TOPDIR' from Makefiles
This patch removes TOPDIR from infiniband Makefile and delete
one include statement pointing to a non-existing directory

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:14:37 +01:00
Linus Torvalds 9b73e76f3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
  [SCSI] usbstorage: use last_sector_bug flag universally
  [SCSI] libsas: abstract STP task status into a function
  [SCSI] ultrastor: clean up inline asm warnings
  [SCSI] aic7xxx: fix firmware build
  [SCSI] aacraid: fib context lock for management ioctls
  [SCSI] ch: remove forward declarations
  [SCSI] ch: fix device minor number management bug
  [SCSI] ch: handle class_device_create failure properly
  [SCSI] NCR5380: fix section mismatch
  [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
  [SCSI] IB/iSER: add logical unit reset support
  [SCSI] don't use __GFP_DMA for sense buffers if not required
  [SCSI] use dynamically allocated sense buffer
  [SCSI] scsi.h: add macro for enclosure bit of inquiry data
  [SCSI] sd: add fix for devices with last sector access problems
  [SCSI] fix pcmcia compile problem
  [SCSI] aacraid: add Voodoo Lite class of cards.
  [SCSI] aacraid: add new driver features flags
  [SCSI] qla2xxx: Update version number to 8.02.00-k7.
  [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
  ...
2008-01-25 17:19:08 -08:00
Steve Wise 8176d297c7 RDMA/cxgb3: Fix the T3A workaround checks
Correctly work around T3A issues by checking "hwtype != T3A" instead of
"hwtype == T3B".  This will be needed for new hardware types.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:47 -08:00
Jan Engelhardt f7fca1e8a8 IB/ipath: Remove unnecessary cast
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:46 -08:00
Jan Engelhardt 1cf18d5aab IPoIB: Constify seq_operations function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:46 -08:00
Steve Wise c6b5b50474 RDMA/cxgb3: Mark QP as privileged based on user capabilities
This is needed to support zero-stag properly.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:45 -08:00
Steve Wise d08ca26cee RDMA/cxgb3: Fix page shift calculation in build_phys_page_list()
The existing logic incorrectly maps this buffer list:

    0: addr 0x10001000, size 0x1000
    1: addr 0x10002000, size 0x1000

To this bogus page list:

    0: 0x10000000
    1: 0x10002000

The shift calculation must also take into account the address of the
first entry masked by the page_mask as well as the last address+size
rounded up to the next page size.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:45 -08:00
Steve Wise 856b592504 RDMA/cxgb3: Flush the receive queue when closing
- for kernel mode cqs, call event notification handler when flushing.
- flush QP when moving from RTS -> CLOSING.
- fix logic to identify a kernel mode qp.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:45 -08:00
Ralph Campbell 4e1e93a418 IB/ipath: Trivial simplification of ipath_make_ud_req()
Move the increment of s_hdrwords into the existing if block that tests 
if we're doing a send with immediate, to save one test of the opcode.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Roland Dreier 950529e5c6 IB/mthca: Update latest "native Arbel" firmware revision
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Krishna Kumar 48fe5e594c IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler
qdisc_run() now tests for queue_stopped() before calling
__qdisc_run(), and the same check is done in every iteration of
__qdisc_run(), so another check is not required in the driver xmit.
This means that ipoib_start_xmit() no longer needs to test
netif_queue_stopped(); the test was added to fix earlier kernels,
where the networking stack did not guarantee that the xmit method of
an LLTX driver would not be called after the queue was stopped, but
current kernels do provide this guarantee.

To validate, I put a debug in the TX_BUSY path which never hit with 64
threads running overnight exercising this code a few 100 million
times.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Ralph Campbell 3d68ea3261 IB/ipath: Add mappings from HW register to PortInfo port physical state
Add new mappings from port physical state (a HW register value) to the
IB SubnGet(PortInfo) port physical state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Dave Olson 6ac50727bd IB/ipath: Changes to support PIO bandwidth check on IBA7220
The IBA7220 uses a count-based triggering mechanism, and therefore
can't use the same bandwidth verification mechanism as older chips.

To support the 7220, allow enabling and disabling armlaunch errors on
application request.  Minor robustness improvements as well.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:43 -08:00
Dave Olson ddb70c83a5 IB/ipath: Minor cleanup of unused fields and chip-specific errors
Clean up some unused header fields, minor related cleanup.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:43 -08:00
Michael Albaugh 359193ef43 IB/ipath: New sysfs entries to control 7220 features
IBA7220 includes many more configurable IB settings. Getting/setting
these is now grouped into a pair of chip specific functions accessed via
function pointers.  Provide sysfs access to these settings.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:32 -08:00
Dave Olson c4bce8032e IB/ipath: Add new chip-specific functions to older chips, consistent init
This adds the new (sometimes empty) chip-specific functions to the older
chips, and makes the initialization and related functions consistent across
all 3 chips.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:45 -08:00
Dave Olson 7387273307 IB/ipath: Remove unused MDIO interface code
This code has been unused for some time, but still had leftovers
from when it was used.

Signed-off-by: Dave Olson <dave.olson@qlogic.com
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Joachim Fenkes 2ec8e66241 IB/ehca: Prevent RDMA-related connection failures on some eHCA2 hardware
Some HW revisions of eHCA2 may cause an RC connection to break if they
received RDMA Reads over that connection before.  This can be
prevented by assuring that, after the first RDMA Read, the QP receives
a new RDMA Read every few million link packets.

Include code into the driver that inserts an empty (size 0) RDMA Read
into the message stream every now and then if the consumer doesn't
post them frequently enough.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen bbdd267ef2 IB/ehca: Add "port connection autodetect mode"
This patch enhances ehca with a capability to "autodetect" the ports
being connected physically. In order to utilize that function the
module option nr_ports must be set to -1 (default is 2 - two
ports). This feature is experimental and will made the default later.

More detail:

If the user connects only one port to the switch, current code requires
  1) port one to be connected and
  2) module option nr_ports=1 to be given.

If autodetect is enabled, ehca will not wait at creation of the GSI QP
for the respective port to become active. Since firmware does not
accept modify_qp() while the port is down at initialization, we need
to cache all calls to modify_qp() for the SMI/GSI QP and just return a
good return code.

When a port is activated and we get a PORT_ACTIVE event, we replay the
cached modify-qp() parms and re-trigger any posted recv WRs. Only then
do we forward the PORT_ACTIVE event to registered clients.

The result of this autodetect patch is that all ports will be
accessible by the users. Depending on their respective cabling only
those ports that are connected properly will become operable. If a
user tries to modify a regular QP of a non-connected port, modify_qp()
will fail. Furthermore, ibv_devinfo should show the port state
accordingly.

Note that this patch primarily improves the loading behaviour of
ehca. If the cable is removed while the driver is operating and
plugged in again, firmware will handle that properly by sending an
appropriate async event.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen b8b50e353b IB/ehca: Define array to store SMI/GSI QPs
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen 0c86e280fe IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp()
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Erez Zilber 6410627eb9 IB/iser: Add change_queue_depth method
Add a .change_queue_depth handler to the scsi_host_template in the
iSER driver.  iscsi_change_queue_depth was added to iscsi_tcp in order
to solve the problem of queue depth which was too high for some
targets.  It is also applicable for iSER.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Erez Zilber a4ef1451df IB/iser: Print information about unhandled RDMA CM events
Some RDMA CM events are not supported or not handled in iSER.
This patch adds some info (printk) for the user about them.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Olaf Kirch a3cd7d9070 IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends
up on the free_list rather than the dirty_list (because we allow a
certain number of remappings before actually requiring a flush).

However, ib_fmr_batch_release() only looks at dirty_list when flushing
out old mappings.  This means that when ib_fmr_pool_flush() is used to
force a flush of the FMR pool, some dirty FMRs that have not reached
their maximum remap count will not actually be flushed.

Fix this by flushing all FMRs that have been used at least once in
ib_fmr_batch_release().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Olaf Kirch a656eb758f IB/fmr_pool: Flush serial numbers can get out of sync
Normally, the serial numbers for flush requests and flushes executed
for an FMR pool should be in sync.

However, if the FMR pool flushes dirty FMRs because the
dirty_watermark was reached, we wake up the cleanup thread and let it
do its stuff.  As a side effect, the cleanup thread increments
pool->flush_ser, which leaves it one higher than pool->req_ser.  The
next time the user calls ib_flush_fmr_pool(), the cleanup thread will
be woken up, but ib_flush_fmr_pool() won't wait for the flush to
complete because flush_ser is already past req_ser.  This means the
FMRs that the user expects to be flushed may not have all been flushed
when the function returns.

Fix this by telling the cleanup thread to do work exclusively by
incrementing req_ser, and by moving the comparison of dirty_len and
dirty_watermark into ib_fmr_pool_unmap().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
2008-01-25 14:15:42 -08:00
Roland Dreier 2fe7e6f7c9 IB/umad: Simplify and fix locking
In addition to being overly complex, the locking in user_mad.c is
broken: there were multiple reports of deadlocks and lockdep warnings.
In particular it seems that a single thread may end up trying to take
the same rwsem for reading more than once, which is explicitly
forbidden in the comments in <linux/rwsem.h>.

To solve this, we change the locking to use plain mutexes instead of
rwsems.  There is one mutex per open file, which protects the contents
of the struct ib_umad_file, including the array of agents and list of
queued packets; and there is one mutex per struct ib_umad_port, which
protects the contents, including the list of open files.  We never
hold the file mutex across calls to functions like ib_unregister_mad_agent(),
which can call back into other ib_umad code to queue a packet, and we
always hold the port mutex as long as we need to make sure that a
device is not hot-unplugged from under us.

This even makes things nicer for users of the -rt patch, since we
remove calls to downgrade_write() (which is not implemented in -rt).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Roland Dreier cf9542aa92 IB/ipath: Fix some sparse warnings about shadowed symbols
There are a few places in the ipath driver where a variable is
re-declared within a block where it is already in scope.  Most of these
extra declarations can simply be removed, since the variable from the
outer scope is used in a way so that it does not need to keep its
variable across the block with the re-declaration.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Roland Dreier 1d6e658e8e RDMA/cxgb3: Endianness annotation for irs field
t3_rdma_init_wr.irs is a big-endian field, so declare it as __be32.
This fixes one sparse warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Anton Blanchard 1a7d2dce41 IB/ehca: Use round_jiffies() for EQ polling timer
Use round_jiffies() to align ehca's 1-second timer with other timers
and potentially save power by sleeping cores for longer.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:41 -08:00
Sean Hefty 5851bb893e RDMA/cma: Override default responder_resources with user value
By default, the responder_resources parameter is set to that received
in a connection request.  The passive side may override this value
when accepting the connection.  Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request.  Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.

For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:41 -08:00
Dave Olson 1f813ca830 IB/ipath: Drop support for the original QHT7040 board
The original QHT7040 had significant performance issues so there was an
additional check in the driver for a newer serial number.  Support for
the small quantities of that board shipped has been dropped, so this
patch removes the special checks to simplify the code.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:40 -08:00
Arthur Jones 7da0498e7f IB/ipath: Add ipath_read_ireg() abstraction
Different chips have different width interrupt status registers, so add
a flag and accessor function to decide which width register read to use.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:40 -08:00
Ralph Campbell 4ea61b548b IB/ipath: Add flag and handling for chips with swapped register bug
The 6110 had a bug that caused some registers to be swapped; it was
fixed for the 7220 (and didn't affect the 6120 because it had fewer
registers).  This adds a flag and related code to handle that, and
includes some minor cleanups in the same area.

Signed-off-by:  Ralph Campbell <ralph.campbell@qlogic.com>
2008-01-25 14:15:39 -08:00
Ralph Campbell 60948a4158 IB/ipath: Port config has on-chip effects for 7220
The number of configured ports for the 7220 changes the number of eager
TIDs available per port, for all but port 0 (kernel port) which remains
constant, so add a field to give port0 count separate from the portdata
structure.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:39 -08:00
Ralph Campbell a18e26ae44 IB/ipath: Allow more flexible user register alignments
User registers have different alignments on different chips (4KB on
older, 64KB on 7220).  Allow mapping the user registers on kernels with
page sizes up to 64K.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:39 -08:00
Dave Olson 9e2ef36b5a IB/ipath: Clean up some comments
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Ralph Campbell 3029fcc3d4 IB/ipath: Export hardware counters more consistently
Various hardware counters are exported via the ipath file system (since
it is binary data).  The old file format was very dependent on the HW
offsets for these registers.  Newer HCA chips can have different
counters at different offsets.  This patch adds a level of indirection
to make the file format consistent across HCAs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Ralph Campbell 6c719cae0b IB/ipath: MAD performance sampling registers support
Add support for QLogic HCAs which have hardware performance sampling
registers for PortSamplesControl and PortSamplesResult MADs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
David Dillow 7aa54bd730 IB/srp: Add identifying information to log messages
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports.  So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Pradeep Satyanarayana 586a693448 IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs.  Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.

This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow fff09a8e6e IB/srp: Enable SG list chaining
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.

I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining.  ipath looks like it should be OK, but I have no
way to test it.

Signed-off-by: David Dillow <dillowda@ornl.gov>

[ Tested on ipath.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow 8cba207732 IB/srp: Respect target credit limit
The current SRP initiator will send requests even if it has no credits
available.  The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s.  Other devices may just drop
the requests.

This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining.  The mid-layer will retry
the queued command once an outstanding command completes.

The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Rolf Manderscheid a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Dave Olson 755807a296 IB/ipath: Changes for fields moving from devdata to portdata
This patch moves some arrays that were defined per-device to be
variables defined in the per context data structure, thus avoiding extra
kzalloc() calls.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:36 -08:00
Dave Olson d8274869d7 IB/ipath: Generalize some xxx_SHIFT macros
In preparation for upcoming chips that have different values for
INFINIPATH_R_PORTENABLE_SHIFT, INFINIPATH_R_INTRAVAIL_SHIFT,
INFINIPATH_R_TAILUPD_SHIFT, and portcfg_shift, remove the shared
#defines and use device-specific variables instead.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:36 -08:00
Ralph Campbell c59a80aca0 IB/ipath: kreceive uses portdata rather than devdata
kreceive is now portdata * instead of devdata * and other kreceive
related cleanups....

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:35 -08:00
Ralph Campbell d65708f3a7 IB/ipath: Cleanup ipath_get_egrbuf()
Remove an unused parameter and fix up the comment.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:35 -08:00
Ralph Campbell cc65edcf0c IB/ipath: Fix RNR NAK handling
This patch fixes a couple of minor problems with RNR NAK handling:
 - The insertion sort was causing extra delay when inserting ahead
   vs. behind an existing entry on the list.
 - A resend of a first packet of a message which is still not ready,
   needs another RNR NAK (i.e., it was suppressed when it shouldn't).
 - Also, the resend tasklet doesn't need to be woken up unless the
   ACK/NAK actually indicates progress has been made.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:34 -08:00
Hoang-Nam Nguyen e57d62a147 IB/ehca: Forward event client-reregister-required to registered clients
This patch allows ehca to forward event client-reregister-required to
registered clients.  One such event is generated by a switch eg. after
its reboot.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:34 -08:00
Roland Dreier b3226184af IB/mlx4: Micro-optimize mlx4_ib_poll_one()
Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a
field from it, byte-swap it once into a temporary variable.  This 
results in smaller, better code -- eg, on 32-bit x86:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function                                     old     new   delta
mlx4_ib_poll_cq                             1188    1183      -5

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:34 -08:00
Adrian Bunk e57895d389 IB/mthca: Remove MSI support as scheduled
Remove MSI support from the mthca driver, as scheduled.  There is no
reason to use MSI instead of MSI-X, since MSI-X performs better.  No
one has spoken up since MSI support was deprecated in commit f6be6fbe
("IB/mthca: Schedule MSI support for removal"), so apparently the MSI
support is unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:33 -08:00
Oliver Pinter 38dc732f47 IB/iser: Typo fix (s/destory/destroy/)
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Erez Zilber bd5d7a8585 IB/iser: update URLs of iSER docs
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-01-25 14:15:32 -08:00
Sean Hefty 88314e4dda RDMA/cma: add support for rdma_migrate_id()
This is based on user feedback from Doug Ledford at RedHat:

Events that occur on an rdma_cm_id are reported to userspace through an
event channel.  Connection request events are reported on the event
channel associated with the listen.  When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel.  This is suboptimal where the user only wants listen events on
that channel.

Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.

Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Vladimir Sokolovsky 45d9478da1 RDMA/cma: Reenable device removal on passive side
Enable conn_id remove on the passive side after connection
establishment.  This corrects an issue where the IB driver can't be
unloaded after running applications over RDS.  The 'dev_remove' counter
does not reach 0 for established connections on the passive side.

This problem is limited to device removal, and only occurs on the
passive side if there are established connections.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty b61d92d8ae IB/mad: Fix incorrect access to items on local_list
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing.  However, the structures on these two
lists are not the same.  The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private.  Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private.  This leads to a system
crash when requests are moved from the local_list to the cancel_list.

Fix this by leaving local_list alone.  All requests on the local_list
have completed are just awaiting processing by a queued worker thread.

Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00