Commit Graph

32 Commits

Author SHA1 Message Date
Steve Wise 4ab928f692 RDMA/cxgb3: Fixes for zero STag
Handling the zero STag in receive work request requires some extra
logic in the driver:

 - Only set the QP_PRIV bit for kernel mode QPs.

- Add a zero STag build function for recv wrs. The uP needs a PBL
  allocated and passed down in the recv WR so it can construct a HW
  PBL for the zero STag S/G entries.  Note: we need to place a few
  restrictions on zero STag usage because of this:

  1) all SGEs in a recv WR must either be zero STag or not.  No mixing.

  2) an individual SGE length cannot exceed 128MB for a zero-stag SGE.
     This should be OK since it's not really practical to allocate
     such a large chunk of pinned contiguous DMA mapped memory.

- Add an optimized non-zero-STag recv wr format for kernel users.
  This is needed to optimize both zero and non-zero STag cracking in
  the recv path for kernel users.

 - Remove the iwch_ prefix from the static build functions.

 - Bump required FW version.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2008-07-14 23:48:53 -07:00
Steve Wise 96f15c0353 RDMA/core: Add local DMA L_Key support
- Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
  IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
  STag support and IB HCAs to indicate reserved L_Key support.

- Add a u32 local_dma_lkey member to struct ib_device.  Drivers fill
  this in with the appropriate local DMA L_Key (if they support it).

- Fix up the drivers using this flag.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Steve Wise 70fe1796a5 RDMA/cxgb3: Set rkey field for new memory windows in iwch_alloc_mw()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Jon Mason 52c8084b74 RDMA/cxgb3: Propagate HW page size capabilities
cxgb3 does not currently report the page size capabilities, and
incorrectly reports them internally.

This version changes the bit-shifting to a static value (per Steve's
request).

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Steve Wise 14cc180f7b RDMA/cxgb3: Add support for protocol statistics
- Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low
  level driver to obtain the protocol mib from the rnic hardware.

- Add new iw_cxgb3 provider method to get the MIB from the low level
  driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Steve Wise 97d1cc8055 RDMA/cxgb3: Fix up some ib_device_attr fields
- set fw_ver
- set hw_ver
- set max_qp_wr to something reasonable
- set max_cqe to something reasonable

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Steve Wise e7e5582999 RDMA/cxgb3: MEM_MGT_EXTENSIONS support
- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it.
- set max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for
  fastreg possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add local_inv work request support.
- add send_with_inv and send_with_se_inv work request support.
- removed useless duplicate enums/defines for TPT/MW/MR stuff.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Steve Wise 5e19cf663b RDMA/cxgb3: Fix regression caused by class_device -> device conversion
The change to iwch_provider.c in commit f4e91eb4 ("IB: convert struct
class_device to struct device") undid the fix done in commit 7f049f2f
("RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call").  It
removed the calls to rtnl_lock() that serialized the iw_cxgb3 ethtool
ops calls into the cxgb3 driver.  This locking is needed to avoid
messing up the internal state of the cxgb3 driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-08 14:40:05 -07:00
Roland Dreier 273748cc90 RDMA/cxgb3: Fix severe limit on userspace memory registration size
Currently, iw_cxgb3 is severely limited on the amount of userspace
memory that can be registered in in a single memory region, which
causes big problems for applications that expect to be able to
register 100s of MB.

The problem is that the driver uses a single kmalloc()ed buffer to
hold the physical buffer list (PBL) for the entire memory region
during registration, which means that 8 bytes of contiguous memory are
required for each page of memory being registered.  For example, a 64
MB registration will require 128 KB of contiguous memory with 4 KB
pages, and it unlikely that such an allocation will succeed on a busy
system.

This is purely a driver problem: the temporary page list buffer is not
needed by the hardware, so we can fix this by writing the PBL to the
hardware in page-sized chunks rather than all at once.  We do this by
splitting the memory registration operation up into several steps:

 - Allocate PBL space in adapter memory for the full registration
 - Copy PBL to adapter memory in chunks
 - Allocate STag and enable memory region

This also allows several other cleanups to the __cxio_tpt_op()
interface and related parts of the driver.

This change leaves the reregister memory region and memory window
operations broken, but they already didn't work due to other
longstanding bugs, so fixing them will be left to a later patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:56:22 -07:00
Steve Wise ccaf10d0ad RDMA/cxgb3: Set the max_mr_size device attribute correctly
cxgb3 only supports 4GB memory regions.  The lustre RDMA code uses
this attribute and currently has to code around our bad setting.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Arthur Kepner cb9fbc5c37 IB: expand ib_umem_get() prototype
Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Tony Jones f4e91eb4a8 IB: convert struct class_device to struct device
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Roland Dreier 0f39cf3d54 IB/core: Add support for "send with invalidate" work requests
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions.  Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated.  Add this new union to struct
ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().

Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.

Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit.  The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing).  Remove the flag from
all existing drivers that set it until we know which ones are OK.

The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Harvey Harrison 3371836383 IB: Replace remaining __FUNCTION__ occurrences with __func__
__FUNCTION__ is gcc-specific, use __func__ instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Jon Mason 4fa45725df RDMA/cxgb3: Fix iwch_create_cq() off-by-one error
The cxbg3 driver is unnecessarily decreasing the number of CQ entries by
one when creating a CQ.  This will cause the CQ not to have as many
entries as requested by the user if the user requests a power of 2 size.

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-09 13:54:12 -07:00
Jon Mason 1bab74e691 RDMA/cxgb3: Return correct max_inline_data when creating a QP
Set cap.max_inline_data to the actual max inline data that the adapter
support, so that userspace apps see the right value returned.

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:53:18 -08:00
Steve Wise 8176d297c7 RDMA/cxgb3: Fix the T3A workaround checks
Correctly work around T3A issues by checking "hwtype != T3A" instead of
"hwtype == T3B".  This will be needed for new hardware types.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:47 -08:00
Steve Wise 7f049f2f42 RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call
Currently the call into cxgb3 to get the driver info is not serialized.
The iw_cxgb3 module needs to hold the rtnl_lock around the ethtool ops
call like dev_ioctl() does.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Steve Wise 9a7666494b RDMA/cxgb3: Set the max_qp_init_rd_atom attribute in query_device
The device attribute max_qp_init_rd_atom is not getting set in cxgb3's
query_device method.  Version 1.0.4 of librdmacm now validates the
user's requested initiator and responder resources against the max
supported by the device.  Since iw_cxgb3 wasn't setting this attribute
(and it defaulted to 0), all rdma_connect()s fail if there are
initiator resources requested by the app.  Fix this by setting the
correct value in iwch_query_device().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:27:00 -08:00
WANG Cong 6abb6ea80b RDMA/cxgb3: Check return of kmalloc() in iwch_register_device()
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
[ Also remove cast from void * return of kmalloc() as suggested by  
  Jesper Juhl <jesper.juhl@gmail.com>. ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Roland Dreier f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Roland Dreier ed23a72778 IB: Return "maybe missed event" hint from ib_req_notify_cq()
The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested.  In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification).  Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in.  It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Steve Wise 1860cdf802 RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Joachim Fenkes 1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Steve Wise d601347188 RDMA/cxgb3: Handle build_phys_page_list() failure in iwch_reregister_phys_mem()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Steve Wise e64518f373 RDMA/cxgb3: Fix MR permission problems
Fix memory region permission problems:

- remove useless and redundant iwch_mem_perms enum.

- create ib_to_tpt_access_rights() for mapping ib access rights
  to T3 TPT permissions.

- create ib_to_mwbind_access_rights() for mapping ib access rights
  to T3 MWBIND WR permissions.

- fix up the mem reg code to utilize the new functions.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:51:02 -08:00
Steve Wise 2df50da00e RDMA/cxgb3: Move QP to error on destroy if the state is IDLE
Change iwch_destroy_qp() to always move the QP to ERROR and let
iwch_modify_qp() decide what to do.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:45 -08:00
Steve Wise aeb100e246 RDMA/cxgb3: Don't use mm after it's freed in iwch_mmap()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 11:48:17 -08:00
Adrian Bunk 2b540355cd RDMA/cxgb3: cleanups
- don't mark static functions in C files as inline - gcc should know
  best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
  - cxio_hal.c: cxio_hal_clear_qp_ctx()
  - iwch_provider.c: iwch_get_qp()
- remove the following unused global functions:
  - cxio_hal.c: cxio_allocate_stag()
  - cxio_resource.: cxio_hal_get_rhdl()
  - cxio_resource.: cxio_hal_put_rhdl()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-23 13:10:43 -08:00
Steve Wise c52daa2976 RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver
Remove the Open Grid Computing copyright.  It shouldn't be there.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise b038ced7b3 RDMA/cxgb3: Add driver for Chelsio T3 RNIC
Add an RDMA/iWARP driver for the Chelsio T3 1GbE and 10GbE adapters.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:18 -08:00