linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	e3b1fd56f1	Main set of InfiniBand/RDMA updates for 3.17 merge window: - MR reregistration support - MAD support for RMPP in userspace - iSER and SRP initiator updates - ocrdma hardware driver updates - other fixes... -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJT7N2GAAoJEENa44ZhAt0hUiIQAKBqYIjpB3QY6Z/B19mxDxku I81B3OkirumbAaCLoLckvq4gnwQ+BAD+YUmXgP08TCIgrABZAYYw+WvMEY9WNyQB x3Pv+BzX+wKKNaQkSnB9JVdku+BSI76eW0YYrIX0F0x1o0Jq9JpSkia91KmRvqmX YSFy2R7BjEZ4lfo/uscydHT26Q6EdT3od4iv48K8qq5rKdjtyYNgD/75DLCN599Z uI1f98e6Tl+7nHWaioQB61zlYPkNPLAnZtMrY2j4tarTwYwX1KhF3eV7z39L1l81 nhMIXr+qBXtYuZw3I9rKqw3VVCCqB9e6E8FIA5K/d2jWqO+0TqIMUYOuZXuCezWg o+uGgwbDOweBqwrKRmiR2M0lk2I1Z16jBxYuaUBbLImG0/NPtuUB22t6RbPAojTa EjDkb9XBA7uFUMrYYnou+HxEzmJUYkin6wgGxtklYEKUqvh8G9ccGt6httWrCSrV mpjwJv+S4LdFM49wdP993lYthpCoZ42yxEzZ7zJ/KTt17/Wb5F4RtHIROGVFkHdT mo8RUz5DGDznfNQgn0m/jC3woFnMNpLOduI+CivhrwrWCwMpSUxIjqWu3263pIJ7 +H0kNOKDAbp6+F27j+AVznlyXnaEtYyM8EZnysG1Hkz24gCSWBYo5Ep8eSH8LgY4 9VPc75KVG6uddx5mhg5h =wOm0 -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: "Main set of InfiniBand/RDMA updates for 3.17 merge window: - MR reregistration support - MAD support for RMPP in userspace - iSER and SRP initiator updates - ocrdma hardware driver updates - other fixes..." * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits) IB/srp: Fix return value check in srp_init_module() RDMA/ocrdma: report asic-id in query device RDMA/ocrdma: Update sli data structure for endianness RDMA/ocrdma: Obtain SL from device structure RDMA/uapi: Include socket.h in rdma_user_cm.h IB/srpt: Handle GID change events IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0] IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0] RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf() IPoIB: Remove unnecessary test for NULL before debugfs_remove() IB/mad: Add user space RMPP support IB/mad: add new ioctl to ABI to support new registration options IB/mad: Add dev_notice messages for various umad/mad registration failures IB/mad: Update module to [pr\|dev]_* style print messages IB/ipoib: Avoid multicast join attempts with invalid P_key IB/umad: Update module to [pr\|dev]_* style print messages IB/ipoib: Avoid flushing the workqueue from worker context IB/ipoib: Use P_Key change event instead of P_Key polling mechanism IB/ipath: Add P_Key change event support mlx4_core: Add support for secure-host and SMP firewall ...	2014-08-14 11:09:05 -06:00
Steve Wise	678ea9b5ba	RDMA/cxgb4: Only call CQ completion handler if it is armed The function __flush_qp() always calls the ULP's CQ completion handler functions even if the CQ was not armed. This can crash the system if the function pointer is NULL. The iSER ULP behaves this way: no completion handler and never arm the CQ for notification. So now we track whether the CQ is armed at flush time and only call the completion handlers if their CQs were armed. Also, if the RCQ and SCQ are the same CQ, the completion handler is getting called twice. It should only be called once after all SQ and RQ WRs are flushed from the QP. So rearrange the logic to fix this. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-01 14:54:37 -07:00
Hariprasad Shenai	91244bbd6b	iw_cxgb4: Don't limit TPTE count to 32KB Use the size advertised by FW Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-21 20:23:59 -07:00
Hariprasad Shenai	66eb19af0b	iw_cxgb4: advertise the correct device max attributes Advertise the actual max limits for things like qp depths, number of qps, cqs, etc. Clean up the queue allocation for qps and cqs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-21 20:23:59 -07:00
Hariprasad Shenai	7730b4c7e3	cxgb4/iw_cxgb4: work request logging feature This commit enhances the iwarp driver to optionally keep a log of rdma work request timining data for kernel mode QPs. If iw_cxgb4 module option c4iw_wr_log is set to non-zero, each work request is tracked and timing data maintained in a rolling log that is 4096 entries deep by default. Module option c4iw_wr_log_size_order allows specifing a log2 size to use instead of the default order of 12 (4096 entries). Both module options are read-only and must be passed in at module load time to set them. IE: modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10 The timing data is viewable via the iw_cxgb4 debugfs file "wr_log". Writing anything to this file will clear all the timing data. Data tracked includes: - The host time when the work request was posted, just before ringing the doorbell. The host time when the completion was polled by the application. This is also the time the log entry is created. The delta of these two times is the amount of time took processing the work request. - The qid of the EQ used to post the work request. - The work request opcode. - The cqe wr_id field. For sq completions requests this is the swsqe index. For recv completions this is the MSN of the ingress SEND. This value can be used to match log entries from this log with firmware flowc event entries. - The sge timestamp value just before ringing the doorbell when posting, the sge timestamp value just after polling the completion, and CQE.timestamp field from the completion itself. With these three timestamps we can track the latency from post to poll, and the amount of time the completion resided in the CQ before being reaped by the application. With debug firmware, the sge timestamp is also logged by firmware in its flowc history so that we can compute the latency from posting the work request until the firmware sees it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:25:16 -07:00
Hariprasad Shenai	031cf4769b	cxgb4/iw_cxgb4: display TPTE on errors With ingress WRITE or READ RESPONSE errors, HW provides the offending stag from the packet. This patch adds logic to log the parsed TPTE in this case. cxgb4 now exports a function to read a TPTE entry from adapter memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:25:16 -07:00
Hariprasad Shenai	04e10e2164	iw_cxgb4: Detect Ing. Padding Boundary at run-time Updates iw_cxgb4 to determine the Ingress Padding Boundary from cxgb4_lld_info, and take subsequent actions. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:25:16 -07:00
Hariprasad Shenai	cf38be6d61	iw_cxgb4: Allocate and use IQs specifically for indirect interrupts Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA RXQs, which also handle direct interrupts for offload CPLs during RDMA connection setup/teardown. The intended T4 usage model, however, is to have indirect interrupts flow through dedicated IQs. IE not to mix indirect interrupts with CPL messages in an IQ. This patch adds the concept of RDMA concentrator IQs, or CIQs, setup and maintained by the LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow through the CIQs. Design: cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs are sized according to the max available CQs available at adapter init. In addition, these IQs don't need FL buffers since they only service indirect interrupts. One CIQ is setup per RX channel similar to the RDMA RXQs. iw_cxgb4 will utilize these CIQs based on the vector value passed into create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the number of CIQs configured, and thus the vector value will be the index into the array of CIQs. Based on original work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 22:49:54 -07:00
Steve Wise	a03d9f94cc	RDMA/cxgb4: Max fastreg depth depends on DSGL support The max depth of a fastreg mr depends on whether the device supports DSGL or not. So compute it dynamically based on the device support and the module use_dsgl option. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:08 -07:00
Steve Wise	def4771f4b	RDMA/cxgb4: rmb() after reading valid gen bit Some HW platforms can reorder read operations, so we must rmb() after we see a valid gen bit in a CQE but before we read any other fields from the CQE. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:07 -07:00
Steve Wise	fa658a98a2	RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices Signed-off-by: Steve Wise <swise@opengridcomputing.com> [ Fix cast from u64* to integer. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:01 -07:00
Steve Wise	05eb23893c	cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently notifying the ULDs of potential DB problems, and implements a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts the most load on the DB fifo. Design: cxgb4: Direct ULD callback from the DB FULL/DROP interrupt handler. This allows the ULD to stop doing user DB writes as quickly as possible. While user DB usage is disabled, the LLD will accumulate DB write events for its queues. Then once DB usage is reenabled, a single DB write is done for each queue with its accumulated write count. This reduces the load put on the DB fifo when reenabling. iw_cxgb4: Instead of marking each qp to indicate DB writes are disabled, we create a device-global status page that each user process maps. This allows iw_cxgb4 to only set this single bit to disable all DB writes for all user QPs vs traversing the idr of all the active QPs. If the libcxgb4 doesn't support this, then we fall back to the old approach of marking each QP. Thus we allow the new driver to work with an older libcxgb4. When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes via the status page and transition the DB state to STOPPED. As user processes see that DB writes are disabled, they call into iw_cxgb4 to submit their DB write events. Since the DB state is in STOPPED, the QP trying to write gets enqueued on a new DB "flow control" list. As subsequent DB writes are submitted for this flow controlled QP, the amount of writes are accumulated for each QP on the flow control list. So all the user QPs that are actively ringing the DB get put on this list and the number of writes they request are accumulated. When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq context, we change the DB state to FLOW_CONTROL, and begin resuming all the QPs that are on the flow control list. This logic runs on until the flow control list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall, for example). QPs are removed from this list, and their accumulated DB write counts written to the DB FIFO. Sets of QPs, called chunks in the code, are removed at one time. The chunk size is 64. So 64 QPs are resumed at a time, and before the next chunk is resumed, the logic waits (blocks) for the DB FIFO to drain. This prevents resuming to quickly and overflowing the FIFO. Once the flow control list is empty, the db state transitions back to NORMAL and user QPs are again allowed to write directly to the user DB register. The algorithm is designed such that if the DB write load is high enough, then all the DB writes get submitted by the kernel using this flow controlled approach to avoid DB drops. As the load lightens though, we resume to normal DB writes directly by user applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:44:11 -04:00
Steve Wise	a2de1499b3	RDMA/cxgb4: Advertise ~0ULL as max MR size Lustre uses a advertised max MR size of ~0ULL to indicate it should use a dma_mr. Hence advertise max MR size as ~0ULL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:48 -07:00
Steve Wise	b298881fcf	RDMA/cxgb4: Always do GTS write if cidx_inc == CIDXINC_MASK When polling, we do a GTS update if the accumulated cidx_inc == the CQ depth / 16. However, if the CQ is large enough, Cq depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK to avoid overflowing the field. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:47 -07:00
Steve Wise	1cf24dcef4	RDMA/cxgb4: Fix QP flush logic This patch makes following fixes in QP flush logic: - correctly flushes unsignaled WRs followed by a signaled WR - supports for flushing a CQ bound to multiple QPs - resets cidx_flush if a active queue starts getting HW CQEs again - marks WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too so that post_send/post_recv will start returning the appropriate error synchronously - eats unsignaled read resp CQEs. HW always inserts CQEs so we must silently discard them if the read work request was unsignaled. - handles QP flushes with pending SW CQEs. The flush and out of order completion logic has a bug where if out of order completions are flushed but not yet polled by the consumer and the qp is then flushed then we end up inserting duplicate completions. - c4iw_flush_sq() should only flush wrs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining. This bug only caused a problem in the presence of unsignaled work requests. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fixed sparse warning due to htonl/ntohl confusion. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:45 -07:00
Vipul Pandya	42b6a94990	RDMA/cxgb4: Use DSGLs for fastreg and adapter memory writes for T5. It enables direct DMA by HW to memory region PBL arrays and fast register PBL arrays from host memory, vs the T4 way of passing these arrays in the WR itself. The result is lower latency for memory registration, and larger PBL array support for fast register operations. This patch also updates ULP_TX_MEM_WRITE command fields for T5. Ordering bit of ULP_TX_MEM_WRITE is at bit position 22 in T5 and at 23 in T4. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-14 11:35:59 -04:00
Vipul Pandya	f079af7a11	RDMA/cxgb4: Add Support for Chelsio T5 adapter Adds support for Chelsio T5 adapter. Enables T5's Write Combining feature. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-14 11:35:58 -04:00
Vipul Pandya	422eea0a8c	RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues Add module option db_fc_threshold which is the count of active QPs that trigger automatic db flow control mode. Automatically transition to/from flow control mode when the active qp count crosses db_fc_theshold. Add more db debugfs stats On DB DROP event from the LLD, recover all the iwarp queues. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:33 -07:00
Nishanth Aravamudan	e297d9dd5c	cxgb4: use pgprot_writecombine() on powerpc Commit `fe3cc0d99d` ("powerpc: Add pgprot_writecombine") in benh's tree exposes the pgprot_writecombine() API to drivers on powerpc. cxgb4 has an open-coded version of the same, so use the common API now that it's available. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Anton Blanchard <anton@samba.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:25 +10:00
Steve Wise	ffc3f7487f	RDMA/cxgb4: Do CIDX_INC updates every 1/16 CQ depth CQE reaps This avoids the CIDX_INC overflow issue with T4A2 when running kernel RDMA applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:11 -07:00
Steve Wise	40dbf6ee38	RDMA/cxgb4: Fastreg NSMR fixes - Remove dsgl support - doesn't work in T4. - Wrap the immediate PBL as needed when building it in the wr. - Adjust max pbl depth allowed based on ulptx alignment requirements. - Bump the slots per SQ to 5 to allow up to 128MB fast registers. - Advertise fastreg support by default. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-09-28 10:53:50 -07:00
Steve Wise	c6d7b26791	RDMA/cxgb4: Support on-chip SQs T4 support on-chip SQs to reduce latency. This patch adds support for this in iw_cxgb4: - Manage ocqp memory like other adapter mem resources. - Allocate user mode SQs from ocqp mem if available. - Map ocqp mem to user process using write combining. - Map PCIE_MA_SYNC reg to user process. Bump uverbs ABI. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-09-28 10:46:35 -07:00
Steve Wise	93fb72e443	RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-08-07 23:08:47 -07:00
Steve Wise	d37ac31ddc	RDMA/cxgb4: Support variable sized work requests T4 EQ entries are in multiples of 64 bytes. Currently the RDMA SQ and RQ use fixed sized entries composed of 4 EQ entries for the SQ and 2 EQ entries for the RQ. For optimial latency with small IO, we need to change this so the HW only needs to DMA the EQ entries actually used by a given work request. Implementation: - add wq_pidx counter to track where we are in the EQ. cidx/pidx are used for the sw sq/rq tracking and flow control. - the variable part of work requests is the SGL. Add new functions to build the SGL and/or immediate data directly in the EQ memory wrapping when needed. - adjust the min burst size for the EQ contexts to 64B. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-07-21 11:16:20 -07:00
FUJITA Tomonori	f38926aa1d	RDMA/cxgb4: Use the DMA state API instead of the pci equivalents This replace the PCI DMA state API (include/linux/pci-dma.h) with the DMA equivalents since the PCI DMA state API will be obsolete. No functional change. For further information about the background: http://marc.info/?l=linux-netdev&m=127037540020276&w=2 Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-07-06 14:01:42 -07:00
Steve Wise	f64b88433c	RDMA/cxgb4: Update some HW limits Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-24 21:08:03 -07:00
Steve Wise	7ec45b9234	RDMA/cxgb4: Fix overflow bug in CQ arm - wrap cq->cqidx_inc based on cq size. - optimize t4_arm_cq logic. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-24 21:08:01 -07:00
Steve Wise	84172dee05	RDMA/cxgb4: Optimize CQ overflow detection 1) save the timestamp flit in the cq when we consume a CQE. 2) always compare the saved flit with the previous entry flit when reading the next CQE entry. If the flits don't compare, then we have overflowed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-24 21:08:01 -07:00
Roland Dreier	be4c9bad9d	MAINTAINERS: Add cxgb4 and iw_cxgb4 entries Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-05 14:45:40 -07:00
Steve Wise	cfdda9d764	RDMA/cxgb4: Add driver for Chelsio T4 RNIC Add an RDMA/iWARP driver for Chelsio T4 Ethernet adapters. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-04-21 15:30:06 -07:00

30 Commits