linux

Commit Graph

Author	SHA1	Message	Date
Eli Cohen	93b80ac297	IB/mlx4: Fix endless loop in resize CQ When calling get_sw_cqe() we need pass the consumer_index and not the masked value. Failure to do so will cause incorrect result of get_sw_cqe() possibly leading to endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 10:24:17 -08:00
Linus Torvalds	2f466d33f5	PCI changes for the v3.13 merge window: Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSgUzsAAoJEFmIoMA60/r8wmsQAJhwmtkUYR2L4T1g9smAyjJz bLm5zoC6WdywFcbTpTBfsTrS1CHIQG5akRgkEXGdr99epiho5F2lwmagWsUR4ijL 39Qn3knAUMgtNjoVXXI106h/DfTyxSmkZBfih2AQFyWobJq+0kg7hjQQA3+836b4 8ssWr1+NSl6JJTqYQ0Paw1kSqvvYoXsu5rWFEfCHk8D0s/1bvr5ldAUpk2jTg93I uo9/5+O264yt1YoKZOMqAMZLUfd5DaWY1mV3yeF0Uauy1pBmol5csE8ckqJPDrES PRdJT1+PhBeLYWcgXANOBZsW58ddxA0pQ5jQV6VJHQWsm5cE82OBpYJf6xUZ2moV o6DZ0KRnCPVA3NllYYR16H+wbMfADwwO83QoA+QTIZJy/WgpDH3Cst+m8KePGqbL uFgDdXSws9Bs1BCFs7bfYzAM3OdkBFnn+ac7JoPXKP5ibgAp9nDlurgK2r90zRnp j15vHMx0mV+e8B8/iwiW5eRtg7NoCHYiNfFy7JalOlsPmYr2KFazBVKclp13Hng7 fe/Jy6X4UhWoQPdqsy4ftvSQb0gm1MClxFJeZ3VAt6LY9j8OP6S/Vdf6lpAL85KR lAQoQzB+lOhTPdXxFY2xgGkITkqPDOQMjPfowYUYFwybqBuG6BHXZPJobL+niBlb Nh+M2WlUUA9Z3V6rWJB6 =CTPk -----END PGP SIGNATURE----- Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)" * tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits) PCI: Enable upstream bridges even for VFs on virtual buses PCI: Add pci_upstream_bridge() PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() PCI: Warn on driver probe return value greater than zero PCI: Drop warning about drivers that don't use pci_set_master() PCI: Workaround missing pci_set_master in pci drivers powerpc/pci: Use pci_is_pcie() to simplify code [fix] PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms PCI: imx6: Probe the PCIe in fs_initcall() PCI: Add R-Car Gen2 internal PCI support PCI: imx6: Remove redundant of_match_ptr PCI: Report pci_pme_active() kmalloc failure mn10300/PCI: Remove useless pcibios_last_bus frv/PCI: Remove pcibios_last_bus PCI: imx6: Increase link startup timeout PCI: exynos: Remove redundant of_match_ptr PCI: imx6: Fix imprecise abort handler PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0 PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe() x86/PCI: Coalesce multiple overlapping host bridge windows ...	2013-11-14 14:02:00 +09:00
Al Viro	441a9d0e1e	qib_fs: fix (some) dcache abuses * lookup_one_len() really wants i_mutex held on directory. * leaks galore - just mount ipathfs, then cd /sys/bus/pci/drivers/qib_ib; echo ::. >unbind on a box with that card present and try to umount ipathfs... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 08:08:19 -05:00
Nicholas Bellinger	04d9cd1224	ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call This patch avoids a duplicate iscsit_increment_maxcmdsn() call for ISER_IB_RDMA_WRITE within isert_map_rdma() + isert_reg_rdma_frwr(), which will already be occuring once during isert_put_datain() -> iscsit_build_rsp_pdu() operation. It also removes the local conn->stat_sn assignment + increment, and changes the third parameter to iscsit_build_rsp_pdu() to signal this should be done by iscsi_target_mode code. Tested-by: Moussa Ba <moussaba@micron.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-11-12 18:05:07 -08:00
Vu Pham	f01b9f7339	iser-target: Avoid using FRMR for single dma entry requests This patch changes isert_reg_rdma_frwr() to not use FRMR for single dma entry requests from small I/Os, in order to avoid the associated memory registration overhead. Using DMA MR is sufficient here for the single dma entry requests, and addresses a >= v3.12 performance regression. Signed-off-by: Vu Pham <vu@mellanox.com> Cc: <stable@vger.kernel.org> # v3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-11-12 13:07:52 -08:00
Michal Nazarewicz	352b905635	RDMA/cma: Remove unused argument and minor dead code The dev variable is never assigned after being initialised. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-11 10:46:54 -08:00
Sean Hefty	c6b21824c9	RDMA/ucma: Discard events for IDs not yet claimed by user space Problem reported by Avneesh Pant <avneesh.pant@oracle.com>: It looks like we are triggering a bug in RDMA CM/UCM interaction. The bug specifically hits when we have an incoming connection request and the connecting process dies BEFORE the passive end of the connection can process the request i.e. it does not call rdma_get_cm_event() to retrieve the initial connection event. We were able to triage this further and have some additional information now. In the example below when P1 dies after issuing a connect request as the CM id is being destroyed all outstanding connects (to P2) are sent a reject message. We see this reject message being received on the passive end and the appropriate CM ID created for the initial connection message being retrieved in cm_match_req(). The problem is in the ucma_event_handler() code when this reject message is delivered to it and the initial connect message itself HAS NOT been delivered to the client. In fact the client has not even called rdma_cm_get_event() at this stage so we haven't allocated a new ctx in ucma_get_event() and updated the new connection CM_ID to point to the new UCMA context. This results in the reject message not being dropped in ucma_event_handler() for the new connection request as the (if (!ctx->uid)) block is skipped since the ctx it refers to is the listen CM id context which does have a valid UID associated with it (I believe the new CMID for the connection initially uses the listen CMID -> context when it is created in cma_new_conn_id). Thus the assumption that new events for a connection can get dropped in ucma_event_handler() is incorrect IF the initial connect request has not been retrieved in the first case. We end up getting a CM Reject event on the listen CM ID and our upper layer code asserts (in fact this event does not even have the listen_id set as that only gets set up librdmacm for connect requests). The solution is to verify that the cm_id being reported in the event is the same as the cm_id referenced by the ucma context. A mismatch indicates that the ucma context corresponds to the listen. This fix was validated by using a modified version of librdmacm that was able to verify the problem and see that the reject message was indeed dropped after this patch was applied. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-11 10:45:06 -08:00
$Upinder Malhi $umalhi$$ Upinder Malhi $umalhi$	180771a370	IB/core: Add Cisco usNIC rdma node and transport types This patch adds new rdma node and new rdma transport, and supporting code used by Cisco's low latency driver called usNIC. usNIC uses its own transport, distinct from IB and iWARP. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:36:25 -08:00
Dave Jones	4127c365c9	RDMA/nes: Remove self-assignment from nes_query_qp() Assigning a value to itself is pointless. Spotted with coverity, no hardware to test. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:34:27 -08:00
Bart Van Assche	cd4e38542a	IB/srp: Report receive errors correctly The IB spec does not guarantee that the opcode is available in error completions. Hence do not rely on it. See also commit `948d1e889e` ("IB/srp: Introduce srp_handle_qp_err()"). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:18 -08:00
Bart Van Assche	99b6697a50	IB/srp: Avoid offlining operational SCSI devices If SCSI commands are submitted with a SCSI request timeout that is lower than the the IB RC timeout, it can happen that the SCSI error handler has already started device recovery before transport layer error handling starts. So it can happen that the SCSI error handler tries to abort a SCSI command after it has been reset by srp_rport_reconnect(). Tell the SCSI error handler that such commands have finished and that it is not necessary to continue its recovery strategy for commands that have been reset by srp_rport_reconnect(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:18 -08:00
Vu Pham	65d7dd2f34	IB/srp: Remove target from list before freeing Scsi_Host structure Remove an SRP target from the SRP target list before invoking the last scsi_host_put() call. This change is necessary because that last put frees the memory that holds the srp_target_port structure. This patch prevents the following kernel oops: RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570 Call Trace: [<ffffffff810b11e4>] lock_acquire+0xa4/0x120 [<ffffffff81531206>] _spin_lock+0x36/0x70 [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp] [<ffffffff8109125c>] worker_thread+0x21c/0x3d0 [<ffffffff81096e86>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 Signed-off-by: Vu Pham <vuhuong@mellanox.com> [ bvanassche - Modified path description and CC'ed stable. ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Jack Wang	71444b9781	IB/srp: Add change_queue_depth and change_queue_type support Currently, it's not possible to change queue depth for a device behind SRP host. Sometimes, we need to adjust queue_depth for performance reason (eg storage busy, we need lower queue_depth to avoid running into SCSI error handler), so this patch add support for SRP driver. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Bart Van Assche	4d73f95f70	IB/srp: Make queue size configurable Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Tested-by: Jack Wang <xjtuwjp@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Bart Van Assche	b81d00bddf	IB/srp: Introduce srp_alloc_req_data() This patch does not change any functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: Vu Pham <vu@mellanox.com> Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Bart Van Assche	848b3082db	IB/srp: Export sgid to sysfs On an initiator system with multiple IB ports it is not yet possible to figure out what the originating port of an SRP connection is. Hence make the source GID available in sysfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:16 -08:00
Bart Van Assche	a95cadb9da	IB/srp: Add periodic reconnect functionality After a transport layer occurred, periodically try to reconnect to the target until the dev_loss timer expires. Protect the callback functions that can be invoked from inside the SCSI EH against concurrent invocation with srp_reconnect_rport() via the rport mutex. Change the default dev_loss_tmo from 60s into 600s to give the reconnect mechanism a chance to kick in. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:16 -08:00
Bart Van Assche	8c64e4531c	scsi_transport_srp: Add periodic reconnect support Add support for periodically reconnecting to an SRP target until the dev_loss timer expires. After the tenth reconnection attempt, gradually slow down subsequent reconnect attempts. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:16 -08:00
Bart Van Assche	c1120f8981	IB/srp: Start timers if a transport layer error occurs Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Bart Van Assche	ed9b2264fb	IB/srp: Use SRP transport layer error recovery Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow to specify default values for these parameters. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Bart Van Assche	9dd69a600a	IB/srp: Keep rport as long as the IB transport layer Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback could get invoked after srp_remove_host() has finished. In other words, without this patch the queuecommand callback can get invoked after the rport data structure has been freed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Vu Pham	7bb312e4a2	IB/srp: Make transport layer retry count configurable Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count allows to reduce the path failover time. Signed-off-by: Vu Pham <vu@mellanox.com> [ bvanassche: Rewrote patch description / changed default retry count ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Mike Marciniszyn	2fadd83184	IB/qib: Fix txselect regression Commit 7fac33014f54("IB/qib: checkpatch fixes") was overzealous in removing a simple_strtoul for a parse routine, setup_txselect(). That routine is required to handle a multi-value string. Unwind that aspect of the fix. Cc: <stable@vger.kernel.org> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:12 -08:00
Mike Marciniszyn	78a5886472	IB/qib: Fix checkpatch __packed warnings Convert __attribute__ ((packed)) to __packed. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:12 -08:00
Jan Kara	603e772992	IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() qib_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function qib_user_sdma_queue_pkts() (and also qib_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:11 -08:00
Jan Kara	4adcf7fb67	IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast() ipath_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function ipath_user_sdma_queue_pkts() (and also ipath_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make ipath_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Merged in fix for call to get_user_pages_fast from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:11 -08:00
Naresh Gottumukkala	d5e3f37833	RDMA/ocrdma: Remove redundant check in ocrdma_build_fr() Remove the redundant check of comparing if a 32-bit value is greater than 0xffffffffULL. Reported by Dan Carpenter. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:06 -08:00
Naresh Gottumukkala	1852d1da3b	RDMA/ocrdma: Fix a crash in rmmod 1) ocrdma_remove_free() is called from a call_rcu callback funtion context, which can be a bottom-half context. So the code in ocrdma_remove_free should not sleep. But ocrdma_cleanup_hw() can sleep, So move it ocrdma_remove() instead of ocrdma_remove_free. 2) Fix a couple of kbuild test robot warnings. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:06 -08:00
Dan Carpenter	6ebacdfc07	RDMA/ocrdma: Silence an integer underflow warning We recently added a cap on "max_wqe_allocated" in `43a6b4025c` ('RDMA/ocrdma: Create IRD queue fix'). My static checker complains that the cap has a problem because it casts large values to negative. "attrs->cap.max_send_wr" is a u32. It comes from the user, but it's capped in ocrdma_check_qp_params() so it can't wrap here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:05 -08:00
Eli Cohen	1b77d2bd75	mlx5: Use enum to indicate adapter page size The Connect-IB adapter has an inherent page size which equals 4K. Define an new enum that equals the page shift and use it instead of using the value 12 throughout the code. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	c2a3431e61	IB/mlx5: Update opt param mask for RTS2RTS RTS to RTS transition should allow update of alternate path. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	07c9113fe8	IB/mlx5: Remove "Always false" comparison mlx5_cur and mlx5_new cannot have negative values so remove the redundant condition. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	2d036fad94	IB/mlx5: Remove dead code in mr.c In mlx5_mr_cache_init() the size variable is not used so remove it to avoid compiler warnings when running with make W=1. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:00 -08:00
Eli Cohen	bf0bf77f65	mlx5: Support communicating arbitrary host page size to firmware Connect-IB firmware requires 4K pages to be communicated with the driver. This patch breaks larger pages to 4K units to enable support for architectures utilizing larger page size, such as PowerPC. This patch also fixes several places that referred to PAGE_SHIFT instead of explicit 12 which is the inherent page shift on Connect-IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:00 -08:00
Moshe Lazer	cfd8f1d49b	IB/mlx5: Fix srq free in destroy qp On destroy QP the driver walks over the relevant CQ and removes CQEs reported for the destroyed QP. It also frees the related SRQ entry without checking that this is actually an SRQ-related CQE. In case of a CQ used for both send and receive QP, we could free SRQ entries for send CQEs. This patch resolves this issue by verifying that this is a SRQ related CQE by checking the SRQ number in the CQE is not zero. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	1faacf82df	IB/mlx5: Simplify mlx5_ib_destroy_srq Make use of destroy_srq_kernel() to clear SRQ resouces. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	9641b74ebe	IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR Make sure not to overflow when reading the page list from struct ib_fast_reg_page_list. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	746b5583c1	IB/mlx5: Multithreaded create MR Use asynchronous commands to execute up to eight concurrent create MR commands. This is to fill memory caches faster so we keep consuming from there. Also, increase timeout for shrinking caches to five minutes. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	51ee86a4af	IB/mlx5: Fix check of number of entries in create CQ Verify that the value is non negative before rounding up to power of 2. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:58 -08:00
Mathias Krause	5476781bb9	IB/netlink: Remove superfluous RDMA_NL_GET_OP() masking 'op' is the already RDMA_NL_GET_OP() masked 'type'. No need to mask it again. Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:54 -08:00
Latchesar Ionkov	6b7d103c1b	IB/core: Pass imm_data from ib_uverbs_send_wr to ib_send_wr correctly Currently, we don't copy the immediate data from the userspace struct to the kernel one when UD messages are being sent. This patch makes sure that the immediate data is set correctly. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:54 -08:00
Michal Schmidt	7f1a38671c	IPoIB: lower NAPI weight Since commit `82dc3c63c6` ("net: introduce NAPI_POLL_WEIGHT") netif_napi_add() produces an error message if a NAPI poll weight greater than 64 is requested. Use the standard NAPI weight. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:50 -08:00
Erez Shitrit	94232d9ce8	IPoIB: Start multicast join process only on active ports The driver starts the mcast_join task whenever the netdev interface is UP without relation to the underlying IB port state. Until the port state is ACTIVE all the join requests are irrelevant, and the IB core returns -EINVAL. So the user will see errors such as: "multicast join failed for ff12:401b:... , status -22". Instead, have ipoib_mcast_join_task() return when the port is not active. It will be called again when the port state is changed and the low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the IB_EVENT_CLIENT_REREGISTER event. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:49 -08:00
Erez Shitrit	a39c52ab88	IPoIB: Add path query flushing in ipoib_ib_dev_cleanup The path_rec_completion() callback may be invoked asynchronously even at the middle of "driver uninit" process. This can lead to scheduling a task that tries to touch members of the priv object that are no longer valid. For example the function cm_create_tx_qp can attempt to create qp with no valid priv->pd object. The following crash is one of the results: RIP: 0010:[<ffffffffa021bb47>] [<ffffffffa021bb47>] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib] Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500) Stack: Call Trace: [<ffffffff81309ef0>] ? get_random_bytes+0x20/0x30 [<ffffffffa021be2a>] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib] [<ffffffffa021f765>] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib] [<ffffffffa021f550>] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib] [<ffffffff8108b2b0>] worker_thread+0x170/0x2a0 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0 [<ffffffff81090886>] kthread+0x96/0xa0 [<ffffffff8100c14a>] child_rip+0xa/0x20 [<ffffffff810907f0>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20 Fix that by flushing all pending path queries at this point. Signed-off-by: Alex Markuze <markuze@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:49 -08:00
Erez Shitrit	a9c8ba5884	IPoIB: Fix usage of uninitialized multicast objects The driver should avoid calling ib_sa_free_multicast on the mcast->mc object until it finishes its initialization state. Otherwise we can crash when ipoib_mcast_dev_flush() attempts to use the uninitialized multicast object. Instead, only call wait_for_completion() for multicast entries that started the join process, meaning that ib_sa_join_multicast() finished. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:49 -08:00
Erez Shitrit	aede25011f	IPoIB: Avoid flushing the driver workqueue on dev_down The driver should not flush the whole workqueue when only one work (the pkey poll one) needs to be cancelled. Use cancel_delayed_work_sync() instead. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:49 -08:00
Erez Shitrit	f47944cc2d	IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush() When ipoib interface is going down it takes all of its children with it, under mutex. For each child, dev_change_flags() is called. That function calls ipoib_stop() via the ndo, and causes flush of the workqueue. Sometimes in the workqueue an __ipoib_dev_flush work() is waiting and when invoked tries to get the same mutex, which leads to a deadlock, as seen below. The solution is to switch to rw-sem instead of mutex. The deadlock: [11028.165303] [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0 [11028.171844] [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0 [11028.178465] [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80 [11028.185962] [<ffffffff814ea743>] wait_for_common+0x123/0x180 [11028.192491] [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20 [11028.199504] [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20 [11028.206224] [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90 [11028.212948] [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20 [11028.219375] [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80 [11028.225712] [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib] [11028.233988] [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib] [11028.241678] [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib] [11028.248692] [<ffffffff8142adf1>] dev_close+0x71/0xc0 [11028.254447] [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0 [11028.261062] [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib] [11028.268172] [<ffffffff8142adf1>] dev_close+0x71/0xc0 [11028.273922] [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0 [11028.280452] [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0 [11028.286786] [<ffffffff814903b8>] inet_ioctl+0x88/0xa0 [11028.292633] [<ffffffff8141591a>] sock_ioctl+0x7a/0x280 [11028.298576] [<ffffffff81189012>] vfs_ioctl+0x22/0xa0 [11028.304326] [<ffffffff81140540>] ? unmap_region+0x110/0x130 [11028.310756] [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580 [11028.316897] [<ffffffff81189731>] sys_ioctl+0x81/0xa0 and 11028.017533] [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80 [11028.025030] [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 [11028.031945] [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180 [11028.039053] [<ffffffff814eb14b>] mutex_lock+0x2b/0x50 [11028.044910] [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib] [11028.052894] [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib] [11028.061363] [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib] [11028.069738] [<ffffffff8108b120>] worker_thread+0x170/0x2a0 [11028.076068] [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40 [11028.083374] [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0 [11028.089709] [<ffffffff81090626>] kthread+0x96/0xa0 [11028.095266] [<ffffffff8100c0ca>] child_rip+0xa/0x20 [11028.100921] [<ffffffff81090590>] ? kthread+0x0/0xa0 [11028.106573] [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 [11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:49 -08:00
Tal Alon	22252b4e09	IPoIB: Change CM skb memory allocation to be non-atomic during init Change CM skb memory allocation to use GFP_KERNEL when possible. During device init there's no need to use GFP_ATOMIC when allocating memory for the CM skbs -- use GFP_KERNEL instead. Signed-off-by: Tal Alon <talal@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:48 -08:00
Erez Shitrit	c2bb5628db	IPoIB: Fix crash in dev_open error flow If napi has never been enabled when calling ipoib_ib_dev_stop, a kernel crash occurs, because the verbs layer completion handler (ipoib_ib_completion) calls napi_schedule unconditionally. If the napi structure passed in the napi_schedule call has not been initialized, napi will crash. The cleanest solution is to simply enable napi before calling ipoib_ib_dev_stop in the dev_open error flow. (dev_stop then immediately disables napi). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:48 -08:00
Ben Hutchings	649fb5ec0e	IB/cxgb4: Fix formatting of physical address Physical addresses may be wider than virtual addresses (e.g. on i386 with PAE) and must not be formatted with %p. Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:30 -08:00
Doug Ledford	be9130cc92	IB/cma: Check for GID on listening device first As a simple optimization that should speed up the vast majority of connect attemps on IB devices, when we are searching for the GID of an incoming connection in the cached GID lists of devices, search the device that received the incoming connection request first. If we don't find it there, then move on to other devices. This reduces the time to perform 10,000 connections considerably. Prior to this patch, a bad run of cmtime would look like this: connect : 12399.26 12351.10 8609.00 1239.93 With this patch, it looks more like this: connect : 5864.86 5799.80 8876.00 586.49 Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:24 -08:00
Doug Ledford	29f27e8477	IB/cma: Use cached gids The cma_acquire_dev function was changed by commit `3c86aa70bf` ("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port() because multiport devices might have either IB or IBoE formatted gids. The old function assumed that all ports on the same device used the same GID format. However, when it was changed to use find_gid_port(), we inadvertently lost usage of the GID cache. This turned out to be a very costly change. In our testing, each iteration through each index of the GID table takes roughly 35us. When you have multiple devices in a system, and the GID you are looking for is on one of the later devices, the code loops through all of the GID indexes on all of the early devices before it finally succeeds on the target device. This pathological search behavior combined with 35us per GID table index retrieval results in results such as the following from the cmtime application that's part of the latest librdmacm git repo: ib1: step total ms max ms min us us / conn create id : 29.42 0.04 1.00 2.94 bind addr : 186705.66 19.00 18556.00 18670.57 resolve addr : 41.93 9.68 619.00 4.19 resolve route: 486.93 0.48 101.00 48.69 create qp : 4021.95 6.18 330.00 402.20 connect : 68350.39 68588.17 24632.00 6835.04 disconnect : 1460.43 252.65-1862269.00 146.04 destroy : 41.16 0.04 2.00 4.12 ib0: step total ms max ms min us us / conn create id : 28.61 0.68 1.00 2.86 bind addr : 2178.86 2.95 201.00 217.89 resolve addr : 51.26 16.85 845.00 5.13 resolve route: 620.08 0.43 92.00 62.01 create qp : 3344.40 6.36 273.00 334.44 connect : 6435.99 6368.53 7844.00 643.60 disconnect : 5095.38 321.90 757.00 509.54 destroy : 37.13 0.02 2.00 3.71 Clearly, both the bind address and connect operations suffer a huge penalty for being anything other than the default GID on the first port in the system. After applying this patch, the numbers now look like this: ib1: step total ms max ms min us us / conn create id : 30.15 0.03 1.00 3.01 bind addr : 80.27 0.04 7.00 8.03 resolve addr : 43.02 13.53 589.00 4.30 resolve route: 482.90 0.45 100.00 48.29 create qp : 3986.55 5.80 330.00 398.66 connect : 7141.53 7051.29 5005.00 714.15 disconnect : 5038.85 193.63 918.00 503.88 destroy : 37.02 0.04 2.00 3.70 ib0: step total ms max ms min us us / conn create id : 34.27 0.05 1.00 3.43 bind addr : 26.45 0.04 1.00 2.64 resolve addr : 38.25 10.54 760.00 3.82 resolve route: 604.79 0.43 97.00 60.48 create qp : 3314.95 6.34 273.00 331.49 connect : 12399.26 12351.10 8609.00 1239.93 disconnect : 5096.76 270.72 1015.00 509.68 destroy : 37.10 0.03 2.00 3.71 It's worth noting that we still suffer a bit of a penalty on connect to the wrong device, but the penalty is much less than it used to be. Follow on patches deal with this penalty. Many thanks to Neil Horman for helping to track the source of slow function that allowed us to track down the fact that the original patch I mentioned above backed out cache usage and identify just how much that impacted the system. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:24 -08:00
Jack Morgenstein	571b8b92c7	net/mlx4_core: Initialize all mailbox buffers to zero before use To guarantee that all unused fields in all FW commands for both inboxes and outboxes are zeroed out, initialize the mailbox buffer to all zeroes. This is especially important for SRIOV comm-channel virtual commands (such as QUERY_FUNC_CAP), where if new fields are added to support new features, the driver can depend on older kernels passing zeroes in these fields. In addition to zeroing out the mailbox buffer at allocation time, all (now unnecessary) calls to memset by the callers of mlx4_alloc_cmd_mailbox() are removed. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-07 19:22:47 -05:00
Eyal Perry	eb072c4b8d	RDMA/cma: Set IBoE SL (user-priority) by egress map when using vlans On top of commit `366cddb40` "IB/rdma_cm: TOS <=> UP mapping for IBoE", add support for case vlan egress map is used. When the IBoE session is being set over a vlan, inherit the socket priority to vlan priority mapping which was configured for the vlan device egress map. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-07 19:09:44 -05:00
Nicholas Bellinger	4a9a6c8d53	target: Drop left-over se_lun->lun_cmd_list shutdown code Now with percpu refcounting for se_lun in place, go ahead and drop the legacy per se_cmd accounting for se_lun shutdown. This includes __transport_clear_lun_from_sessions(), the associated transport_lun_wait_for_tasks() logic, along with a handful of now unused se_cmd structure members and ->transport_state bits. Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-11-07 14:25:02 -08:00
Nicholas Bellinger	95b60f0788	ib_isert: Add support for completion interrupt coalescing This patch adds support for completion interrupt coalescing that allows only every ISERT_COMP_BATCH_COUNT (8) to set IB_SEND_SIGNALED, thus avoiding completion interrupts for every posted iser_tx_desc. The batch processing is done using a per isert_conn llist that once IB_SEND_SIGNALED has been set is saved to tx_desc->comp_llnode_batch, and completion processing of previously posted iser_tx_descs is done in a single shot from within isert_send_completion() code. Note this is only done for response PDUs from ISCSI_OP_SCSI_CMD, and all other control type of PDU responses will force an implicit batch drain to occur. Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-11-06 12:48:22 -08:00
Jack Morgenstein	5a0d0a6161	mlx4: Structures and init/teardown for VF resource quotas This is step #1 for implementing SRIOV resource quotas for VFs. Quotas are implemented per resource type for VFs and the PF, to prevent any entity from simply grabbing all the resources for itself and leaving the other entities unable to obtain such resources. Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC, VLAN, and Counters. The quota system works as follows: Each entity (VF or PF) is given a max number of a given resource (its quota), and a guaranteed minimum number for each resource (starvation prevention). For QPs, CQs, SRQs, MPTs and MTTs: 50% of the available quantity for the resource is divided equally among the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool. The other 50% is the "free pool", allocated on a first-come-first-serve basis. For each VF/PF, resources are first allocated from its "guaranteed-minimum" pool. When that pool is exhausted, the driver attempts to allocate from the resource "free-pool". The quota (i.e., max) for the VFs and the PF is: The free-pool amount (50% of the real max) + the guaranteed minimum For MACs: Guarantee 2 MACs per VF/PF per port. As a result, since we have only 128 MACs per port, reduce the allowable number of VFs from 64 to 63. Any remaining MACs are put into a free pool. For VLANs: For the PF, the per-port quota is 128 and guarantee is 64 (to allow the PF to register at least a VLAN per VF in VST mode). For the VFs, the per-port quota is 64 and the guarantee is 0. We assume that VGT VFs are trusted not to abuse the VLAN resource. For Counters: For all functions (PF and VFs), the quota is 128 and the guarantee is 0. In this patch, we define the needed structures, which are added to the resource-tracker struct. In addition, we do initialization for the resource quota, and adjust the query_device response to use quotas rather than resource maxima. As part of the implementation, we introduce a new field in mlx4_dev: quotas. This field holds the resource quotas used to report maxima to the upper layers (ib_core, via query_device). The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so that they may continue to use these in handling QPs, CQs, SRQs and MPTs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 16:19:07 -05:00
David S. Miller	394efd19d5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 13:48:30 -05:00
Linus Torvalds	acda24c47e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "Here are the outstanding target pending fixes for v3.12-rc7. This includes a number of EXTENDED_COPY related fixes as a result of Thomas and Doug's continuing testing and feedback. Also included is an important vhost/scsi fix that addresses a long standing issue where the 'write' parameter for get_user_pages_fast() was incorrectly set for virtio-scsi WRITEs -> DMA_TO_DEVICE, and not for virtio-scsi READs -> DMA_FROM_DEVICE. This resulted in random userspace segfaults and other unpleasantness on KVM host, and unfortunately has been an issue since the initial merge of vhost/scsi in v3.6. This patch is CC'ed to stable, along with two other less critical items" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter target/pscsi: fix return value check target: Fail XCOPY for non matching source + destination block_size target: Generate failure for XCOPY I/O with non-zero scsi_status target: Add missing XCOPY I/O operation sense_buffer iser-target: check device before dereferencing its variable target: Return an error for WRITE SAME with ANCHOR==1 target: Fix assignment of LUN in tracepoints target: Reject EXTENDED_COPY when emulate_3pc is disabled target: Allow non zero ListID in EXTENDED_COPY parameter list target: Make target_do_xcopy failures return INVALID_PARAMETER_LIST	2013-10-27 10:16:33 -07:00
Vu Pham	0a66614b93	iser-target: check device before dereferencing its variable This patch changes isert_connect_release() to correctly check for the existence struct isert_device *device before checking for isert_device->use_frwr. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-10-23 21:42:33 -07:00
David S. Miller	c3fa32b976	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:49:34 -04:00
Yann Droneaud	7afbddfae9	IB/core: Temporarily disable create_flow/destroy_flow uverbs The create_flow/destroy_flow uverbs and the associated extensions to the user-kernel verbs ABI are under review and are too experimental to freeze at this point. So userspace is not exposed to experimental features and an uinstable ABI, temporarily disable this for v3.12 (with a Kconfig option behind staging to reenable it if desired). The feature will be enabled after proper cleanup for v3.13. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com [ Add a Kconfig option to reenable these verbs. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-21 09:44:17 -07:00
Linus Torvalds	ed8ada3933	Last batch of IB changes for 3.12: many mlx5 hardware driver fixes plus one trivial semicolon cleanup. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSXCYCAAoJEENa44ZhAt0hWKgP/3R5GUceKeDn9R0ENGPSpQ1/ dvsVmMccMGuJnyZSfUgoGb++6YY5rEjjj6epFlqXtkfoqUvNzDmw8nRO/Pkx+IAT e/FyrDpcpbuPyOQeGZLIAC2hoQRUPsmayfBOIN+mW8Qu3vUYTKjs12QRqDi3EP6m itJ07CfAX09LoiZ1S5QxSnEhPvR5MA7zy5ebgdk0QC+6tNcBWx7tOtCY7/BX4MnO zZL2ZVzxZbHIT7HY+gYID4QxGHFf7JvGX9ATLh9HUzOom3c1XLtdDhH/6mONsTTL BWTUJIa86DGJwY4fc6wDrOsC8DBo3o3YB98DUWUb6FQswQtx+PcyFg1dAhJuYFTQ Risjpty4y/EVfUTjBCirf2R8BLCKZyUIFL40ZJvgwhKsH569hS5sVTXMPrQNmsuY x7C17KJ1iabmtAswJCtM/aoeoodqZnAUg63aV+mbwQXQu9l06fx4UOo/TfG3tH1+ FxVVD3ord98nh77Nv+sGB7ek7x0d3XxEaP7pZscDqRTUx7TT7dXXQY9GC5qAjnfr YhE8Exxmey+oZ3y6QTYI6scF5x8j0CJlSURfzDgOpKxYnSsdhgujGaQI++e9VF+W pHAWRqAGsf3wkoMJKZI6DC3lZka81yiByeROSmk08FSVNp7SkVNjl6VC8cAxkVfM nfNjy6fP/UQp/tcHp68R =0XuA -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "Last batch of IB changes for 3.12: many mlx5 hardware driver fixes plus one trivial semicolon cleanup" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB: Remove unnecessary semicolons IB/mlx5: Ensure proper synchronization accessing memory IB/mlx5: Fix alignment of reg umr gather buffers IB/mlx5: Fix eq names to display nicely in /proc/interrupts mlx5: Fix error code translation from firmware to driver IB/mlx5: Fix opt param mask according to firmware spec mlx5: Fix opt param mask for sq err to rts transition IB/mlx5: Disable atomic operations mlx5: Fix layout of struct mlx5_init_seg mlx5: Keep polling to reclaim pages while any returned IB/mlx5: Avoid async events on invalid port number IB/mlx5: Decrease memory consumption of mr caches mlx5: Remove checksum on command interface commands IB/mlx5: Fix memory leak in mlx5_ib_create_srq IB/mlx5: Flush cache workqueue before destroying it IB/mlx5: Fix send work queue size calculation	2013-10-14 17:43:33 -07:00
Roland Dreier	59b5b28d1a	Merge branch 'misc' into for-next	2013-10-14 10:10:46 -07:00
Joe Perches	2b50176d11	IB: Remove unnecessary semicolons These aren't necessary after switch blocks. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-14 10:10:00 -07:00
Masanari Iida	8c88126bbb	treewide: Fix typo in Kconfig Correct spelling typo in Kconfig. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-10-14 15:23:02 +02:00
Eli Cohen	5431390707	IB/mlx5: Ensure proper synchronization accessing memory Call mlx5_ib_populate_pas() before mapping the DMA buffer to ensure the hardware reads the values written by the CPU. Found by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:24:00 -07:00
Eli Cohen	fe45f82704	IB/mlx5: Fix alignment of reg umr gather buffers The hardware requires that gather buffers for UMR work requests be aligned to 2K. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:59 -07:00
Sagi Grimberg	ada9f5d007	IB/mlx5: Fix eq names to display nicely in /proc/interrupts It's helpful for a driver to put the pci slot name in its interrupt names, so /proc/interrupts will show the pci slot of the device. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:59 -07:00
Eli Cohen	a4774e9095	IB/mlx5: Fix opt param mask according to firmware spec Failed to configure opt mask to configure rre from init to rtr. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:58 -07:00
Eli Cohen	75959f56fe	mlx5: Fix opt param mask for sq err to rts transition Add missing entry in the table for UC transport. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:58 -07:00
Eli Cohen	81bea28ffd	IB/mlx5: Disable atomic operations Currently Atomic operations don't work properly. Disable them for the time being. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:58 -07:00
Eli Cohen	a0c84c326f	IB/mlx5: Avoid async events on invalid port number On a single ported Connect-IB, its possible for the firmware to issue events on the non-existing 2nd port. Make sure to ignore events generated for such ports. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:57 -07:00
Eli Cohen	203099fd73	IB/mlx5: Decrease memory consumption of mr caches Change the logic so we do not allocate memory nor map the device before actually posting to the REG_UMR QP. In addition, unmap and free the memory after we get completion. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:56 -07:00
Moshe Lazer	56e1ab0f13	IB/mlx5: Fix memory leak in mlx5_ib_create_srq The patch fixes the rollback in case of failure in creating SRQ. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:55 -07:00
Moshe Lazer	3c4619114c	IB/mlx5: Flush cache workqueue before destroying it Destroying the workqueue without flushing it first can lead to a case in which the kernel tries to push a delayed work to the workqueue which does not exist anymore. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:55 -07:00
Eli Cohen	b125a54bfd	IB/mlx5: Fix send work queue size calculation 1. Make sure wqe_cnt does not exceed the limit published by firmware. 2. There is no requirement that the number of outstanding work requests will be a power of two. Remove the ilog2 in the calculation of sq.max_post to fix that. 3. Add case for IB_QPT_XRC_TGT in sq_overhead and return 0 as XRC target QPs do not have a send queue. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:55 -07:00
Bjorn Helgaas	03078633a6	IB/qib: Drop qib_tune_pcie_caps() and qib_tune_pcie_coalesce() return values The callers of qib_tune_pcie_caps() and qib_tune_pcie_coalesce() don't check the return values, so this patch drops the return values altogether. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2013-10-04 14:30:19 -06:00
Jack Wang	c807f64340	ib_srpt: always set response for task management The SRP specification requires: "Response data shall be provided in any SRP_RSP response that is sent in response to an SRP_TSK_MGMT request (see 6.7). The information in the RSP_CODE field (see table 24) shall indicate the completion status of the task management function." So fix this to avoid the SRP initiator interprets task management functions that succeeded as failed. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-10-03 04:23:17 -07:00
Nicholas Bellinger	0b41d6ca61	ib_srpt: Destroy cm_id before destroying QP. This patch fixes a bug where ib_destroy_cm_id() was incorrectly being called after srpt_destroy_ch_ib() had destroyed the active QP. This would result in the following failed SRP_LOGIN_REQ messages: Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1762bd, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c903009f8f41) Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1758f9, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f8f42) Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff175941, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c90300a3cfb2) Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1) mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9 rejected SRP_LOGIN_REQ because creating a new RDMA channel failed. Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1) mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9 rejected SRP_LOGIN_REQ because creating a new RDMA channel failed. Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1) Reported-by: Navin Ahuja <navin.ahuja@saratoga-speed.com> Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-10-01 21:27:30 -07:00
Eric W. Biederman	0bbf87d852	net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 - Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 21:59:38 -07:00
Yijing Wang	0ce0e62f1f	IB/qib: Use pcie_set_mps() and pcie_get_mps() to simplify code Refactor qib_tune_pcie_caps(). Use pcie_get_mps(), pcie_set_mps(), pcie_get_readrq(), and pcie_set_readrq() to simplify the code. The PCI core caches the "PCIe Max Payload Size Supported" in pci_dev->pcie_mpss, so use that instead of pcie_capability_read_word(). Remove the unused val2fld() and fld2val(). Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2013-09-24 14:17:06 -06:00
Yijing Wang	dcaa73dc34	IB/qib: Use pci_is_root_bus() to check whether it is a root bus Use pci_is_root_bus() instead of "if (bus->parent)" statement for better readability. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2013-09-24 12:13:03 -06:00
Martin Schwidefsky	0244ad004a	Remove GENERIC_HARDIRQ config option After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-09-13 15:09:52 +02:00
Linus Torvalds	48efe453e6	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Lots of activity again this round for I/O performance optimizations (per-cpu IDA pre-allocation for vhost + iscsi/target), and the addition of new fabric independent features to target-core (COMPARE_AND_WRITE + EXTENDED_COPY). The main highlights include: - Support for iscsi-target login multiplexing across individual network portals - Generic Per-cpu IDA logic (kent + akpm + clameter) - Conversion of vhost to use per-cpu IDA pre-allocation for descriptors, SGLs and userspace page pointer list - Conversion of iscsi-target + iser-target to use per-cpu IDA pre-allocation for descriptors - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet) emulation for virtual backend drivers - Add support for generic EXTENDED_COPY (CopyOffload) emulation for virtual backend drivers. - Add support for fast memory registration mode to iser-target (Vu) The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of particular significance, which make us the first and only open source target to support the full set of VAAI primitives. Currently Linux clients are lacking upstream support to actually utilize these primitives. However, with server side support now in place for folks like MKP + ZAB working on the client, this logic once reserved for the highest end of storage arrays, can now be run in VMs on their laptops" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits) target/iscsi: Bump versions to v4.1.0 target: Update copyright ownership/year information to 2013 iscsi-target: Bump default TCP listen backlog to 256 target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out iscsi-target; Bump default CmdSN Depth to 64 iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set iscsi-target: Add thread_set->ts_activate_sem + use common deallocate iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE target: remove unused including <linux/version.h> iser-target: introduce fast memory registration mode (FRWR) iser-target: generalize rdma memory registration and cleanup iser-target: move rdma wr processing to a shared function target: Enable global EXTENDED_COPY setup/release target: Add Third Party Copy (3PC) bit in INQUIRY response target: Enable EXTENDED_COPY setup in spc_parse_cdb target: Add support for EXTENDED_COPY copy offload emulation target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check target: Add global device list for EXTENDED_COPY target: Make helpers non static for EXTENDED_COPY command setup target: Make spc_parse_naa_6h_vendor_specific non static ...	2013-09-12 16:11:45 -07:00
Nicholas Bellinger	4c76251e8e	target: Update copyright ownership/year information to 2013 Update copyright ownership/year information for target-core, loopback, iscsi-target, tcm_qla2xx, vhost and iser-target. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-09-10 20:23:36 -07:00
Vu Pham	59464ef4fb	iser-target: introduce fast memory registration mode (FRWR) This model was introduced in `00f7ec36c` "RDMA/core: Add memory management extensions support" and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the isert device, ib_isert will test whether the HCA supports FRWR. If supported then set the flag and assign function pointers that handle fast registration and deregistration of appropriate resources (fast_reg descriptors). When new connection coming in, ib_isert will check frwr flag and create frwr resouces, if fail to do it will switch back to old model of using global dma key and turn off the frwr support. Registration is done using posting IB_WR_FAST_REG_MR to the QP and invalidations using posting IB_WR_LOCAL_INV. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-09-10 16:48:51 -07:00
Vu Pham	d40945d8c2	iser-target: generalize rdma memory registration and cleanup Current driver uses global dma key to register the memory pointed by sg list provided by the target core. This is the preparation step for adding more methods like fast path memory registration, make the reg/unreg calls be function pointers. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-09-10 16:48:50 -07:00
Vu Pham	90ecc6e251	iser-target: move rdma wr processing to a shared function isert_put_datain() and isert_get_dataout() share a lot of code in rdma wr processing, move this common code to a shared function. Use isert_unmap_cmd to cleanup for RDMA_READ completion. Remove duplicate field in isert_cmd and isert_rdma_wr structs Change misc debug messages to track isert_cmd Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-09-10 16:48:49 -07:00
Nicholas Bellinger	d703ce2f7f	iscsi/iser-target: Convert to command priv_size usage This command converts iscsi/isert-target to use allocations based on iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of using an embedded isert_cmd->iscsi_cmd. This includes removing iscsit_transport->alloc_cmd() usage, along with updating isert-target code to use iscsit_priv_cmd(). Also, remove left-over iscsit_transport->release_cmd() usage for direct calls to iscsit_release_cmd(), and drop the now unused lio_cmd_cache and isert_cmd_cache. Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>	2013-09-09 14:29:21 -07:00
Nicholas Bellinger	6faaa85f37	iser-target: Updates for login negotiation multi-plexing support This patch updates iser-target code to support login negotiation multi-plexing. This includes only using isert_conn->conn_login_comp for the first login request PDU, pushing the subsequent processing to iscsi_conn->login_work -> iscsi_target_do_login_rx(), and turning isert_get_login_rx() into a NOP. v3 changes: - Drop unnecessary LOGIN_FLAGS_READ_ACTIVE bit set in isert_rx_login_req() Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-09-09 14:27:21 -07:00
Linus Torvalds	2e515bf096	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...	2013-09-06 09:36:28 -07:00
Linus Torvalds	7c049d0869	Main batch of InfiniBand/RDMA changes for 3.12 merge window: - Large ocrdma HW driver update: add "fast register" work requests, fixes, cleanups - Add receive flow steering support for raw QPs - Fix IPoIB neighbour race that leads to crash - iSER updates including support for using "fast register" memory registration - IPv6 support for iWARP - XRC transport fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSJ2drAAoJEENa44ZhAt0hpaUQAJ6EdNFh2bon9c/uHz1lw58A DIvVniUwWGCRgpbsv/IxZDBVX3G5IKSW6U0Y3/JxMXeIF/cGOUaJZgCeKPiYNoNA yxiEGI0ffvpWGAJUHSE2ARJKvgdjrpqGo5UzsiinEhJx3uczeZBcooosQTWDXxya /qSWH3rARkic9abuaizkuVzdWEjWLNjyh2bWnqJ4HNZplE8RfSK4bk8QtvEmpn9Z dNBKFFujysfHHGflLuSkFAkP1NdqlZeQ4/2uzD23p3YofbJwrJoJNFxkCfx3UUml fjPptlhU6kZMCwSXsD24tAV8Exr/CCgmxriFIN/Xqhu4gvBELScRPPqPqFquhtXI pP5hbG9/9P5C8BLgABe6IAvlU1lUyraR67bA78nYeKm2JpATE6T23Nx7nUgMgzRS Ee3ZvZGZvk8BZ7zyQh8D7Aig1XOtzqA9D5nLUyEJ2aRPSnXJxyi0S3Fbn0vmOMb7 8CF+99KECG8fb4sjNAjX5Zm48+7WJd+WCd3t89oUzCbJKKpa1Chctph8VH7dEfke JkEjOJhuAGqau4bIMZ2nZkISoTfSD7vbjPAxMPASS7fOdR7fe6AqDn9/LMAhWjpw 4Fb25ZW7rYf9+6jqA4MgMRFCTkXhR25FsqUUL9h8NaWai4F1dfVoZHbRvGIZah5n 5HQupXGquwES4y6pvHjc =rabl -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull main batch of InfiniBand/RDMA changes from Roland Dreier: - Large ocrdma HW driver update: add "fast register" work requests, fixes, cleanups - Add receive flow steering support for raw QPs - Fix IPoIB neighbour race that leads to crash - iSER updates including support for using "fast register" memory registration - IPv6 support for iWARP - XRC transport fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits) RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch IB/iser: Fix redundant pointer check in dealloc flow IB/iser: Fix possible memory leak in iser_create_frwr_pool() IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards RDMA/ocrdma: Fix passing wrong opcode to modify_srq RDMA/ocrdma: Fill PVID in UMC case RDMA/ocrdma: Add ABI versioning support RDMA/ocrdma: Consider multiple SGES in case of DPP RDMA/ocrdma: Fix for displaying proper link speed RDMA/ocrdma: Increase STAG array size RDMA/ocrdma: Dont use PD 0 for userpace CQ DB RDMA/ocrdma: FRMA code cleanup RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24 RDMA/ocrdma: Fix to work with even a single MSI-X vector RDMA/ocrdma: Remove the MTU check based on Ethernet MTU RDMA/ocrdma: Add support for fast register work requests (FRWR) RDMA/ocrdma: Create IRD queue fix IB/core: Better checking of userspace values for receive flow steering IB/mlx4: Add receive flow steering support IB/core: Export ib_create/destroy_flow through uverbs ...	2013-09-05 09:39:27 -07:00
Linus Torvalds	27703bb4a6	PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+ XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT BiHX6YFWtDDqRlVv8Q0F =LeaW -----END PGP SIGNATURE----- Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too. We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO	2013-09-04 17:31:11 -07:00
Roland Dreier	82af24ac6f	Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next	2013-09-03 09:01:08 -07:00
Roland Dreier	33ccbd858f	RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch Fix: drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function 'ocrdma_build_fr': >> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1832:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] mr = (struct ocrdma_mr *)qp->dev->stag_arr[(hdr->lkey >> 8) & ^ drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function 'ocrdma_alloc_frmr': >> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:2661:64: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] dev->stag_arr[(mr->hwmr.lkey >> 8) & (OCRDMA_MAX_STAG - 1)] = (u64) mr; Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-03 09:00:08 -07:00
Sagi Grimberg	2e02d653fe	IB/iser: Fix redundant pointer check in dealloc flow This bug was discovered by Smatch static checker run by Dan Carpenter. If in free_rx_descriptors(), rx_descs are not NULL then the iser device is definately not NULL, so no need to check it before dereferencing it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:26:16 -07:00
Roi Dayan	27ae2d1ea5	IB/iser: Fix possible memory leak in iser_create_frwr_pool() Fix leak where desc is not being freed in error flows. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:24:08 -07:00
Ira Weiny	0318f68521	IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards Commit `36a8f01cd2` ("IB/qib: Add congestion control agent implementation") caused statements to leak pass the header guard. Fix this. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:22:20 -07:00
Naresh Gottumukkala	d7e19c0ad9	RDMA/ocrdma: Fix passing wrong opcode to modify_srq Fix passing wrong opcode to ocrdma_modify_srq and query SRQ. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:46 -07:00

1 2 3 4 5 ...

3645 Commits