linux_old1

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	1ea406c0e0	Main batch of InfiniBand/RDMA changes for 3.13: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz VDvT9DEy6zjSMCLpMcdo =xxC+ -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...	2013-11-18 15:36:04 -08:00
Roland Dreier	b4fdf52b3f	Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next	2013-11-17 08:22:19 -08:00
Matan Barak	69ad5da41b	IB/core: Re-enable create_flow/destroy_flow uverbs This commit reverts commit `7afbddfae9` ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
Yann Droneaud	f21519b23c	IB/core: extended command: an improved infrastructure for uverbs commands Commit `400dbc9658` ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit `400dbc9658`, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. The proposed scheme will format input (command) and output (response) buffers this way: - command: legacy header + extended header + command data (core + hw): +----------------------------------------+ \| flags \| 00 00 \| command \| \| in_words \| out_words \| +----------------------------------------+ \| response \| \| response \| \| provider_in_words \| provider_out_words \| \| padding \| +----------------------------------------+ \| \| . <uverbs input> . . (in_words * 8) . \| \| +----------------------------------------+ \| \| . <provider input> . . (provider_in_words * 8) . \| \| +----------------------------------------+ - response, if present: +----------------------------------------+ \| \| . <uverbs output space> . . (out_words * 8) . \| \| +----------------------------------------+ \| \| . <provider output space> . . (provider_out_words * 8) . \| \| +----------------------------------------+ The overall design is to ensure that the extensible infrastructure is itself extensible while begin more reliable with more input and bound checking. Note: The unused field in the extended header would be perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But "comp_mask" field is likely to be present in the uverb input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
Yann Droneaud	2490f20be4	IB/core: Remove ib_uverbs_flow_spec structure from userspace The structure holding any types of flow_spec is of no use to userspace. It would be wrong for userspace to do: struct ib_uverbs_flow_spec flow_spec; flow_spec.type = IB_FLOW_SPEC_TCP; flow_spec.size = sizeof(flow_spec); Instead, userspace should use the dedicated flow_spec structure for - Ethernet : struct ib_uverbs_flow_spec_eth, - IPv4 : struct ib_uverbs_flow_spec_ipv4, - TCP/UDP : struct ib_uverbs_flow_spec_tcp_udp. In other words, struct ib_uverbs_flow_spec is a "virtual" data structure that can only be use by the kernel as an alias to the other. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:08 -08:00
Yann Droneaud	b68c956021	IB/core: Make uverbs flow structure use names like verbs ones This patch adds "flow" prefix to most of data structure added as part of commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") to keep those names in sync with the data structures added in commit `319a441d13` ("IB/core: Add receive flow steering support"). It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:08 -08:00
Yann Droneaud	d82693dad0	IB/core: Rename 'flow' structs to match other uverbs structs Commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") added public data structures to support receive flow steering. The new structs are not following the 'uverbs' pattern: they're lacking the common prefix 'ib_uverbs'. This patch replaces ib_kern prefix by ib_uverbs. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:08 -08:00
Matan Barak	f884827438	IB/core: clarify overflow/underflow checks on ib_create/destroy_flow This patch fixes the following issues: 1. Unneeded checks were removed 2. Removed the fixed size out of flow_attr.size, thus simplifying the checks. 3. Remove a 32bit hole on 64bit systems with strict alignment in struct ib_kern_flow_att by adding a reserved field. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:07 -08:00
Joe Perches	f3a5e3e37e	IB/ucma: Convert use of typedef ctl_table to struct ctl_table This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-16 10:32:24 -08:00
Zhao Hongjiang	ab626d1a68	IB/cm: Convert to using idr_alloc_cyclic() Commit `3e6628c4b3` ("idr: introduce idr_alloc_cyclic()") adds a new idr_alloc_cyclic() routine and converts several of these users to it. This is just a missed one - add it. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-16 10:30:42 -08:00
Linus Torvalds	9073e1a804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...	2013-11-15 16:47:22 -08:00
Eli Cohen	cf1c5e1f1c	IB/mlx5: Fix page shift in create CQ for userspace When creating a CQ, we must use mlx5 adapter page shift. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 14:36:36 -08:00
Eli Cohen	79d3da9c51	IB/mlx4: Fix device max capabilities check Move the check on max supported CQEs after the final number of entries is evaluated. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 14:36:36 -08:00
Eli Cohen	7e2e19210a	IB/mlx5: Remove dead code The value of the local variable index is never used in reg_mr_callback(). Signed-off-by: Eli Cohen <eli@mellanox.com> [ Remove now-unused variable delta too. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 14:36:14 -08:00
Eli Cohen	1c636f8016	IB/core: Encorce MR access rights rules on kernel consumers Enforce the rule that when requesting remote write or atomic permissions, local write must be indicated as well. See IB spec 11.2.8.2. Spotted by: Hagay Abramovsky <hagaya@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 10:25:32 -08:00
Eli Cohen	93b80ac297	IB/mlx4: Fix endless loop in resize CQ When calling get_sw_cqe() we need pass the consumer_index and not the masked value. Failure to do so will cause incorrect result of get_sw_cqe() possibly leading to endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 10:24:17 -08:00
Linus Torvalds	2f466d33f5	PCI changes for the v3.13 merge window: Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSgUzsAAoJEFmIoMA60/r8wmsQAJhwmtkUYR2L4T1g9smAyjJz bLm5zoC6WdywFcbTpTBfsTrS1CHIQG5akRgkEXGdr99epiho5F2lwmagWsUR4ijL 39Qn3knAUMgtNjoVXXI106h/DfTyxSmkZBfih2AQFyWobJq+0kg7hjQQA3+836b4 8ssWr1+NSl6JJTqYQ0Paw1kSqvvYoXsu5rWFEfCHk8D0s/1bvr5ldAUpk2jTg93I uo9/5+O264yt1YoKZOMqAMZLUfd5DaWY1mV3yeF0Uauy1pBmol5csE8ckqJPDrES PRdJT1+PhBeLYWcgXANOBZsW58ddxA0pQ5jQV6VJHQWsm5cE82OBpYJf6xUZ2moV o6DZ0KRnCPVA3NllYYR16H+wbMfADwwO83QoA+QTIZJy/WgpDH3Cst+m8KePGqbL uFgDdXSws9Bs1BCFs7bfYzAM3OdkBFnn+ac7JoPXKP5ibgAp9nDlurgK2r90zRnp j15vHMx0mV+e8B8/iwiW5eRtg7NoCHYiNfFy7JalOlsPmYr2KFazBVKclp13Hng7 fe/Jy6X4UhWoQPdqsy4ftvSQb0gm1MClxFJeZ3VAt6LY9j8OP6S/Vdf6lpAL85KR lAQoQzB+lOhTPdXxFY2xgGkITkqPDOQMjPfowYUYFwybqBuG6BHXZPJobL+niBlb Nh+M2WlUUA9Z3V6rWJB6 =CTPk -----END PGP SIGNATURE----- Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)" * tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits) PCI: Enable upstream bridges even for VFs on virtual buses PCI: Add pci_upstream_bridge() PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() PCI: Warn on driver probe return value greater than zero PCI: Drop warning about drivers that don't use pci_set_master() PCI: Workaround missing pci_set_master in pci drivers powerpc/pci: Use pci_is_pcie() to simplify code [fix] PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms PCI: imx6: Probe the PCIe in fs_initcall() PCI: Add R-Car Gen2 internal PCI support PCI: imx6: Remove redundant of_match_ptr PCI: Report pci_pme_active() kmalloc failure mn10300/PCI: Remove useless pcibios_last_bus frv/PCI: Remove pcibios_last_bus PCI: imx6: Increase link startup timeout PCI: exynos: Remove redundant of_match_ptr PCI: imx6: Fix imprecise abort handler PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0 PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe() x86/PCI: Coalesce multiple overlapping host bridge windows ...	2013-11-14 14:02:00 +09:00
Al Viro	441a9d0e1e	qib_fs: fix (some) dcache abuses * lookup_one_len() really wants i_mutex held on directory. * leaks galore - just mount ipathfs, then cd /sys/bus/pci/drivers/qib_ib; echo ::. >unbind on a box with that card present and try to umount ipathfs... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 08:08:19 -05:00
Michal Nazarewicz	352b905635	RDMA/cma: Remove unused argument and minor dead code The dev variable is never assigned after being initialised. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-11 10:46:54 -08:00
Sean Hefty	c6b21824c9	RDMA/ucma: Discard events for IDs not yet claimed by user space Problem reported by Avneesh Pant <avneesh.pant@oracle.com>: It looks like we are triggering a bug in RDMA CM/UCM interaction. The bug specifically hits when we have an incoming connection request and the connecting process dies BEFORE the passive end of the connection can process the request i.e. it does not call rdma_get_cm_event() to retrieve the initial connection event. We were able to triage this further and have some additional information now. In the example below when P1 dies after issuing a connect request as the CM id is being destroyed all outstanding connects (to P2) are sent a reject message. We see this reject message being received on the passive end and the appropriate CM ID created for the initial connection message being retrieved in cm_match_req(). The problem is in the ucma_event_handler() code when this reject message is delivered to it and the initial connect message itself HAS NOT been delivered to the client. In fact the client has not even called rdma_cm_get_event() at this stage so we haven't allocated a new ctx in ucma_get_event() and updated the new connection CM_ID to point to the new UCMA context. This results in the reject message not being dropped in ucma_event_handler() for the new connection request as the (if (!ctx->uid)) block is skipped since the ctx it refers to is the listen CM id context which does have a valid UID associated with it (I believe the new CMID for the connection initially uses the listen CMID -> context when it is created in cma_new_conn_id). Thus the assumption that new events for a connection can get dropped in ucma_event_handler() is incorrect IF the initial connect request has not been retrieved in the first case. We end up getting a CM Reject event on the listen CM ID and our upper layer code asserts (in fact this event does not even have the listen_id set as that only gets set up librdmacm for connect requests). The solution is to verify that the cm_id being reported in the event is the same as the cm_id referenced by the ucma context. A mismatch indicates that the ucma context corresponds to the listen. This fix was validated by using a modified version of librdmacm that was able to verify the problem and see that the reject message was indeed dropped after this patch was applied. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-11 10:45:06 -08:00
$Upinder Malhi $umalhi$$ Upinder Malhi $umalhi$	180771a370	IB/core: Add Cisco usNIC rdma node and transport types This patch adds new rdma node and new rdma transport, and supporting code used by Cisco's low latency driver called usNIC. usNIC uses its own transport, distinct from IB and iWARP. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:36:25 -08:00
Dave Jones	4127c365c9	RDMA/nes: Remove self-assignment from nes_query_qp() Assigning a value to itself is pointless. Spotted with coverity, no hardware to test. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:34:27 -08:00
Bart Van Assche	cd4e38542a	IB/srp: Report receive errors correctly The IB spec does not guarantee that the opcode is available in error completions. Hence do not rely on it. See also commit `948d1e889e` ("IB/srp: Introduce srp_handle_qp_err()"). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:18 -08:00
Bart Van Assche	99b6697a50	IB/srp: Avoid offlining operational SCSI devices If SCSI commands are submitted with a SCSI request timeout that is lower than the the IB RC timeout, it can happen that the SCSI error handler has already started device recovery before transport layer error handling starts. So it can happen that the SCSI error handler tries to abort a SCSI command after it has been reset by srp_rport_reconnect(). Tell the SCSI error handler that such commands have finished and that it is not necessary to continue its recovery strategy for commands that have been reset by srp_rport_reconnect(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:18 -08:00
Vu Pham	65d7dd2f34	IB/srp: Remove target from list before freeing Scsi_Host structure Remove an SRP target from the SRP target list before invoking the last scsi_host_put() call. This change is necessary because that last put frees the memory that holds the srp_target_port structure. This patch prevents the following kernel oops: RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570 Call Trace: [<ffffffff810b11e4>] lock_acquire+0xa4/0x120 [<ffffffff81531206>] _spin_lock+0x36/0x70 [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp] [<ffffffff8109125c>] worker_thread+0x21c/0x3d0 [<ffffffff81096e86>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 Signed-off-by: Vu Pham <vuhuong@mellanox.com> [ bvanassche - Modified path description and CC'ed stable. ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Jack Wang	71444b9781	IB/srp: Add change_queue_depth and change_queue_type support Currently, it's not possible to change queue depth for a device behind SRP host. Sometimes, we need to adjust queue_depth for performance reason (eg storage busy, we need lower queue_depth to avoid running into SCSI error handler), so this patch add support for SRP driver. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Bart Van Assche	4d73f95f70	IB/srp: Make queue size configurable Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Tested-by: Jack Wang <xjtuwjp@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Bart Van Assche	b81d00bddf	IB/srp: Introduce srp_alloc_req_data() This patch does not change any functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: Vu Pham <vu@mellanox.com> Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:17 -08:00
Bart Van Assche	848b3082db	IB/srp: Export sgid to sysfs On an initiator system with multiple IB ports it is not yet possible to figure out what the originating port of an SRP connection is. Hence make the source GID available in sysfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:16 -08:00
Bart Van Assche	a95cadb9da	IB/srp: Add periodic reconnect functionality After a transport layer occurred, periodically try to reconnect to the target until the dev_loss timer expires. Protect the callback functions that can be invoked from inside the SCSI EH against concurrent invocation with srp_reconnect_rport() via the rport mutex. Change the default dev_loss_tmo from 60s into 600s to give the reconnect mechanism a chance to kick in. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:16 -08:00
Bart Van Assche	8c64e4531c	scsi_transport_srp: Add periodic reconnect support Add support for periodically reconnecting to an SRP target until the dev_loss timer expires. After the tenth reconnection attempt, gradually slow down subsequent reconnect attempts. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:16 -08:00
Bart Van Assche	c1120f8981	IB/srp: Start timers if a transport layer error occurs Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Bart Van Assche	ed9b2264fb	IB/srp: Use SRP transport layer error recovery Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow to specify default values for these parameters. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Bart Van Assche	9dd69a600a	IB/srp: Keep rport as long as the IB transport layer Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback could get invoked after srp_remove_host() has finished. In other words, without this patch the queuecommand callback can get invoked after the rport data structure has been freed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Vu Pham	7bb312e4a2	IB/srp: Make transport layer retry count configurable Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count allows to reduce the path failover time. Signed-off-by: Vu Pham <vu@mellanox.com> [ bvanassche: Rewrote patch description / changed default retry count ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:15 -08:00
Mike Marciniszyn	2fadd83184	IB/qib: Fix txselect regression Commit 7fac33014f54("IB/qib: checkpatch fixes") was overzealous in removing a simple_strtoul for a parse routine, setup_txselect(). That routine is required to handle a multi-value string. Unwind that aspect of the fix. Cc: <stable@vger.kernel.org> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:12 -08:00
Mike Marciniszyn	78a5886472	IB/qib: Fix checkpatch __packed warnings Convert __attribute__ ((packed)) to __packed. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:12 -08:00
Jan Kara	603e772992	IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() qib_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function qib_user_sdma_queue_pkts() (and also qib_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:11 -08:00
Jan Kara	4adcf7fb67	IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast() ipath_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function ipath_user_sdma_queue_pkts() (and also ipath_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make ipath_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Merged in fix for call to get_user_pages_fast from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:11 -08:00
Naresh Gottumukkala	d5e3f37833	RDMA/ocrdma: Remove redundant check in ocrdma_build_fr() Remove the redundant check of comparing if a 32-bit value is greater than 0xffffffffULL. Reported by Dan Carpenter. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:06 -08:00
Naresh Gottumukkala	1852d1da3b	RDMA/ocrdma: Fix a crash in rmmod 1) ocrdma_remove_free() is called from a call_rcu callback funtion context, which can be a bottom-half context. So the code in ocrdma_remove_free should not sleep. But ocrdma_cleanup_hw() can sleep, So move it ocrdma_remove() instead of ocrdma_remove_free. 2) Fix a couple of kbuild test robot warnings. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:06 -08:00
Dan Carpenter	6ebacdfc07	RDMA/ocrdma: Silence an integer underflow warning We recently added a cap on "max_wqe_allocated" in `43a6b4025c` ('RDMA/ocrdma: Create IRD queue fix'). My static checker complains that the cap has a problem because it casts large values to negative. "attrs->cap.max_send_wr" is a u32. It comes from the user, but it's capped in ocrdma_check_qp_params() so it can't wrap here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:05 -08:00
Eli Cohen	1b77d2bd75	mlx5: Use enum to indicate adapter page size The Connect-IB adapter has an inherent page size which equals 4K. Define an new enum that equals the page shift and use it instead of using the value 12 throughout the code. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	c2a3431e61	IB/mlx5: Update opt param mask for RTS2RTS RTS to RTS transition should allow update of alternate path. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	07c9113fe8	IB/mlx5: Remove "Always false" comparison mlx5_cur and mlx5_new cannot have negative values so remove the redundant condition. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	2d036fad94	IB/mlx5: Remove dead code in mr.c In mlx5_mr_cache_init() the size variable is not used so remove it to avoid compiler warnings when running with make W=1. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:00 -08:00
Eli Cohen	bf0bf77f65	mlx5: Support communicating arbitrary host page size to firmware Connect-IB firmware requires 4K pages to be communicated with the driver. This patch breaks larger pages to 4K units to enable support for architectures utilizing larger page size, such as PowerPC. This patch also fixes several places that referred to PAGE_SHIFT instead of explicit 12 which is the inherent page shift on Connect-IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:00 -08:00
Moshe Lazer	cfd8f1d49b	IB/mlx5: Fix srq free in destroy qp On destroy QP the driver walks over the relevant CQ and removes CQEs reported for the destroyed QP. It also frees the related SRQ entry without checking that this is actually an SRQ-related CQE. In case of a CQ used for both send and receive QP, we could free SRQ entries for send CQEs. This patch resolves this issue by verifying that this is a SRQ related CQE by checking the SRQ number in the CQE is not zero. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	1faacf82df	IB/mlx5: Simplify mlx5_ib_destroy_srq Make use of destroy_srq_kernel() to clear SRQ resouces. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	9641b74ebe	IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR Make sure not to overflow when reading the page list from struct ib_fast_reg_page_list. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00

1 2 3 4 5 ...

3606 Commits