linux

Commit Graph

Author	SHA1	Message	Date
David Miller	e55684fadb	infiniband: nes: Convert nes_addr_resolve_neigh() over to dst_neigh_lookup(). Now we must provide the IP destination address, and a reference has to be dropped when we're done with the entry. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-25 21:30:37 -05:00
David Miller	64b7007eb9	infiniband: cxgb4: Convert import_ep() over to dst_neigh_lookup(). Now we must provide the IP destination address, and a reference has to be dropped when we're done with the entry. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-25 21:30:37 -05:00
David Miller	02b619555a	infiniband: Convert dst_fetch_ha() over to dst_neigh_lookup(). Now we must provide the IP destination address, and a reference has to be dropped when we're done with the entry. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-25 21:30:36 -05:00
Linus Torvalds	f59e842fc0	Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending * 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: ib_srpt: Initial SRP Target merge for v3.3-rc1	2012-01-18 16:29:42 -08:00
Rusty Russell	90ab5ee941	module_param: make bool parameters really bool (drivers & misc) module_param(bool) used to counter-intuitively take an int. In `fddd5201` (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-13 09:32:20 +10:30
Rusty Russell	69116f279a	module_param: avoid bool abuse, add bint for special cases. For historical reasons, we allow module_param(bool) to take an int (or an unsigned int). That's going away. A few drivers really want an int: they set it to -1 and a parameter will set it to 0 or 1. This sucks: reading them from sysfs will give 'Y' for both -1 and 1, but if we change it to an int, then the users might be broken (if they did "param" instead of "param=1"). Use a new 'bint' parser for them. (ntfs has a different problem: it needs an int for debug_msgs because it's also exposed via sysctl.) Cc: Steve Glendinning <steve.glendinning@smsc.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux390@de.ibm.com Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: lm-sensors@lm-sensors.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: alsa-devel@alsa-project.org Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part) Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-13 09:32:17 +10:30
Linus Torvalds	48fa57ac2c	infiniband changes for 3.3 merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJPBQQDAAoJEENa44ZhAt0hy1EP/A/dz741mEd2QZxYHgloK8XU sSoHvq+vDxHnOvBrDuaHXT47FoY+OSVE+ESeJJQJ9L+B6g3yacP3hNSIcguXFs8Z v011AZAeRvQx3bLu5R9+eDL+YTyotkR0sl/huoLkSwlqrEqGA85eLqf5RSQdxYZf iC1ZXfg0KTrtb6rBvohNcijpmIVEe83SWfnD/ZuCGuWq++DyVJxzECnR7p5D8a9q eMJEnIKVEIpqkqXrPQr/blVSfGQL54QuUdYtoKAS8ZW6BzjIwCGUmKdoT1vaNqX5 sIntxXMcgIgE2r0y/nDK+QIFS4U784eUevIC/LeunbhWUEQX05f3l6+V566/T9hX lvp5M6aonsSSvtqrVi6SF5rvSHFlwPpvAY3+jhjXKLpZ5OxMqf/ZlTN1xN4bin1A whGnznU+51Tjzph6Or8iXo5yExDUQhowX1Z3CYDmh/UqzKHRqaFAuiC071r8GZW3 BEOV9yf/+qPsgtXAiO4jSKlLrOJbMgEI4BoITXTO9HvZH9dHGXDYLvULdHDmFaBi XLg5zcAjou24855miv/gnBQzDc0NWW184BGS9hPE9zmbQlJr7gA4zI0Eggtj+3MO 7z/SLTrxKSfjZJR8Z3cGsnBjCs1VFqV+YQnTkyZYLORLf4F3RbDLe6aJQ+9WBA1g 86J11MjrG30erg3gbXun =8ZmW -----END PGP SIGNATURE----- Merge tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband infiniband changes for 3.3 merge window * tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: rdma/core: Fix sparse warnings RDMA/cma: Fix endianness bugs RDMA/nes: Fix terminate during AE RDMA/nes: Make unnecessarily global nes_set_pau() static RDMA/nes: Change MDIO bus clock to 2.5MHz IB/cm: Fix layout of APR message IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE IB/qib: Default some module parameters optimally IB/qib: Optimize locking for get_txreq() IB/qib: Fix a possible data corruption when receiving packets IB/qib: Eliminate 64-bit jiffies use IB/qib: Fix style issues IB/uverbs: Protect QP multicast list	2012-01-08 14:05:48 -08:00
Linus Torvalds	972b2c7199	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc//mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount ...	2012-01-08 12:19:57 -08:00
Roland Dreier	1583676d9e	Merge branches 'cma', 'misc', 'mlx4', 'nes', 'qib' and 'uverbs' into for-next	2012-01-04 09:18:20 -08:00
Sean Hefty	c89d1bedf8	rdma/core: Fix sparse warnings Clean up sparse warnings in the rdma core layer. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-04 09:17:45 -08:00
Sean Hefty	46ea5061c7	RDMA/cma: Fix endianness bugs Fix endianness bugs reported by sparse in the RDMA core stack. Note that these are real bugs, but don't affect any existing code to the best of my knowledge. The mlid issue would only affect kernel users of rdma_join_multicast which have the rdma_cm attach/detach its QP. There are no current in tree users that do this. (rdma_join_multicast may be used called by user space applications, which does not have this issue.) And the pkey setting is simply returned as informational. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-04 09:13:52 -08:00
Tatyana Nikolova	196f40c846	RDMA/nes: Fix terminate during AE Fix for reset which happens right after sending a terminate message. Terminate timer is not deleted when the connection is closed. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-04 09:12:39 -08:00
Tatyana Nikolova	b0fda90f2a	RDMA/nes: Make unnecessarily global nes_set_pau() static Warned about by sparse. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-04 09:07:24 -08:00
Tatyana Nikolova	30b7e117af	RDMA/nes: Change MDIO bus clock to 2.5MHz Change the PHY clock divisor to make the MDIO clock 2.5MHz, instead of 3.5MHz (which is out of spec). Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-04 09:02:15 -08:00
Eli Cohen	6f233d300d	IB/cm: Fix layout of APR message Add a missing 16-bit reserved field between ap_status and info fields. Signed-off-by: Eli Cohen <eli@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 21:04:18 -08:00
Or Gerlitz	9106c41069	IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE For IBoE, SLs 0-7 are mapped to Ethernet 802.1Q user priority bits (pbits) which are part of the VLAN tag, SLs 8-15 are reserved. Under Ethernet, the ConnectX firmware treats (decode/encode) the four bit SL field in various constructs such as QPC / UD WQE / CQE as PPP0 and not as 0PPP. This correlates well to the fact that within the vlan tag the pbits are located in bits 15-13 and not 12-14. The current code wasn't consistent around that area - the encoding was correct for the IBoE QPC.path.schedule_queue field, but was wrong for IBoE CQEs and when MLX header was built. These inconsistencies resulted in wrong SL <--> wire 802.1Q pbits mapping, which is fixed by using SL <--> PPP0 all around the place. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 21:00:02 -08:00
Mike Marciniszyn	8d4548f2b7	IB/qib: Default some module parameters optimally Minimize the need for users to have to set module parameters to get good performance. The following two parameters are changed: - rcvhdrcnt to twice the rcvegrcnt - pcie_caps=0x51 The rcvhdrcnt at twice the egrcount allows the preemptive NAK code during reception to function in 100% of the cases rather than a sender jiffies-based timeout. The pcie_caps default of 0x51 will set the proposed MaxPayload and MaxReceiveReqest to 256 and 4096 respectively. The capabilities on the root complex will be used to limit those values. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 20:54:01 -08:00
Mike Marciniszyn	4894710951	IB/qib: Optimize locking for get_txreq() The current code locks the QP s_lock, followed by the pending_lock, I guess to to protect against the allocate failing. This patch only locks the pending_lock, assuming that the empty case is an exeception, in which case the pending_lock is dropped, and the original code is executed. This will save a lock of s_lock in the normal case. The observation is that the sdma descriptors will deplete at twice the rate of txreq's, so this should be rare. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 20:53:31 -08:00
Ram Vepa	eddfb67525	IB/qib: Fix a possible data corruption when receiving packets Prevent a receive data corruption by ensuring that the write to update the rcvhdrheadn register to generate an interrupt is at the very end of the receive processing. Signed-off-by: Ramkrishna Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 20:53:02 -08:00
Mike Marciniszyn	8482d5d1bc	IB/qib: Eliminate 64-bit jiffies use The qib driver makes use of the the 64-bit jiffies API. Code inspection reveals that that version of the API is not really required. This patch converts to use the "normal" jiffies. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 20:52:12 -08:00
Mike Marciniszyn	865b64be86	IB/qib: Fix style issues More style issues revealed with checkpatch.pl -f. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 20:51:42 -08:00
Eli Cohen	e214a0fe2b	IB/uverbs: Protect QP multicast list Userspace verbs multicast attach/detach operations on a QP are done while holding the rwsem of the QP for reading. That's not sufficient since a reader lock allows more than one reader to acquire the lock. However, multicast attach/detach does list manipulation that can corrupt the list if multiple threads run in parallel. Fix this by acquiring the rwsem as a writer to serialize attach/detach operations. Add idr_write_qp() and put_qp_write() to encapsulate this. This fixes oops seen when running applications that perform multicast joins/leaves. Reported by: Mike Dubman <miked@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-01-03 20:36:48 -08:00
Al Viro	f9ec80061a	infiniband: umode_t noise, including open-coded S_ISDIR() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:55:03 -05:00
Al Viro	587a1f1659	switch ->is_visible() to returning umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:55 -05:00
Al Viro	2c9ede55ec	switch device_get_devnode() and ->devnode() to umode_t * both callers of device_get_devnode() are only interested in lower 16bits and nobody tries to return anything wider than 16bit anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:55 -05:00
David S. Miller	abb434cb05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-23 17:13:56 -05:00
Roland Dreier	480390c8f3	Merge branches 'cma', 'mlx4' and 'qib' into for-next	2011-12-19 09:19:49 -08:00
Mike Marciniszyn	29d1b16145	IB/qib: Correct sense on freectxts increment and decrement Commit `53ab1c6498` ("IB/qib: Correct nfreectxts for multiple HCAs") reversed the increments and decrements of dd->nfreectxts. Fix it. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-12-19 09:19:34 -08:00
Sean Hefty	04ded16724	RDMA/cma: Verify private data length private_data_len is defined as a u8. If the user specifies a large private_data size (> 220 bytes), we will calculate a total length that exceeds 255, resulting in private_data_len wrapping back to 0. This can lead to overwriting random kernel memory. Avoid this by verifying that the resulting size fits into a u8. Reported-by: B. Thery <benjamin.thery@bull.net> Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-12-19 09:15:33 -08:00
Bart Van Assche	a42d985bd5	ib_srpt: Initial SRP Target merge for v3.3-rc1 This patch adds the kernel module ib_srpt SCSI RDMA Protocol (SRP) target implementation conforming to the SRP r16a specification for the mainline drivers/target infrastructure. This driver was originally developed by Vu Pham and has been optimized by Bart Van Assche and merged into upstream LIO based on his srpt-lio-4.1 branch here: https://github.com/bvanassche/srpt-lio/commits/srpt-lio-4.1/ This updated patch also contains the following two changes from lio-core-2.6.git/master. One is to fix a bug with 1 >= task->task_sg[] chained mappings in ib_srpt, and the other to convert the configfs control plane to reference IB Port GUID and struct srpt_port directly following mainline v4.x target_core_fabric_configfs.c convertion for ib_srpt to work with rtslib/rtsadmin v2 code. These seperate patches can be found here: ib_srpt: Fix bug with chainged SGLs in srpt_map_sg_to_ib_sge http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=ea485147563b6555a97dbf811825fbb586519252 ib_srpt: Convert se_wwn endpoint reference to struct srpt_port->port_wwn http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=4e544a210acb227df1bb4ca5086e65bdf4e648ea This also includes the following recent v1 -> v2 review changes: ib_srpt: Fix potential out-of-bounds array access ib_srpt: Avoid failed multipart RDMA transfers ib_srpt: Fix srpt_alloc_fabric_acl failure case return value ib_srpt: Update comments to reference $driver/$port layout ib_srpt: Fix sport->port_guid formatting code ib_srpt: Remove legacy use_port_guid_in_session_name module parameter ib_srpt: Convert srp_max_rdma_size into per port configfs attribute ib_srpt: Convert srp_max_rsp_size into per port configfs attribute ib_srpt: Convert srpt_sq_size into per port configfs attribute and v2 -> v3 review changes: ib_srpt: Fix possible race with srp_sq_size in srpt_create_ch_ib ib_srpt: Fix possible race with srp_max_rsp_size in srpt_release_channel_work ib_srpt: Fix up MAX_SRPT_RDMA_SIZE define ib_srpt: Make srpt_map_sg_to_ib_sge() failure case return -EAGAIN ib_srpt: Convert port_guid to use subnet_prefix + interface_id formatting ib_srpt: Make srpt_check_stop_free return kref_put status ib_srpt: Make compilation with BUG=n proceed` ib_srpt: Use new target_core_fabric.h include ib_srpt: Check hex2bin() return code to silence build warning Cc: Bart Van Assche <bvanassche@acm.org> Cc: Roland Dreier <roland@purestorage.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Vu Pham <vu@mellanox.com> Cc: David Dillow <dillowda@ornl.gov> Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>	2011-12-16 06:33:56 +00:00
Jack Morgenstein	8e59d254fe	mlx4_ib: disable SRIOV mode for IB ports (not yet supported) Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Jack Morgenstein	f9baff509f	mlx4_core: Add "native" argument to mlx4_cmd and its callers (where needed) For SRIOV, some Hypervisor commands can be executed directly (native = 1). Others should go through the command wrapper flow (for tracking resource usage, for example, or for changing some HCA configurations that slaves need to be notified of). This patch sets the groundwork for this capability -- adding the correct value of "native" in each case. Note that if SRIOV is not activated, this parameter has no effect. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Jack Morgenstein	65dab25deb	mlx4: Extanding port_mask functionality Port mask now has additional state. Port can be set as "none". In this case neither the mlx4_en or mlx4_ib drivers take ownership of the port. In multifunction mode there is an option to set the vfs as single ported devices. (in single function mode, both physical ports belong to same function) Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Roland Dreier	4af3ce0de0	IB/mlx4: Fix shutdown crash accessing a non-existent bitmap Commit `cfcde11c3d` ("IB/mlx4: Use flow counters on IBoE ports") added code that sets elements of counters[] to -1 if no counter is allocated, but then goes ahead and passes every entry to mlx4_counter_free() on shutdown. This is a bad idea, especially if MLX4_DEV_CAP_FLAG_COUNTERS isn't set so there isn't even an underlying bitmap to free from. Tested-by: Sean Hefty <sean.hefty@intel.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-12-06 10:47:37 -08:00
David Miller	17e6abeec4	infiniband: ipoib: Sanitize neighbour handling in ipoib_main.c Reduce the number of dst_get_neighbour_noref() calls within a single call chain. Primarily by passing the neighbour pointer down to the helper functions. Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit() by incrementing the dropped counter and freeing the packet. We don't want it to fall through into the ARP/RARP/multicast handling, since that should only happen when skb_dst() is NULL. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>	2011-12-05 15:20:20 -05:00
David Miller	3786cf189f	infiniband: cxgb4: Consolidate 3 copies of the same operation into 1 helper function. Three pieces of code do the same thing, create a l2t entry and then import this information into the c4iw_ep object. Create a helper function and call it from these 3 locations instead. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>	2011-12-05 15:20:20 -05:00
David Miller	40e2bb588f	infiniband: nes: Use dst's neighbour entry. Do this instead of performing a by-hand lookup. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>	2011-12-05 15:20:19 -05:00
David Miller	a4757123ae	cxgb3: Rework t3_l2t_get to take a dst_entry instead of a neighbour. This way we consolidate the RCU locking down into the place where it actually matters, and also we can make the code handle dst_get_neighbour_noref() returning NULL properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-05 15:20:19 -05:00
David Miller	51d4597451	infiniband: addr: Consolidate code to fetch neighbour hardware address from dst. IPV4 should do exactly what the IPV6 code does here, which is use the neighbour obtained via the dst entry. And now that the two code paths do the same thing, use a common helper function to perform the operation. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Roland Dreier <roland@purestorage.com>	2011-12-05 15:20:19 -05:00
David Miller	2721745501	net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>	2011-12-05 15:20:19 -05:00
David S. Miller	b3613118eb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2011-12-02 13:49:21 -05:00
David Miller	596b9b68ef	neigh: Add infrastructure for allocating device neigh privates. netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-30 18:46:43 -05:00
Roland Dreier	a493f1a24a	Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next	2011-11-29 18:01:53 -08:00
Eric Dumazet	580da35a31	IB: Fix RCU lockdep splats Commit `f2c31e32b3` ("net: fix NULL dereferences in check_peer_redir()") forgot to take care of infiniband uses of dst neighbours. Many thanks to Marc Aurele who provided a nice bug report and feedback. Reported-by: Marc Aurele La France <tsi@ualberta.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-29 13:37:11 -08:00
Mike Marciniszyn	3874397c0b	IB/ipoib: Prevent hung task or softlockup processing multicast response This following can occur with ipoib when processing a multicast reponse: BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982] Modules linked in: ... CPU 0: Modules linked in: ... Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5 RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20 RSP: 0018:ffff8802119ed860 EFLAGS: 00000246 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299 RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000 R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860 R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib] [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib] [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib] [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0 [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib] [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib] [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa] [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad] [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core] [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa] [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa] [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa] [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad] [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad] [<ffffffff81088830>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Coinciding with stack trace is the following message: ib0: ib_address_create failed The code below in ipoib_mcast_join_finish() will note the above failure in the address handle but otherwise continue: ah = ipoib_create_ah(dev, priv->pd, &av); if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { The while loop at the bottom of ipoib_mcast_join_finish() will attempt to send queued multicast packets in mcast->pkt_queue and eventually end up in ipoib_mcast_send(): if (!mcast->ah) { if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) skb_queue_tail(&mcast->pkt_queue, skb); else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } My read is that the code will requeue the packet and return to the ipoib_mcast_join_finish() while loop and the stage is set for the "hung" task diagnostic as the while loop never sees a non-NULL ah, and will do nothing to resolve. There are GFP_ATOMIC allocates in the provider routines, so this is possible and should be dealt with. The test that induced the failure is associated with a host SM on the same server during a shutdown. This patch causes ipoib_mcast_join_finish() to exit with an error which will flush the queued mcast packets. Nothing is done to unwind the QP attached state so that subsequent sends from above will retry the join. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Reviewed-by: Gary Leshner <gary.leshner@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-29 13:20:02 -08:00
Mike Marciniszyn	8ee887d74b	IB/qib: Fix over-scheduling of QSFP work Don't over-schedule QSFP work on driver initialization. It could end up being run simultaneously on two different CPUs resulting in bad EEPROM reads. In combination with setting the physical IB link state prior to the IBC being brought out of reset, this can cause the link state machine to start training early with wrong settings. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-28 12:17:33 -08:00
Kumar Sanghvi	01b225e18f	RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2 Fix logic so that we don't retry with MPAv1 once we have done that already. Otherwise, we end up retrying with MPAv1 even when its not needed on getting peer aborts - and this could lead to kernel panic. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-28 11:58:07 -08:00
Jonathan Lallinger	c34c97ad8c	RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic Fix another place in the code where logic dealing with the t4_cqe was using the wrong QID. This fixes the counting logic so that it tests against the SQ QID instead of the RQ QID when counting RCQES. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-28 11:53:05 -08:00
Alexey Dobriyan	4e3fd7a06d	net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
David S. Miller	9ca36f7db2	infiniband: Update net drivers for netdev_features_t changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 18:05:50 -05:00
Mike Marciniszyn	042f36e156	IB/qib: Don't use schedule_work() It was mistakenly introduced by `dde05cbdf8` ("IB/qib: Hold links until tuning data is available"). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-08 10:37:53 -08:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Roland Dreier	b8108d6886	Merge branches 'iser', 'mthca' and 'qib' into for-next	2011-11-04 09:36:04 -07:00
Mike Marciniszyn	30ab7e230b	IB/qib: Fix panic in RC error flushing logic The following panic can occur when flushing a QP: RIP: 0010:[<ffffffffa0168e8b>] [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib] RSP: 0018:ffff8803cdc6fc90 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000 RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000 RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0 R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100) Stack: 53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a Call Trace: [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib] [<ffffffffa0165a10>] ? qib_make_rc_req+0x0/0xe80 [ib_qib] [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib] [<ffffffff814db757>] ? thread_return+0x4e/0x777 [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib] [<ffffffff81088bf0>] worker_thread+0x170/0x2a0 [<ffffffff8108e530>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0 [<ffffffff8108e1c6>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff8108e130>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 RIP [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib] The RC error state flush logic in qib_make_rc_req() could return all of the acked wqes and potentially have emptied the queue. It would then unconditionally try return a flush completion via qib_send_complete() for an invalid wqe, or worse a valid one that is not queued. The panic results when the completion code tries to maintain an MR reference count for a NULL MR. This fix modifies logic to only send one completion per qib_make_rc_req() call and changing the completion status from IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress. The outer loop will call as many times as necessary to flush the queue. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-04 09:35:44 -07:00
Or Gerlitz	52439540ea	IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers The current driver never does DMA unmapping on these buffers. Fix that by adding DMA unmapping to the task cleanup callback, and DMA mapping to the task init function (drop the headers_initialized micro-optimization). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-04 09:32:44 -07:00
Or Gerlitz	2c4ce60934	IB/iser: Use separate buffers for the login request/response The driver counted on the transactional nature of iSCSI login/text flows and used the same buffer for both the request and the response. We also went further and did DMA mapping only once, with DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by using different buffers, one for requests and one for responses, and use the correct DMA mapping direction for each. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-04 09:30:52 -07:00
Roland Dreier	e4221314a5	IB/mthca: Fix buddy->num_free allocation size The num_free field of mthca_buddy has a type of array of unsigned int while it was allocated as an array of pointers. On 64-bit platforms this allocates twice more than required. Fix this by allocating the correct size for the type. This is the same bug just fixed in mlx4 by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-03 17:48:25 -07:00
Linus Torvalds	f470f8d4e7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits) mlx4_core: Deprecate log_num_vlan module param IB/mlx4: Don't set VLAN in IBoE WQEs' control segment IB/mlx4: Enable 4K mtu for IBoE RDMA/cxgb4: Mark QP in error before disabling the queue in firmware RDMA/cxgb4: Serialize calls to CQ's comp_handler RDMA/cxgb3: Serialize calls to CQ's comp_handler IB/qib: Fix issue with link states and QSFP cables IB/mlx4: Configure extended active speeds mlx4_core: Add extended port capabilities support IB/qib: Hold links until tuning data is available IB/qib: Clean up checkpatch issue IB/qib: Remove s_lock around header validation IB/qib: Precompute timeout jiffies to optimize latency IB/qib: Use RCU for qpn lookup IB/qib: Eliminate divide/mod in converting idx to egr buf pointer IB/qib: Decode path MTU optimization IB/qib: Optimize RC/UC code by IB operation IPoIB: Use the right function to do DMA unmap pages RDMA/cxgb4: Use correct QID in insert_recv_cqe() RDMA/cxgb4: Make sure flush CQ entries are collected on connection close ...	2011-11-01 10:51:38 -07:00
Roland Dreier	504255f8d0	Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next	2011-11-01 09:37:08 -07:00
Christoph Lameter	bc3e53f682	mm: distinguish between mlocked and pinned pages Some kernel components pin user space memory (infiniband and perf) (by increasing the page count) and account that memory as "mlocked". The difference between mlocking and pinning is: A. mlocked pages are marked with PG_mlocked and are exempt from swapping. Page migration may move them around though. They are kept on a special LRU list. B. Pinned pages cannot be moved because something needs to directly access physical memory. They may not be on any LRU list. I recently saw an mlockalled process where mm->locked_vm became bigger than the virtual size of the process (!) because some memory was accounted for twice: Once when the page was mlocked and once when the Infiniband layer increased the refcount because it needt to pin the RDMA memory. This patch introduces a separate counter for pinned pages and accounts them seperately. Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Mike Marciniszyn <infinipath@qlogic.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-31 17:30:46 -07:00
Paul Gortmaker	fec14d2fce	infiniband: add moduleparam.h to drivers/infiniband as required These files were getting the moduleparam infrastructure from the implicit presence of module.h being everywhere, but that is going away soon. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:36 -04:00
Paul Gortmaker	b108d9764c	infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE These were getting it implicitly via device.h --> module.h but we are going to stop that when we clean up the headers. Fix these in advance so the tree remains biscect-clean. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:35 -04:00
Paul Gortmaker	e4dd23d753	infiniband: Fix up module files that need to include module.h They had been getting it implicitly via device.h but we can't rely on that for the future, due to a pending cleanup so fix it now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:35 -04:00
Paul Gortmaker	fc87af74af	infiniband: Fix up users implicitly relying on getting stat.h They get it via module.h (via device.h) but we want to clean that up. When we do, we'll get things like: CC [M] drivers/infiniband/core/sysfs.o sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function) sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function) so add in the stat header it is using explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:34 -04:00
Or Gerlitz	80a2dcd8d0	IB/mlx4: Don't set VLAN in IBoE WQEs' control segment There's no need to set the vlan-related fields in an IBoE send WQE control segment: - the vlan to be used by a UD QP is set in the datagram segment. - for GSI (CM) QP, all the headers down to 8021q and MAC are built by the software anyway. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:57:51 -07:00
Or Gerlitz	bcacb89756	IB/mlx4: Enable 4K mtu for IBoE The IBoE port MTU is derived from the corresponding Ethernet netdevice MTU, which can support jumbo frames of 9K, and hence surely supports the max IB mtu of 4K. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:55:15 -07:00
Tom Tucker	d32ae393db	RDMA/cxgb4: Mark QP in error before disabling the queue in firmware QPs need to be moved to error before telling the firwmare to shutdown the queue. Otherwise, the application can submit WRs that will never get fetched by the hardware and never flushed by the driver. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swsie@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:36:08 -07:00
Kumar Sanghvi	581bbe2cd0	RDMA/cxgb4: Serialize calls to CQ's comp_handler Commit `01e7da6ba5` ("RDMA/cxgb4: Make sure flush CQ entries are collected on connection close") introduced a potential problem where a CQ's comp_handler can get called simultaneously from different places in the iw_cxgb4 driver. This does not comply with Documentation/infiniband/core_locking.txt, which states that at a given point of time, there should be only one callback per CQ should be active. This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>. Based on discussion between Parav Pandit and Steve Wise, this patch fixes the above problem by serializing the calls to a CQ's comp_handler using a spin_lock. Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com> Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:34:53 -07:00
Kumar Sanghvi	f7cc25d018	RDMA/cxgb3: Serialize calls to CQ's comp_handler iw_cxgb3 has a potential problem where a CQ's comp_handler can get called simultaneously from different places in iw_cxgb3 driver. This does not comply with Documentation/infiniband/core_locking.txt, which states that at a given point of time, there should be only one callback per CQ should be active. Such problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com> for iw_cxgb4 driver. Based on discussion between Parav Pandit and Steve Wise, this patch fixes the above problem by serializing the calls to a CQ's comp_handler using a spin_lock. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:33:17 -07:00
Mitko Haralanov	16d99812d5	IB/qib: Fix issue with link states and QSFP cables Fix an issue where the link would come up after replugging a cable even if it has been DISABLED manually. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 10:57:59 -07:00
Linus Torvalds	ec7ae51753	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits) [SCSI] qla4xxx: export address/port of connection (fix udev disk names) [SCSI] ipr: Fix BUG on adapter dump timeout [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer [SCSI] hpsa: change confusing message to be more clear [SCSI] iscsi class: fix vlan configuration [SCSI] qla4xxx: fix data alignment and use nl helpers [SCSI] iscsi class: fix link local mispelling [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA [SCSI] aacraid: use lower snprintf() limit [SCSI] lpfc 8.3.27: Change driver version to 8.3.27 [SCSI] lpfc 8.3.27: T10 additions for SLI4 [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes [SCSI] megaraid_sas: Changelog and version update [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts ...	2011-10-28 16:44:18 -07:00
Marcel Apfelbaum	a5e12dff75	IB/mlx4: Configure extended active speeds Set the extended active speeds based on the hardware configuration. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Hal Rosenstock <hal@mellanox.com> [ Move FDR-10 handling into ib_link_query_port(). - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-28 11:36:16 -07:00
Mitko Haralanov	dde05cbdf8	IB/qib: Hold links until tuning data is available Hold the link state machine until the tuning data is read from the QSFP EEPROM so correct tuning settings are applied before the state machine attempts to bring the link up. Link is also held on cable unplug in case a different cable is used. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 15:08:20 -07:00
Mike Marciniszyn	44d75d3d92	IB/qib: Clean up checkpatch issue This was probably present from initial submission. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 15:08:18 -07:00
Mike Marciniszyn	9fd5473deb	IB/qib: Remove s_lock around header validation Review of qib_ruc_check_hdr() shows that the s_lock is not required in the normal case. The r_lock is held in all cases, and protects the qp fields that are read. The s_lock will be needed to around the call to qib_migrate_qp() to insure that the send engine sees a consistent set of fields. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:57 -07:00
Mike Marciniszyn	d0f2faf72d	IB/qib: Precompute timeout jiffies to optimize latency A new field is added to qib_qp called timeout_jiffies. It is initialized upon create and modify. The field is now used instead of a computation based on qp->timeout. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:56 -07:00
Mike Marciniszyn	af061a644a	IB/qib: Use RCU for qpn lookup The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU. The hash list itself is now accessed via jhash functions instead of mod. The changes should benefit multiple receive contexts in different processors by not contending for the lock just to read the hash structures. The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in the context. The interrupt handler will test the current packet's qpn against lookaside_qpn if the lookaside_qp pointer is non-NULL. The pointer is NULL'ed when the interrupt handler exits. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:54 -07:00
Mike Marciniszyn	9e1c0e4325	IB/qib: Eliminate divide/mod in converting idx to egr buf pointer The context init now saves a shift from rcvegrbufs_perchunk rcvegrbufs_perchunk_shift using ilog2. A BUG_ON() protects the power of 2 assumption. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:52 -07:00
Mike Marciniszyn	cc6ea1385b	IB/qib: Decode path MTU optimization Store both the encoded and decoded MTU in the QP structure as a minor optimization for UC/RC receive routines. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:50 -07:00
Mike Marciniszyn	2fc109c890	IB/qib: Optimize RC/UC code by IB operation The memset for zeroing work completions had been unconditional. This patch removes the memset and moves the zeroing into the work completion with a more explicit field by field set. With this patch, non-ONLY/non-LAST packets will avoid the overhead since they will not generate a completion. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:49 -07:00
Eric Dumazet	9e903e0852	net: add skb frag size accessors To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set\|add\|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 03:10:46 -04:00
Dotan Barak	787adb9d6a	IPoIB: Use the right function to do DMA unmap pages Pages that were mapped using ib_dma_map_page() should be unmapped using ib_dma_unmap_page(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-18 10:08:31 -07:00
Jonathan Lallinger	e14d62c05c	RDMA/cxgb4: Use correct QID in insert_recv_cqe() When creating flushed receive CQEs, set the QPID field in the t4_cqe to the SQ QID and not the RQ QID. Otherwise the poll code will not find the correct QP context. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-14 14:23:40 -07:00
Kumar Sanghvi	01e7da6ba5	RDMA/cxgb4: Make sure flush CQ entries are collected on connection close At the time when a peer closes the connection, iw_cxgb4 will not send a cq event if ibqp.uobject exists. In that case, its possible for a user application to get blocked in ibv_get_cq_event(). To resolve this, call the cq's comp_handler to unblock any read from ibv_get_cq_event(). This will trigger userspace to poll the cq and collect flush status completions for any pending work requests. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-14 14:23:04 -07:00
Sean Hefty	42849b2697	RDMA/uverbs: Export ib_open_qp() capability to user space Allow processes that share the same XRC domain to open an existing shareable QP. This permits those processes to receive events on the shared QP and transfer ownership, so that any process may modify the QP. The latter allows the creating process to exit, while a remaining process can still transition it for path migration purposes. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:50:56 -07:00
Sean Hefty	0e0ec7e063	RDMA/core: Export ib_open_qp() to share XRC TGT QPs XRC TGT QPs are shared resources among multiple processes. Since the creating process may exit, allow other processes which share the same XRC domain to open an existing QP. This allows us to transfer ownership of an XRC TGT QP to another process. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:49:51 -07:00
Sean Hefty	0a1405da99	IB/mlx4: Add support for XRC QPs Support the creation of XRC INI and TGT QPs. To handle the case where a CQ or PD is not provided, we allocate them internally with the xrcd. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:44:18 -07:00
Sean Hefty	18abd5ea57	IB/mlx4: Add support for XRC SRQs Allow the user to create XRC SRQs. This patch is based on a patch from Jack Morgenstrein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:43:46 -07:00
Sean Hefty	012a8ff577	IB/mlx4: Add support for XRC domains Support creating and destroying XRC domains. Any sharing of the XRCD is managed above the low-level driver. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:43:03 -07:00
Sean Hefty	2622e18ef4	IB/cm: Do not automatically disconnect XRC TGT QPs Because an XRC TGT QP can end up being shared among multiple processes, don't have the ib_cm automatically send a DREQ when the userspace process that owns the ib_cm_id exits. Disconnect can be initiated by the user directly; otherwise, the owner of the XRC INI QP controls the connection. Note that as a result of the process exiting, the ib_cm will stop tracking the XRC connection on the target side. For the purposes of disconnecting, this isn't a big deal. The ib_cm will respond to the DREQ appropriately. For other messages, mainly LAP, the CM will reject the request, since there's no one available to route the request to. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:42:06 -07:00
Sean Hefty	18c441a6c3	RDMA/cma: Support XRC QPs Allow users to connect XRC QPs through the rdma_cm. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:41:36 -07:00
Sean Hefty	638ef7a6c6	RDMA/ucm: Allow user to specify QP type when creating id Allow the user to indicate the QP type separately from the port space when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a 1:1 relationship between the QP type and port space, so we need to switch on the QP type to select between UD and connected QPs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:40:36 -07:00
Sean Hefty	2d2e941529	RDMA/cm: Define new RDMA port space specific to IB Add RDMA_PS_IB. XRC QP types will use the IB port space when operating over the RDMA CM. For the 'IP protocol' field value, we select 0x3F, which is listed as being for 'any local network'. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:39:52 -07:00
Sean Hefty	ef70044647	IB/cm: Update XRC support based on XRC annex errata The XRC annex was updated to have XRC behave more like RD. Specifically, the XRC TGT QPN moves from the local QPN to local EECN field. Lookup of SRQN is done using the REQ/REP protocol. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:38:35 -07:00
Sean Hefty	d26a360b77	IB/cm: Update protocol to support XRC Update the REQ and REP messages to support XRC connection setup according to the XRC Annex. Several existing fields must be set to 0 or 1 when connecting XRC QPs, and a reserved field is changed to an extended transport type. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:37:55 -07:00
Sean Hefty	b93f3c1872	RDMA/uverbs: Export XRC TGT QPs to user space Allow user space to operate on XRC TGT QPs the same way as other types of QPs, with one notable exception: since XRC TGT QPs may be shared among multiple processes, the XRC TGT QP is allowed to exist beyond the lifetime of the creating process. The process that creates the QP is allowed to destroy it, but if the process exits without destroying the QP, then the QP will be left bound to the lifetime of the XRCD. TGT QPs are not associated with CQs or a PD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:37:07 -07:00
Sean Hefty	9977f4f64b	RDMA/uverbs: Export XRC INI QPs to userspace XRC INI QPs are similar to send only RC QPs. Allow user space to create INI QPs. Note that INI QPs do not require receive CQs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:32:27 -07:00
Sean Hefty	8541f8de05	RDMA/uverbs: Export XRC SRQs to user space We require additional information to create XRC SRQs than we can exchange using the existing create SRQ ABI. Provide an enhanced create ABI for extended SRQ types. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> and Roland Dreier <roland@purestorage.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:29:18 -07:00
Sean Hefty	53d0bd1e7f	RDMA/uverbs: Export XRC domains to user space Allow user space to create XRC domains. Because XRCDs are expected to be shared among multiple processes, we use inodes to identify an XRCD. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:21:24 -07:00
Sean Hefty	d3d72d909e	RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD XRC TGT QPs are intended to be shared among multiple users and processes. Allow the destruction of an XRC TGT QP to be done explicitly through ib_destroy_qp() or when the XRCD is destroyed. To support destroying an XRC TGT QP, we need to track TGT QPs with the XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also cleaned up. To avoid stale reference issues, if a user is holding a reference on a TGT QP, we increment a reference count on the QP. The user releases the reference by calling ib_release_qp. This releases any access to the QP from a user above verbs, but allows the QP to continue to exist until destroyed by the XRCD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:20:27 -07:00
Sean Hefty	b42b63cf0d	RDMA/core: Add XRC QPs XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC communication is between an initiator (INI) QP and a target (TGT) QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC QPs, except that work requests posted to an XRC INI QP must specify the remote SRQ that is the target of the work request. We define two new QP types for XRC, to distinguish between INI and TGT QPs, and update the core layer to support XRC QPs. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:16:19 -07:00
Sean Hefty	418d51307d	RDMA/core: Add XRC SRQ type XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC defines SRQs that are specifically used by XRC connections. Expand the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to only XRC use according to the IB XRC Annex. Portions of this patch were derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:14:31 -07:00
Sean Hefty	96104eda01	RDMA/core: Add SRQ type field Currently, there is only a single ("basic") type of SRQ, but with XRC support we will add a second. Prepare for this by defining an SRQ type and setting all current users to IB_SRQT_BASIC. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:13:26 -07:00
Sean Hefty	59991f94eb	RDMA/core: Add XRC domain support XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). A few new concepts are introduced to support this. This patch adds: - A new device capability flag, IB_DEVICE_XRC, which low-level drivers set to indicate that a device supports XRC. - A new object type, XRC domains (struct ib_xrcd), and new verbs ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which XRC SRQs an incoming message can target. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-12 10:32:26 -07:00
Marcel Apfelbaum	e36fb88a9a	IPoIB: Handle extended rates in debugfs Use new function ib_rate_to_mbps() to handle printing rate in debugfs, so that we handle extended rates. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-11 11:57:08 -07:00
Marcel Apfelbaum	71eeba161d	IB: Add new InfiniBand link speeds Introduce support for the following extended speeds: FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with 64b/66b encoding rather than 8b/10b encoding. FDR: IBA extended speed 14.0625 Gbps. EDR: IBA extended speed 25.78125 Gbps. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-11 11:53:47 -07:00
Randy Dunlap	3e60a77ea2	IB/ipath: Add missing <linux/stat.h> in ipath_chip_init.c Fix build errors: drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: 'S_IRUGO' undeclared here (not in a function) drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: bit-field '<anonymous>' width not an integer constant drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: 'S_IWUSR' undeclared here (not in a function) drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: bit-field '<anonymous>' width not an integer constant Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 12:01:22 -07:00
Faisal Latif	0f0bee8bbc	RDMA/nes: Support for Packed And Unaligned fpdus Support for Packed and Unaligned (PAU) FPDUs is needed for interoperability between NES and non-NES nodes. When the NES hardware detects a PAU frame, it will pass it to the driver to process the frame. NES driver creates a new frame for each FPDU and forwards it to the hardware to be sent to its associated qp. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 10:54:47 -07:00
Faisal Latif	6224c7eeff	RDMA/nes: Print IP address for critcal errors Print the IP address of the remote host when a critical asynchronous event is received. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 10:51:21 -07:00
Faisal Latif	bab3a9f43f	RDMA/nes: Fix terminate connection Fixes a crash that occurs during close when error async event is received. Terminate message is not sent to the remote node if already processing close. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 10:47:44 -07:00
David S. Miller	88c5100c28	Merge branch 'master' of github.com:davem330/net Conflicts: net/batman-adv/soft-interface.c	2011-10-07 13:38:43 -04:00
Ian Campbell	5d6bcdfe38	net: use DMA_x_DEVICE and dma_mapping_error with skb_frag_dma_map When I converted some drivers from pci_map_page to skb_frag_dma_map I neglected to convert PCI_DMA_xDEVICE into DMA_x_DEVICE and pci_dma_mapping_error into dma_mapping_error. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-06 16:17:20 -04:00
Tatyana Nikolova	615eb715ae	RDMA/nes: Add support for MPAv2 Enhanced RDMA Negotiation This patch adds support for Enhanced RDMA Connection Establishment (draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft can be obtained from: <http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt> For backwards compatibility, the MPAv2 enabled driver reverts to MPAv1 if the remote node doesn't support MPAv2. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:39:45 -07:00
Kumar Sanghvi	d2fe99e86b	RDMA/cxgb4: Add support for MPAv2 Enhanced RDMA Negotiation This patch adds support for Enhanced RDMA Connection Establishment (draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft can be obtained from: <http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt> The patch updates the following functions for initiator perspective: - send_mpa_request - process_mpa_reply - post_terminate for TERM error codes - destroy_qp for TERM related change - adds layer/etype/ecode to c4iw_qp_attrs for sending with TERM - peer_abort for retrying connection attempt with MPA_v1 message - added c4iw_reconnect function The patch updates the following functions for responder perspective: - process_mpa_request - send_mpa_reply - c4iw_accept_cr - passes ird/ord to upper layers Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:39:24 -07:00
Kumar Sanghvi	56da00fc92	RDMA/{amso1100,cxgb3}: Minimal MPAv2 support As part of MPAv2 Enhanced RDMA Negotiation, pass max supported ird/ord values upwards for the time being in iw_cxgb3 and amso1100. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:39:01 -07:00
Kumar Sanghvi	3ebeebc38b	RDMA/iwcm: Propagate ird/ord values upwards Update struct iw_cm_event to support propagating the ird/ord values upwards to the application. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:37:52 -07:00
Mike Marciniszyn	53ab1c6498	IB/qib: Correct nfreectxts for multiple HCAs The code that was recently introduced to report the number of free contexts is flawed for multiple HCAs: /* Return the number of free user ports (contexts) available. */ return scnprintf(buf, PAGE_SIZE, "%u\n", dd->cfgctxts - dd->first_user_ctxt - (u32)qib_stats.sps_ctxts); The qib_stats is global to the module, not per HCA, so the code is broken for multiple HCAs. This patch adds a qib_devdata field, freectxts, that reflects the free contexts for this HCA. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:35 -07:00
Julia Lawall	e2e435f290	RDMA/nes: Add missing calls to ib_umem_release() Add calls to ib_umem_release(), as in the other error-handling code in nes_reg_user_mr(). Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:24 -07:00
Hefty, Sean	caf6e3f221	RDMA/ucm: Removed checks for unsigned value < 0 cmd is unsigned, no need to check for < 0. Found by code inspection. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:05 -07:00
Hefty, Sean	b7ab0b19a8	IB/mad: Verify mgmt class in received MADs If a received MAD contains an invalid or reserved mgmt class, we will attempt to access method_table outside of its range. Add a check to ensure that mgmt class falls within the handled range. Found by code inspection. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:05 -07:00
Hefty, Sean	f45ee80eb0	RDMA/cma: Check for NULL conn_param in rdma_accept Check that conn_param is not null before dereferencing it when processing rdma_accept(). This is necessary to prevent a possible system crash, which can be caused by user space. Problem found by code inspection. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:04 -07:00
Yong Zhang	10889a3643	IB/ehca: Remove IRQF_DISABLED, since it's a no-op Since commit `e58aa3d2d0` ("genirq: Run irq handlers with interrupts disabled"), we run all interrupt handlers with interrupts disabled and we even check and yell when an interrupt handler returns with interrupts enabled -- cf commit `b738a50a20` ("genirq: Warn when handler enables interrupts"). So now this flag is a no-op and can be removed. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:04 -07:00
Steve Wise	9efe10a1e1	RDMA/cxgb4: Fail RDMA initialization for unsupported cards The iw_cxgb4 module crashes at init time if the T4 card does not support RDMA. So clean up the init logic to correctly deal with non-RDMA cards. - If any RDMA resources are not available, then fail the initialization logging an info message. - Clean up properly on initialization failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:32:44 -07:00
Hefty, Sean	9595480c5d	RDMA/cma: Fix crash in cma_req_handler The RDMA CM uses the local qp_type to determine how to process an incoming request. This can result in an incoming REQ being treated as a SIDR REQ and vice versa. Fix this by switching off the event type instead, and for good measure verify that the listener supports the incoming connection request. This problem showed up when a user space application mismatched the QP types between a client and server app. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:32:33 -07:00
Andy Shevchenko	2be6053318	RDMA/amso1100: Use '%pM' format option to print MAC Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:32:24 -07:00
Neil Horman	e48f129c2f	[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference This oops was reported recently: d:mon> e cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120] pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3] lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i] sp: c0000000fd4c73a0 msr: 8000000000009032 dar: 0 dsisr: 40000000 current = 0xc0000000fd640d40 paca = 0xc00000000054ff80 pid = 5085, comm = iscsid d:mon> t [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i] [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi] [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18 [scsi_transport_iscsi2] [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4 [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8 [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40 --- Exception: c01 (System Call) at 00000080da560cfc The root cause was an EEH error, which sent us down the offload_close path in the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for upper layer driver (like the cxgbi drivers) which might have execution contexts in the middle of its use. The result is the oops above, when t3_l2t_get attempts to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL. The fix is to prevent the setting of the NULL pointer until after there are no further users of it. The t3cdev->l2opt pointer is now converted to be an rcu pointer and the L2DATA macro is now called under the protection of the rcu_read_lock(). When the EEH error path: t3_adapter_error->offload_close->cxgb3_offload_deactivate Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu quiescence point, preventing, allowing L2DATA callers to safely check for a NULL pointer without concern that the underlying data will be freeded before the pointer is dereferenced. This has been tested by the reporter and shown to fix the reproted oops [nhorman: fix up unitinialised variable reported by Dan Carpenter] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Karen Xie <kxie@chelsio.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-09-26 09:28:01 -05:00
David S. Miller	8decf86879	Merge branch 'master' of github.com:davem330/net Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c	2011-09-22 03:23:13 -04:00
Mike Christie	f27fb2ef7b	[SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrs The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and driver's host attrs to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-08-27 08:36:14 -06:00
Mike Christie	1d063c1729	[SCSI] iscsi class: sysfs group is_visible callout for session attrs The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and driver's session attrs to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-08-27 08:36:06 -06:00
Mike Christie	3128c6c73c	[SCSI] iscsi cls: sysfs group is_visible callout for conn attrs The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and drivers to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-08-27 08:36:03 -06:00
Ian Campbell	5581be3b48	IPoIB: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-26 12:38:42 -04:00
Ian Campbell	cf383ebb13	IB: nes: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Faisal Latif <faisal.latif@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-26 12:38:42 -04:00
Ian Campbell	70b1052ad2	IB: amso1100: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Tucker <tom@opengridcomputing.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-26 12:38:42 -04:00
Jiri Pirko	afc4b13df1	net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:22:03 -07:00
Roland Dreier	80b43de837	Merge branches 'ipoib' and 'iser' into for-next	2011-08-17 10:57:43 -07:00
Or Gerlitz	200ae1a08b	IB/iser: Support iSCSI PDU padding RFC3270 mandates that iSCSI PDUs are padded to the closest integer number of four byte words. Fix the iser code to support that on both the TX/RX flows. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-08-17 09:45:07 -07:00
Or Gerlitz	0ace64b85e	IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long) The code that prepares the SG associated with SCSI command for FMR was buggy for systems with DMA addresses that don't fit in unsigned long, e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8. Fix that by casting to unsigned long long a masking constant used by the code. This resolves a crash in iser_sg_to_page_vec on this system. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-08-17 09:40:55 -07:00
Bernd Schubert	22cfb0bf67	IPoIB: Fix possible NULL dereference in ipoib_start_xmit() Fix a bug introduced in `69cce1d140` ("net: Abstract dst->neighbour accesses behind helpers.") where we might dereference skb_dst(skb) even if it is NULL, which causes: [ 240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib] [...] [ 240.948007] Call Trace: [ 240.948007] <IRQ> [ 240.948007] [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590 [ 240.948007] [<ffffffff8131f680>] ? arp_create+0x70/0x200 [ 240.948007] [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212 Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-08-16 10:19:20 -07:00
David S. Miller	af3dcd2f44	mlx4: Fix infiniband Kconfig dependencies. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-11 23:05:05 -07:00
Jeff Kirsher	f7917c009c	chelsio: Move the Chelsio drivers Moves the drivers for the Chelsio chipsets into drivers/net/ethernet/chelsio/ and the necessary Kconfig and Makefile changes. CC: Divy Le Ray <divy@chelsio.com> CC: Dimitris Michailidis <dm@chelsio.com> CC: Casey Leedom <leedom@chelsio.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2011-08-10 19:54:52 -07:00
Linus Torvalds	91d41fdf31	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage iscsi-target: Add iSCSI fabric support for target v4.1 iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h iscsi: Use struct scsi_lun in iscsi structs instead of u8[8] iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi	2011-07-27 13:21:40 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Nicholas Bellinger	123521830c	iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi This patch renames the following iscsi_proto.h structures to avoid namespace issues with drivers/target/iscsi/iscsi_target_core.h: ) struct iscsi_cmd -> struct iscsi_scsi_req ) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp ) struct iscsi_login -> struct iscsi_login_req This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE, and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and fixes the incorrect definition of struct iscsi_snack to following RFC-3720 Section 10.16. SNACK Request. Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to use the updated structure definitions in a handful of locations. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>	2011-07-25 07:18:45 +00:00
Linus Torvalds	ece236ce2f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/qib: Defer HCA error events to tasklet mlx4_core: Bump the driver version to 1.0 RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit() IB/mlx4: Support PMA counters for IBoE IB/mlx4: Use flow counters on IBoE ports IB/pma: Add include file for IBA performance counters definitions mlx4_core: Add network flow counters mlx4_core: Fix location of counter index in QP context struct mlx4_core: Read extended capabilities into the flags field mlx4_core: Extend capability flags to 64 bits IB/mlx4: Generate GID change events in IBoE code IB/core: Add GID change event RDMA/cma: Don't allow IPoIB port space for IBoE RDMA: Allow for NULL .modify_device() and .modify_port() methods IB/qib: Update active link width IB/qib: Fix potential deadlock with link down interrupt IB/qib: Add sysfs interface to read free contexts IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP IB/qib: Remove double define IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP ...	2011-07-22 14:50:12 -07:00
Roland Dreier	4460207561	Merge branches 'cma', 'cxgb4', 'ipath', 'misc', 'mlx4', 'mthca', 'qib' and 'srp' into for-next	2011-07-22 11:56:11 -07:00
Mike Marciniszyn	e67306a380	IB/qib: Defer HCA error events to tasklet With ib_qib options: options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4 a run of ib_write_bw -a yields the following: ------------------------------------------------------------------ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] 1048576 5000 2910.64 229.80 ------------------------------------------------------------------ The top cpu use in a profile is: CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 1002300 Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000 samples % samples % app name symbol name 15237 29.2642 964 17.1195 ib_qib.ko qib_7322intr 12320 23.6618 1040 18.4692 ib_qib.ko handle_7322_errors 4106 7.8860 0 0 vmlinux vsnprintf Analysis of the stats, profile, the code, and the annotated profile indicate: - All of the overflow interrupts (one per packet overflow) are serviced on CPU0 with no mitigation on the frequency. - All of the receive interrupts are being serviced by CPU0. (That is the way truescale.cmds statically allocates the kctx IRQs to CPU) - The code is spending all of its time servicing QIB_I_C_ERROR RcvEgrFullErr interrupts on CPU0, starving the packet receive processing. - The decode_err routine is very inefficient, using a printf variant to format a "%s" and continues to loop when the errs mask has been cleared. - Both qib_7322intr and handle_7322_errors read pci registers, which is very inefficient. The fix does the following: - Adds a tasklet to service QIB_I_C_ERROR - Replaces the very inefficient scnprintf() with a memcpy(). A field is added to qib_hwerror_msgs to save the sizeof("string") at compile time so that a strlen is not needed during err_decode(). - The most frequent errors (Overflows) are serviced first to exit the loop as early as possible. - The loop now exits as soon as the errs mask is clear rather than fruitlessly looping through the msp array. With this fix the performance changes to: ------------------------------------------------------------------ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] 1048576 5000 2990.64 2941.35 ------------------------------------------------------------------ During testing of the error handling overflow patch, it was determined that some CPU's were slower when servicing both overflow and receive interrupts on CPU0 with different MSI interrupt vectors. This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI interrupt for kctx's < 2 and to service them on the default interrupt. For some CPUs, the cost of the interrupt enter/exit is more costly than then the additional PCI read in the default handler. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-22 11:56:05 -07:00
Jiri Pirko	7033c4ad87	nes: do vlan cleanup - unify vlan and nonvlan rx path - kill nesvnic->vlan_grp and nes_netdev_vlan_rx_register - allow to turn on/off rx/tx vlan accel via ethtool (set_features) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-21 13:47:53 -07:00
Manuel Zerpies	3cbe182a5b	RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit() Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited(). Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:18:39 -07:00
Or Gerlitz	c37791349c	IB/mlx4: Support PMA counters for IBoE Use the per port counter attached to all QPs created on that port to implement port level packets/bytes performance counters a la IB. Derived from a patch by Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:36 -07:00
Or Gerlitz	cfcde11c3d	IB/mlx4: Use flow counters on IBoE ports Allocate flow counter per Ethernet/IBoE port, and attach this counter to all the QPs created on that port. Based on patch by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:36 -07:00
Or Gerlitz	6aea213a62	IB/pma: Add include file for IBA performance counters definitions Move the various definitions and mad structures needed for software implementation of IBA PM agent from the ipath and qib drivers into a single include file, which in turn could be used by more consumers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:35 -07:00
Or Gerlitz	6451c712fe	IB/mlx4: Generate GID change events in IBoE code IBoE doesn't use LIDs. Use the GID change event to update the IB core cache for addition/deletion of GIDs. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:31 -07:00
Or Gerlitz	761d90ed4c	IB/core: Add GID change event Add IB GID change event type. This is needed for IBoE when the HW driver updates the GID (e.g when new VLANs are added/deleted) table and the change should be reflected to the IB core cache. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:30 -07:00
Moni Shoua	2efdd6a038	RDMA/cma: Don't allow IPoIB port space for IBoE This patch fixes a kernel crash in cma_set_qkey(). When the link layer is Ethernet, it is wrong to use IPoIB port space since no IPoIB interface is available. Specifically, setting the Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and an SA query, which doesn't make sense over Ethernet. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 16:50:25 -07:00
Bart Van Assche	10e1b54bbb	RDMA: Allow for NULL .modify_device() and .modify_port() methods These methods don't make sense for iWARP devices, so rather than forcing them to implement stubs, just return -ENOSYS in the core if the hardware driver doesn't set .modify_device and/or .modify_port. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 16:44:30 -07:00
Mitko Haralanov	e800bd032c	IB/qib: Update active link width Update the active link width on QLE7220 chips when link goes down if chip width does not match shadowed width. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:09:26 -07:00
Ram Vepa	4356d0b64b	IB/qib: Fix potential deadlock with link down interrupt There is a possibility of a deadlock due to the way locks are acquired and released in qib_set_uevent_bits(). The function qib_set_uevent_bits() is called in process context and it uses spin_lock() and spin_unlock(). This same lock is acquired/released in interrupt context which can lead to a deadlock when running on the same cpu. The fix is to replace spin_lock() and spin_unlock() with spin_lock_irqsave() and spin_unlock_irqrestore() respectively in qib_set_uevent_bits(). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:09:23 -07:00
Ram Vepa	2df4f7579d	IB/qib: Add sysfs interface to read free contexts Indicate the number of free user contexts via the sysfs file /sys/class/infiniband/qib0/nfreectxts as required for PSM. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:09:19 -07:00
Jon Mason	9b89925c0d	IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP The PCIE capability offset is saved during PCI bus walking. It will remove an unnecessary search in the PCI configuration space if this value is referenced instead of reacquiring it. Also, pci_is_pcie is a better way of determining if the device is PCIE or not (as it uses the same saved PCIE capability offset). Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:01:22 -07:00
Edwin van Vliet	ac0cae4495	IB/qib: Remove double define Signed-off-by: Edwin van Vliet <edwin@cheatah.nl> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:59:23 -07:00
Jon Mason	7f27cda037	IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP The PCIE capability offset is saved during PCI bus walking. It will remove an unnecessary search in the PCI configuration space if this value is referenced instead of reacquiring it. Signed-off-by: Jon Mason <jdmason@kudzu.us> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:57:52 -07:00
Motohiro KOSAKI	5763181172	IB/ipath: Convert old cpumask api into new one Adapt to new api. We plan to remove old one later. Almost all changes are trivial, but there is one real fix: the following code is unsafe: int ncpus = num_online_cpus() for (i = 0; i < ncpus; i++) { .. } because 1) we don't guarantee last bit of online cpus is equal to num_online_cpus(). some arch assign sparse cpu number. 2) cpu hotplugging may change cpu_online_mask at same time. we need to pin it by get_online_cpus(). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:56:18 -07:00
Motohiro KOSAKI	0cd85e6738	IB/qib: Convert old cpumask api into new one Adapt to use new APIs. We plan to remove old one later and plan to change current->cpus_allowed implementation. No functional change. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:56:03 -07:00
Jack Morgenstein	0c9361fcde	RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object Avoid assigning an IS_ERR value to the cm_id pointer. This fixes a few anomalies in the error flow due to confusion about checking for NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value every time we wish to determine if the cma_id object has a cm device associated with it. Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can check directly for the existence of the device pointer -- for a non-NULL check, makes no difference if it is the iwarp or the ib pointer). Finally, make a few code changes here to improve coding consistency. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 09:44:55 -07:00
David S. Miller	69cce1d140	net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-17 23:11:35 -07:00
Goldwyn Rodrigues	cdb73db0b6	IB/mthca: Stop returning separate error and status from FW commands Instead of having firmware command functions return an error and also a status, leading to code like: err = mthca_FW_COMMAND(..., &status); if (err) goto out; if (status) { err = -E...; goto out; } all over the place, just handle the FW status inside the FW command handling code (the way mlx4 does it), so we can simply write: err = mthca_FW_COMMAND(...); if (err) goto out; In addition to simplifying the source code, this also saves a healthy chunk of text: add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847) function old new delta static.trans_table 324 584 +260 mthca_cmd_poll 352 477 +125 mthca_cmd_wait 511 567 +56 mthca_table_put 213 240 +27 mthca_cleanup_db_tab 372 387 +15 __mthca_remove_one 314 323 +9 mthca_cleanup_user_db_tab 275 283 +8 __mthca_init_one 1738 1746 +8 mthca_cleanup 20 21 +1 mthca_MAD_IFC 1081 1082 +1 mthca_MGID_HASH 43 40 -3 mthca_MAP_ICM_AUX 23 20 -3 mthca_MAP_ICM 19 16 -3 mthca_MAP_FA 23 20 -3 mthca_READ_MGM 43 38 -5 mthca_QUERY_SRQ 43 38 -5 mthca_QUERY_QP 59 54 -5 mthca_HW2SW_SRQ 43 38 -5 mthca_HW2SW_MPT 60 55 -5 mthca_HW2SW_EQ 43 38 -5 mthca_HW2SW_CQ 43 38 -5 mthca_free_icm_table 120 114 -6 mthca_query_srq 214 206 -8 mthca_free_qp 662 654 -8 mthca_cmd 38 28 -10 mthca_alloc_db 1321 1311 -10 mthca_setup_hca 1067 1055 -12 mthca_WRITE_MTT 35 22 -13 mthca_WRITE_MGM 40 27 -13 mthca_UNMAP_ICM_AUX 36 23 -13 mthca_UNMAP_FA 36 23 -13 mthca_SYS_DIS 36 23 -13 mthca_SYNC_TPT 36 23 -13 mthca_SW2HW_SRQ 35 22 -13 mthca_SW2HW_MPT 35 22 -13 mthca_SW2HW_EQ 35 22 -13 mthca_SW2HW_CQ 35 22 -13 mthca_RUN_FW 36 23 -13 mthca_DISABLE_LAM 36 23 -13 mthca_CLOSE_IB 36 23 -13 mthca_CLOSE_HCA 38 25 -13 mthca_ARM_SRQ 39 26 -13 mthca_free_icms 178 164 -14 mthca_QUERY_DDR 389 375 -14 mthca_resize_cq 1063 1048 -15 mthca_unmap_eq_icm 123 107 -16 mthca_map_eq_icm 396 380 -16 mthca_cmd_box 90 74 -16 mthca_SET_IB 433 417 -16 mthca_RESIZE_CQ 369 353 -16 mthca_MAP_ICM_page 240 224 -16 mthca_MAP_EQ 183 167 -16 mthca_INIT_IB 473 457 -16 mthca_INIT_HCA 745 729 -16 mthca_map_user_db 816 798 -18 mthca_SYS_EN 157 139 -18 mthca_cleanup_qp_table 78 59 -19 mthca_cleanup_eq_table 168 149 -19 mthca_UNMAP_ICM 143 121 -22 mthca_modify_srq 172 149 -23 mthca_unmap_fmr 198 174 -24 mthca_query_qp 814 790 -24 mthca_query_pkey 343 319 -24 mthca_SET_ICM_SIZE 34 10 -24 mthca_QUERY_DEV_LIM 1870 1846 -24 mthca_map_cmd 1130 1105 -25 mthca_ENABLE_LAM 401 375 -26 mthca_modify_port 247 220 -27 mthca_query_device 884 850 -34 mthca_NOP 75 41 -34 mthca_table_get 287 249 -38 mthca_init_qp_table 333 293 -40 mthca_MODIFY_QP 348 308 -40 mthca_close_hca 131 89 -42 mthca_free_eq 435 390 -45 mthca_query_port 755 705 -50 mthca_free_cq 581 528 -53 mthca_alloc_icm_table 578 524 -54 mthca_multicast_attach 1041 986 -55 mthca_init_hca 326 271 -55 mthca_query_gid 487 431 -56 mthca_free_srq 524 468 -56 mthca_free_mr 168 111 -57 mthca_create_eq 1560 1501 -59 mthca_multicast_detach 790 728 -62 mthca_write_mtt 918 854 -64 mthca_register_device 1406 1342 -64 mthca_fmr_alloc 947 883 -64 mthca_mr_alloc 652 582 -70 mthca_process_mad 1242 1164 -78 mthca_dev_lim 910 830 -80 find_mgm 482 400 -82 mthca_modify_qp 3852 3753 -99 mthca_init_cq 1281 1181 -100 mthca_alloc_srq 1719 1610 -109 mthca_init_eq_table 1807 1679 -128 mthca_init_tavor 761 491 -270 mthca_init_arbel 2617 2098 -519 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>	2011-07-15 13:33:20 -07:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
Bart Van Assche	fd1b6c4a69	IB/srp: Avoid duplicate devices from LUN scan SCSI scanning of a channel🆔lun triplet in Linux works as follows (function scsi_scan_target() in drivers/scsi/scsi_scan.c): - If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target and process the result. - If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN corresponding to the specified channel🆔lun triplet to verify whether the LUN exists. So a SCSI driver must either take the channel and target id values in account in its quecommand() function or it should declare that it only supports one channel and one target id. Currently the ib_srp driver does neither. As a result scanning the SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI devices to be created. For each 0:0:L device, several duplicates are created with the same LUN number and with (C:I) != (0:0). Fix this by declaring that the ib_srp driver only supports one channel and one target id. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@kernel.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-13 09:19:16 -07:00
David S. Miller	e12fe68ce3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-07-05 23:23:37 -07:00
Goldwyn Rodrigues	b2bc478219	RDMA: Check for NULL mode in .devnode methods Commits `71c29bd5c2` ("IB/uverbs: Add devnode method to set path/mode") and `c3af0980ce` ("IB: Add devnode methods to cm_class and umad_class") added devnode methods that set the mode. However, these methods don't check for a NULL mode, and so we get a crash when unloading modules because devtmpfs_delete_node() calls device_get_devnode() with mode == NULL. Add the missing checks. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> [ Also fix cm.c. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-04 15:53:28 -07:00
Roland Dreier	c7d74b0909	Merge branches 'cxgb4' and 'qib' into for-next	2011-06-17 11:57:55 -07:00
Mitko Haralanov	3126448451	IB/qib: Ensure that LOS and DFE are being turned off Due to timing, it is possible for the LOS and DFE to remain on. This is due to the link progressing to LinkUP prior to the driver getting the first Status Changed interrupt. By expanding the conditions under which LOS is turned off and DFE timeout is being set, timing is no longer an issue. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:56:59 -07:00
Steve Wise	8da7e7a552	RDMA/cxgb4: Couple of abort fixes - fix a race where the driver could end up sending a close_con_req after an abort_rpl. In c4iw_ep_disconnect(), send abort or close request with the ep mutex held. - fix a hang where driver fails to wake up when a connection is reset during a normal close. Wake up any waiters in the interrupt path, and correctly cleanup after rdma_fini() failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:54:56 -07:00
Steve Wise	301c2c3f03	RDMA/cxgb4: Don't truncate MR lengths Remove left-over code from T3 that limited MR sizes to 32b. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:54:50 -07:00
Steve Wise	2ff7d09a1b	RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs Memory allocated for user CQs gets rounded up to the next page boundary. And after rounding, we recalculate the resulting IQ depth and we need to make sure we don't exceed the HW limits. This bug can result a much smaller CQ allocated than was expected if the HW size field is exceeded, resulting in CQ overflow failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:52:45 -07:00
Greg Rose	c7ac8679be	rtnetlink: Compute and store minimum ifinfo dump size The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2011-06-09 20:38:07 -07:00
Alexey Dobriyan	a6b7a40786	net: remove interrupt.h inclusion from netdevice.h * remove interrupt.g inclusion from netdevice.h -- not needed * fixup fallout, add interrupt.h and hardirq.h back where needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-06 22:55:11 -07:00
Linus Torvalds	4c171acc20	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cma: Save PID of ID's owner RDMA/cma: Add support for netlink statistics export RDMA/cma: Pass QP type into rdma_create_id() RDMA: Update exported headers list RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> RDMA/nes: Add a check for strict_strtoul() RDMA/cxgb3: Don't post zero-byte read if endpoint is going away RDMA/cxgb4: Use completion objects for event blocking IB/srp: Fix integer -> pointer cast warnings IB: Add devnode methods to cm_class and umad_class IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required IB/uverbs: Add devnode method to set path/mode RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node RDMA: Add netlink infrastructure RDMA: Add error handling to ib_core_init()	2011-05-26 12:13:57 -07:00
Roland Dreier	8dc4abdf4c	Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next	2011-05-25 13:47:20 -07:00
Nir Muchtar	83e9502d8d	RDMA/cma: Save PID of ID's owner Save the PID associated with an RDMA CM ID for reporting via netlink. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Nir Muchtar	753f618ae0	RDMA/cma: Add support for netlink statistics export Add callbacks and data types for statistics export of all current devices/ids. The schema for RDMA CM is a series of netlink messages. Each one contains an rdma_cm_stat struct. Additionally, two netlink attributes are created for the addresses for each message (if applicable). Their types used are: RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID) RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID) sockaddr_* structs are encapsulated within these attributes. In other words, every transaction contains a series of messages like: -------message 1------- struct rdma_cm_id_stats { __u32 qp_num; __u32 bound_dev_if; __u32 port_space; __s32 pid; __u8 cm_state; __u8 node_type; __u8 port_num; __u8 reserved; } RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address -------end 1------- -------message 2------- struct rdma_cm_id_stats RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute -------end 2------- Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Sean Hefty	b26f9b9949	RDMA/cma: Pass QP type into rdma_create_id() The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID. Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Nir Muchtar	550e5ca77e	RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> Move cma.c's internal definition of enum cma_state to enum rdma_cm_state in an exported header so that it can be exported via RDMA netlink. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:22 -07:00
Liu Yuan	52f81dbaf1	RDMA/nes: Add a check for strict_strtoul() It should check if strict_strtoul() succeeds before using 'wqm_quanta_value'. Signed-off-by: Liu Yuan <tailai.ly@taobao.com> [ Convert to kstrtoul() directly while we're here. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-24 10:06:25 -07:00
Steve Wise	807838686e	RDMA/cxgb3: Don't post zero-byte read if endpoint is going away tx_ack() wasn't checking the endpoint state and consequently would attempt to post the p2p 0B read on an endpoint/QP that is closing or aborting. This causes a NULL pointer dereference crash. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-24 10:01:04 -07:00
Steve Wise	c337374bf2	RDMA/cxgb4: Use completion objects for event blocking There exists a race condition when using wait_queue_head_t objects that are declared on the stack. This was being done in a few places where we are sending work requests to the FW and awaiting replies, but we don't have an endpoint structure with an embedded c4iw_wr_wait struct. So the code was allocating it locally on the stack. Bad design. The race is: 1) thread on cpuX declares the wait_queue_head_t on the stack, then posts a firmware WR with that wait object ptr as the cookie to be returned in the WR reply. This thread will proceed to block in wait_event_timeout() but before it does: 2) An interrupt runs on cpuY with the WR reply. fw6_msg() handles this and calls c4iw_wake_up(). c4iw_wake_up() sets the condition variable in the c4iw_wr_wait object to TRUE and will call wake_up(), but before it calls wake_up(): 3) The thread on cpuX calls c4iw_wait_for_reply(), which calls wait_event_timeout(). The wait_event_timeout() macro checks the condition variable and returns immediately since it is TRUE. So this thread never blocks/sleeps. The function then returns effectively deallocating the c4iw_wr_wait object that was on the stack. 4) So at this point cpuY has a pointer to the c4iw_wr_wait object that is no longer valid. Further its pointing to a stack frame that might now be in use by some other context/thread. So cpuY continues execution and calls wake_up() on a ptr to a wait object that as been effectively deallocated. This race, when it hits, can cause a crash in wake_up(), which I've seen under heavy stress. It can also corrupt the referenced stack which can cause any number of failures. The fix: Use struct completion, which supports on-stack declarations. Completions use a spinlock around setting the condition to true and the wake up so that steps 2 and 4 above are atomic and step 3 can never happen in-between. Signed-off-by: Steve Wise <swise@opengridcomputing.com>	2011-05-24 09:47:38 -07:00
Roland Dreier	737b94eb41	IB/srp: Fix integer -> pointer cast warnings Fix drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv': drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion': drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size by adding an intermediate cast to uintptr_t. Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: David Dillow <dillowda@ornl.gov>	2011-05-23 11:30:04 -07:00
Roland Dreier	c3af0980ce	IB: Add devnode methods to cm_class and umad_class We want the ucmX, umadX and issmX device nodes to show up under /dev/infiniband, and additionally ucmX should have mode 0666. Add appropriate devnode methods to their class structs for this. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-23 11:24:28 -07:00
Ira Weiny	c8367c4cd9	IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required We had a script which was looping through the devices returned from ibstat and attempted to register a SMI agent on an ethernet device. This caused a kernel panic for IBoE devices that don't have QP0. Fix this by checking if the QP exists before using it. Signed-off-by: Ira Weiny <weiny2@llnl.gov> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-23 11:15:48 -07:00
Roland Dreier	71c29bd5c2	IB/uverbs: Add devnode method to set path/mode We want udev to create a device node under /dev/infiniband with permission 0666 for uverbsX devices, so add a devnode method to set the appropriate info. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-23 11:10:05 -07:00
Roland Dreier	04ea2f8197	RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node We want udev to create a device node under /dev/infiniband with permission 0666 for rdma_cm, so add that info to our struct miscdevice. Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: Sean Hefty <sean.hefty@intel.com>	2011-05-23 10:48:43 -07:00
Paul Gortmaker	70c7160619	Add appropriate <linux/prefetch.h> include for prefetch users After discovering that wide use of prefetch on modern CPUs could be a net loss instead of a win, net drivers which were relying on the implicit inclusion of prefetch.h via the list headers showed up in the resulting cleanup fallout. Give them an explicit include via the following $0.02 script. ========================================= #!/bin/bash MANUAL="" for i in `git grep -l 'prefetch(.*)' .` ; do grep -q '<linux/prefetch.h>' $i if [ $? = 0 ] ; then continue fi ( echo '?^#include <linux/?a' echo '#include <linux/prefetch.h>' echo . echo w echo q ) \| ed -s $i > /dev/null 2>&1 if [ $? != 0 ]; then echo $i needs manual fixup MANUAL="$i $MANUAL" fi done echo ------------------- 8\<---------------------- echo vi $MANUAL ========================================= Signed-off-by: Paul <paul.gortmaker@windriver.com> [ Fixed up some incorrect #include placements, and added some non-network drivers and the fib_trie.c case - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-22 21:41:57 -07:00
Linus Torvalds	06f4e926d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.	2011-05-20 13:43:21 -07:00
Roland Dreier	b2cbae2c24	RDMA: Add netlink infrastructure Add basic RDMA netlink infrastructure that allows for registration of RDMA clients for which data is to be exported and supplies message construction callbacks. Signed-off-by: Nir Muchtar <nirm@voltaire.com> [ Reorganize a few things, add CONFIG_NET dependency. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-20 11:46:11 -07:00
Nir Muchtar	fd75c789ab	RDMA: Add error handling to ib_core_init() Fail RDMA midlayer initialization if sysfs setup fails. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-20 11:46:10 -07:00
Benjamin Herrenschmidt	880102e785	Merge remote branch 'origin/master' into merge Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi() call to arch/powerpc/platforms/cell/interrupt.c Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-20 15:36:52 +10:00
Benjamin Herrenschmidt	4c8440666b	Merge branch 'merge' into next	2011-05-19 17:00:06 +10:00
Roland Dreier	1df9fad122	Merge branches 'cma', 'cxgb4' and 'qib' into for-next	2011-05-12 08:57:20 -07:00
Sergei Shtylyov	1c65335714	IB/qib: Use pci_dev->revision The driver reads PCI revision ID from the PCI configuration register while it's already stored by PCI subsystem in the revision field of struct pci_dev. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-12 08:57:12 -07:00
David S. Miller	5fc3590c81	infiniband: Remove rt->rt_src usage in addr4_resolve() Use an explicit flow key and fetch it from there. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:48 -07:00
Roland Dreier	d0c49bf391	RDMA/iwcm: Get rid of enum iw_cm_event_status The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places; cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4 drivers -- only nes was using the enum values (with the mild consequence that all nes connection failures were treated as generic errors rather than reported as timeouts or rejections). We can fix this confusion by getting rid of enum iw_cm_event_status and using a plain int for struct iw_cm_event.status, and converting nes to use -Exxx as the other iWARP drivers do. This also gets rid of the warning drivers/infiniband/core/cma.c: In function 'cma_iw_handler': drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status' drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status' drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status' Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Faisal Latif <faisal.latif@intel.com>	2011-05-09 22:23:57 -07:00
Sergei Shtylyov	ec03d6777a	IB/ipath: Use pci_dev->revision, again Commit `44c10138fd` ("PCI: Change all drivers to use pci_device->revision") already converted this driver to using the revision field of struct pci_dev but commit `bb9171448d` ("IB/ipath: Misc changes to prepare for IB7220 introduction") later reverted that change for some strange reason. Restore the change. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:07:31 -07:00
Mitko Haralanov	9f5754e34b	IB/qib: Prevent driver hang with unprogrammed boards The time limit test now correctly checks against current jiffies to avoid the hang. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:07:31 -07:00
Steve Wise	2f25e9a540	RDMA/cxgb4: EEH errors can hang the driver A few more EEH fixes: c4iw_wait_for_reply(): detect fatal EEH condition on timeout and return an error. The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH event followed by a ib_register_device() when the device was reinitialized. However, the RDMA core doesn't allow multiple iterations of register/deregister by the provider. See drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where the kobject ref is held until the device is deallocated in ib_deallocate_device(). Calling deregister adds this kobj reference, and then a subsequent register call will generate a WARN_ON() from the kobject subsystem because the kobject is being initialized but is already initialized with the ref held. So the provider must deregister and dealloc when resetting for an EEH event, then alloc/register to re-initialize. To do this, we cannot use the device ptr as our ULD handle since it will change with each reallocation. This commit adds a ULD context struct which is used as the ULD handle, and then contains the device pointer and other state needed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:23 -07:00
Steve Wise	d9594d990a	RDMA/cxgb4: Reset wait condition atomically The driver was never really waiting for RDMA_WR/FINI completions because the condition variable used to determine if the completion happened was never reset, and this condition variable is reused for both connection setup and teardown. This causes various driver crashes under heavy loads due to releasing resources too early. The fix is to use atomic bits to correctly reset the condition immediately after the completion is detected. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Roel Kluin	85d215b0f3	RDMA/cxgb4: Fix missing parentheses Parens are missing: '\|' has a higher presedence than '?'. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Steve Wise	bbe9a0a2bc	RDMA/cxgb4: Initialization errors can cause crash c4iw_uld_add() must return ERR_PTR() values instead of NULL on failure. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Steve Wise	30c95c2d49	RDMA/cxgb4: Don't change QP state outside EP lock Concurrent ingress CLOSE and ULP ABORT operations causes a crash due to a race condition where the close path releases the EP lock and then tries to move the QP state to CLOSED. This must be done inside the EP lock to avoid the race. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Hefty, Sean	a9bb79128a	RDMA/cma: Add an ID_REUSEADDR option Lustre requires that clients bind to a privileged port number before connecting to a remote server. On larger clusters (typically more than about 1000 nodes), the number of privileged ports is exhausted, resulting in lustre being unusable. To handle this, we add support for reusable addresses to the rdma_cm. This mimics the behavior of the socket option SO_REUSEADDR. A user may set an rdma_cm_id to reuse an address before calling rdma_bind_addr() (explicitly or implicitly). If set, other rdma_cm_id's may be bound to the same address, provided that they all have reuse enabled, and there are no active listens. If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it will only succeed if there are no other id's bound to that same address. The reuse option is exported to user space. The behavior of the kernel reuse implementation was verified against that given by sockets. This patch is derived from a path by Ira Weiny <weiny2@llnl.gov> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:10 -07:00
Hefty, Sean	43b752daae	RDMA/cma: Fix handling of IPv6 addressing in cma_use_port cma_use_port() assumes that the sockaddr is an IPv4 address. Since IPv6 addressing is supported (and also to support other address families) make the code more generic in its address handling. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:10 -07:00
David S. Miller	31e4543db2	ipv4: Make caller provide on-stack flow key to ip_route_output_ports(). Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-03 20:25:42 -07:00
David Decotigny	7073949720	ethtool: cosmetic: Use ethtool ethtool_cmd_speed API This updates the network drivers so that they don't access the ethtool_cmd::speed field directly, but use ethtool_cmd_speed() instead. For most of the drivers, these changes are purely cosmetic and don't fix any problem, such as for those 1GbE/10GbE drivers that indirectly call their own ethtool get_settings()/mii_ethtool_gset(). The changes are meant to enforce code consistency and provide robustness with future larger throughputs, at the expense of a few CPU cycles for each ethtool operation. All drivers compiled with make allyesconfig ion x86_64 have been updated. Tested: make allyesconfig on x86_64 + e1000e/bnx2x work Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-29 14:03:01 -07:00
Lucas De Marchi	e9c549998d	Revert wrong fixes for common misspellings These changes were incorrectly fixed by codespell. They were now manually corrected. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-04-26 23:31:11 -07:00
Nishanth Aravamudan	e297d9dd5c	cxgb4: use pgprot_writecombine() on powerpc Commit `fe3cc0d99d` ("powerpc: Add pgprot_writecombine") in benh's tree exposes the pgprot_writecombine() API to drivers on powerpc. cxgb4 has an open-coded version of the same, so use the common API now that it's available. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Anton Blanchard <anton@samba.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:25 +10:00
Michał Mirosław	3d96c74d89	net: infiniband/ulp/ipoib: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:30:42 -07:00
Michał Mirosław	dd6f6d0249	net: infiniband/hw/nes: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:30:41 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Linus Torvalds	dc50eddb2f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/nes: Fix test of uninitialized netdev	2011-03-25 21:06:37 -07:00
Linus Torvalds	00a2470546	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits) route: Take the right src and dst addresses in ip_route_newports ipv4: Fix nexthop caching wrt. scoping. ipv4: Invalidate nexthop cache nh_saddr more correctly. net: fix pch_gbe section mismatch warning ipv4: fix fib metrics mlx4_en: Removing HW info from ethtool -i report. net_sched: fix THROTTLED/RUNNING race drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region bonding: fix rx_handler locking myri10ge: fix rmmod crash mlx4_en: updated driver version to 1.5.4.1 mlx4_en: Using blue flame support mlx4_core: reserve UARs for userspace consumers mlx4_core: maintain available field in bitmap allocator mlx4: Add blue flame support for kernel consumers mlx4_en: Enabling new steering mlx4: Add support for promiscuous mode in the new steering model. mlx4: generalization of multicast steering. mlx4_en: Reporting HW revision in ethtool -i ...	2011-03-25 21:02:22 -07:00
Roland Dreier	cf55bb2439	RDMA/nes: Fix test of uninitialized netdev Commit `1765a57533` ("net: make dev->master general") introduced a test of an uninitialized netdev. Fix the code so the intended netdev is tested. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-24 17:18:30 -07:00
Linus Torvalds	0625bef606	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB: Increase DMA max_segment_size on Mellanox hardware IB/mad: Improve an error message so error code is included RDMA/nes: Don't print success message at level KERN_ERR RDMA/addr: Fix return of uninitialized ret value IB/srp: try to use larger FMR sizes to cover our mappings IB/srp: add support for indirect tables that don't fit in SRP_CMD IB/srp: rework mapping engine to use multiple FMR entries IB/srp: allow sg_tablesize to be set for each target IB/srp: move IB CM setup completion into its own function IB/srp: always avoid non-zero offsets into an FMR	2011-03-24 07:59:46 -07:00
Yevgeny Petrilin	0345584e0b	mlx4: generalization of multicast steering. The same packet steering mechanism would be used both for IB and Ethernet, Both multicasts and unicasts. This commit prepares the general infrastructure for this. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-23 12:24:21 -07:00
Yevgeny Petrilin	725c89997e	mlx4_en: Reporting HW revision in ethtool -i HW revision is derived from device ID and rev id. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-23 12:24:20 -07:00
Roland Dreier	ba82638247	Merge branches 'misc', 'nes' and 'srp' into for-next	2011-03-23 11:13:37 -07:00
David Dillow	7f9e5c48c1	IB: Increase DMA max_segment_size on Mellanox hardware By default, each device is assumed to be able only handle 64 KB chunks during DMA. By giving the segment size a larger value, the block layer will coalesce more S/G entries together for SRP, allowing larger requests with the same sg_tablesize setting. The block layer is the only direct user of it, though a few IOMMU drivers reference it as well for their *_map_sg coalescing code. pci-gart_64 on x86, and a smattering on on sparc, powerpc, and ia64. Since other IB protocols could potentially see larger segments with this, let's check those: - iSER is fine, because you limit your maximum request size to 512 KB, so we'll never overrun the page vector in struct iser_page_vec (128 entries currently). It is independent of the DMA segment size, and handles multi-page segments already. - IPoIB is fine, as it maps each page individually, and doesn't use ib_dma_map_sg(). - RDS appears to do the right thing and has no dependencies on DMA segment size, but I don't claim to have done a complete audit. - NFSoRDMA and 9p are OK -- they do not use ib_dma_map_sg(), so they doesn't care about the coalescing. - Lustre's ko2iblnd does not care about coalescing -- it properly walks the returned sg list. This patch ups the value on Mellanox hardware to 1 GB, which matches reported firmware limits on mlx4. Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-22 09:39:18 -07:00
Roland Dreier	071c778347	Merge branch 'external-indirect' of git://git.kernel.org/pub/scm/linux/kernel/git/dad/srp-initiator into srp	2011-03-21 11:52:00 -07:00
Michael Heinz	1eba843dd7	IB/mad: Improve an error message so error code is included Signed-off-by: Michael Heinz <michael.heinz@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-18 09:42:20 -07:00
Roland Dreier	748bfd9c1d	RDMA/nes: Don't print success message at level KERN_ERR There's no reason to print "NetEffect RNIC driver successfully loaded" at level KERN_ERR (where it will uglify the console on a quiet boot). Change it to KERN_INFO. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-18 08:52:30 -07:00
Linus Torvalds	ec0afc9311	Merge branch 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (55 commits) KVM: unbreak userspace that does not sets tss address KVM: MMU: cleanup pte write path KVM: MMU: introduce a common function to get no-dirty-logged slot KVM: fix rcu usage in init_rmode_* functions KVM: fix kvmclock regression due to missing clock update KVM: emulator: Fix permission checking in io permission bitmap KVM: emulator: Fix io permission checking for 64bit guest KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n KVM: x86: Remove useless regs_page pointer from kvm_lapic KVM: improve comment on rcu use in irqfd_deassign KVM: MMU: remove unused macros KVM: MMU: cleanup page alloc and free KVM: MMU: do not record gfn in kvm_mmu_pte_write KVM: MMU: move mmu pages calculated out of mmu lock KVM: MMU: set spte accessed bit properly KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits KVM: Start lock documentation KVM: better readability of efer_reserved_bits KVM: Clear async page fault hash after switching to real mode KVM: VMX: Initialize vm86 TSS only once. ...	2011-03-17 18:40:35 -07:00
Linus Torvalds	c55d267de2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}	2011-03-17 17:54:40 -07:00
Sean Hefty	1bdd6384c2	RDMA/addr: Fix return of uninitialized ret value Commit `b23dd4fe42` ("ipv4: Make output route lookup return rtable directly") resulted in leaving ret uninitialized, where it may later be returned. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-17 17:00:19 -07:00
Huang Ying	0014bd990e	mm: export __get_user_pages In most cases, get_user_pages and get_user_pages_fast should be used to pin user pages in memory. But sometimes, some special flags except FOLL_GET, FOLL_WRITE and FOLL_FORCE are needed, for example in following patch, KVM needs FOLL_HWPOISON. To support these users, __get_user_pages is exported directly. There are some symbol name conflicts in infiniband driver, fixed them too. Signed-off-by: Huang Ying <ying.huang@intel.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Michel Lespinasse <walken@google.com> CC: Roland Dreier <roland@kernel.org> CC: Ralph Campbell <infinipath@qlogic.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:27 -03:00
Linus Torvalds	7a6362800c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits) bonding: enable netpoll without checking link status xfrm: Refcount destination entry on xfrm_lookup net: introduce rx_handler results and logic around that bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag bonding: wrap slave state work net: get rid of multiple bond-related netdevice->priv_flags bonding: register slave pointer for rx_handler be2net: Bump up the version number be2net: Copyright notice change. Update to Emulex instead of ServerEngines e1000e: fix kconfig for crc32 dependency netfilter ebtables: fix xt_AUDIT to work with ebtables xen network backend driver bonding: Improve syslog message at device creation time bonding: Call netif_carrier_off after register_netdevice bonding: Incorrect TX queue offset net_sched: fix ip_tos2prio xfrm: fix __xfrm_route_forward() be2net: Fix UDP packet detected status in RX compl Phonet: fix aligned-mode pipe socket buffer header reserve netxen: support for GbE port settings ... Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c with the staging updates.	2011-03-16 16:29:25 -07:00
David Dillow	be8b981453	IB/srp: try to use larger FMR sizes to cover our mappings Now that we can get larger SG lists, we can take advantage of HCAs that allow us to use larger FMR sizes. In many cases, we can use up to 512 entries, so start there and work our way down. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:41:30 -04:00
David Dillow	c07d424d61	IB/srp: add support for indirect tables that don't fit in SRP_CMD This allows us to guarantee the ability to submit up to 8 MB requests based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will usually condense the requests into 8 SG entries, it is imperative that the target support external tables in case the FMR mapping fails or is not supported. We add a safety valve to allow targets without the needed support to reap the benefits of the large tables, but fail in a manner that lets the user know that the data didn't make it to the device. The user must add "allow_ext_sg=1" to the target parameters to indicate that the target has the needed support. If indirect_sg_entries is not specified in the modules options, then the sg_tablesize for the target will default to cmd_sg_entries unless overridden by the target options. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:37:23 -04:00
David Dillow	8f26c9ff9c	IB/srp: rework mapping engine to use multiple FMR entries Instead of forcing all of the S/G entries to fit in one FMR, and falling back to indirect descriptors if that fails, allow the use of as many FMRs as needed to map the request. This lays the groundwork for allowing indirect descriptor tables that are larger than can fit in the command IU, but should marginally improve performance now by reducing the number of indirect descriptors needed. We increase the minimum page size for the FMR pool to 4K, as larger pages help increase the coverage of each FMR, and it is rare that the kernel would send down a request with scattered 512 byte fragments. This patch also move some of the target initialization code afte the parsing of options, to keep it together with the new code that needs to allocate memory based on the options given. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:35:16 -04:00
David Dillow	4924864404	IB/srp: allow sg_tablesize to be set for each target Different configurations of target software allow differing max sizes of the command IU. Allowing this to be changed per-target allows all targets on an initiator to get an optimal setting. We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in preparation for allowing more indirect descriptors than can fit in the IU. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:35:05 -04:00
David Dillow	961e0be89a	IB/srp: move IB CM setup completion into its own function This is to clean up prior to further changes. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:34:48 -04:00
David Dillow	8c4037b501	IB/srp: always avoid non-zero offsets into an FMR It is unclear exactly how this code works around Mellanox SRP targets, or if the problem is on the target side or in the HCA itself. In an abundance of caution, we should always enable the workaround. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:34:28 -04:00
Roland Dreier	043332cf28	Merge branches 'cma', 'cxgb4', 'ipath' and 'qib' into for-next	2011-03-15 10:58:04 -07:00
Sean Hefty	a396d43a35	RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one rdma_destroy_id currently uses the global rdma cm 'lock' to test if an rdma_cm_id has been bound to a device. This prevents an active address resolution callback handler from assigning a device to the rdma_cm_id after rdma_destroy_id checks for one. Instead, we can replace the use of the global lock around the check to the rdma_cm_id device pointer by setting the id state to destroying, then flushing all active callbacks. The latter is accomplished by acquiring and releasing the handler_mutex. Any active handler will complete first, and any newly scheduled handlers will find the rdma_cm_id in an invalid state. In addition to optimizing the current locking scheme, the use of the rdma_cm_id mutex is a more intuitive synchronization mechanism than that of the global lock. These changes are based on feedback from Doug Ledford <dledford@redhat.com> while he was trying to debug a crash in the rdma cm destroy path. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:57:34 -07:00
Sean Hefty	8d8ac86564	IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state This problem was reported by Moni Shoua <monis@mellanox.com> and Amir Vadai <amirv@mellanox.com>: When destroying a cm_id from a context of a work queue and if the lap_state of this cm_id is IB_CM_LAP_SENT, we need to release the reference of this id that was taken upon the send of the LAP message. Otherwise, if the expected APR message gets lost, it is only after a long time that the reference will be released, while during that the work handler thread is not available to process other things. It turns out that we need to cancel any pending LAP messages whenever we transition out of the IB_CM_ESTABLISH state. This occurs when disconnecting - either sending or receiving a DREQ. It can also happen in a corner case where we receive a REJ message after sending an RTU, followed by a LAP. Add checks and cancel any outstanding LAP messages in these three cases. Canceling the LAP when sending a DREQ fixes the destroy problem reported by Moni. When a cm_id is destroyed in the IB_CM_ESTABLISHED state, it sends a DREQ to the remote side to notify the peer that the connection is going away. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:56:12 -07:00
Sean Hefty	29963437a4	IB/cm: Bump reference count on cm_id before invoking callback When processing a SIDR REQ, the ib_cm allocates a new cm_id. The refcount of the cm_id is initialized to 1. However, cm_process_work will decrement the refcount after invoking all callbacks. The result is that the cm_id will end up with refcount set to 0 by the end of the sidr req handler. If a user tries to destroy the cm_id, the destruction will proceed, under the incorrect assumption that no other threads are referencing the cm_id. This can lead to a crash when the cm callback thread tries to access the cm_id. This problem was noticed as part of a larger investigation with kernel crashes in the rdma_cm when running on a real time OS. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:56:12 -07:00
Sean Hefty	25ae21a101	RDMA/cma: Fix crash in request handlers Doug Ledford and Red Hat reported a crash when running the rdma_cm on a real-time OS. The crash has the following call trace: cm_process_work cma_req_handler cma_disable_callback rdma_create_id kzalloc init_completion cma_get_net_info cma_save_net_info cma_any_addr cma_zero_addr rdma_translate_ip rdma_copy_addr cma_acquire_dev rdma_addr_get_sgid ib_find_cached_gid cma_attach_to_dev ucma_event_handler kzalloc ib_copy_ah_attr_to_user cma_comp [ preempted ] cma_write copy_from_user ucma_destroy_id copy_from_user _ucma_find_context ucma_put_ctx ucma_free_ctx rdma_destroy_id cma_exch cma_cancel_operation rdma_node_get_transport rt_mutex_slowunlock bad_area_nosemaphore oops_enter They were able to reproduce the crash multiple times with the following details: Crash seems to always happen on the: mutex_unlock(&conn_id->handler_mutex); as conn_id looks to have been freed during this code path. An examination of the code shows that a race exists in the request handlers. When a new connection request is received, the rdma_cm allocates a new connection identifier. This identifier has a single reference count on it. If a user calls rdma_destroy_id() from another thread after receiving a callback, rdma_destroy_id will proceed to destroy the id and free the associated memory. However, the request handlers may still be in the process of running. When control returns to the request handlers, they can attempt to access the newly created identifiers. Fix this by holding a reference on the newly created rdma_cm_id until the request handler is through accessing it. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:00:28 -07:00
Nicolas Kaiser	2a543904dd	IB/ipath: Don't reset disabled devices The comment some lines above states that disabled devices must not reset. Signed-off-by: Nicolas Kaiser <nikai@nikai.net>	2011-03-14 14:25:59 -07:00
Mitko Haralanov	36b87b419c	IB/qib: Fix M_Key field in SubnGet and SubnGetResp MADs Set the M_Key field in SubnGet and SugnGetResp MADs based on correctly interpreting the protection level specified in the M_KeyProtBits field. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:11:51 -07:00
Mitko Haralanov	4634b7945c	IB/qib: Set default LE2 value for active cables to 0 For active and far-EQ cables use an LE2 value of 0 for improved SI. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:10:34 -07:00
Steve Wise	db5d040d7b	RDMA/cxgb4: Debugfs dump_qp() updates - Show whether the SQ is in onchip memory or not. - Dump both SQ and RQ QIDs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:14 -07:00
Steve Wise	767fbe8151	RDMA/cxgb4: Dispatch FATAL event on EEH errors This at least kicks the user mode applications that are watching for device events. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:13 -07:00
Steve Wise	b48f3b9c10	RDMA/cxgb4: Use ULP_MODE_TCPDDP Set the ULP mode for initial RDMA connection setup to the proper DDP mode. This avoids wasting some HW resources while in streaming mode. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:12 -07:00

... 3 4 5 6 7 ...

3091 Commits