Use memdup_user when user data is immediately copied into the
allocated region.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
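For illustration, a hedged sketch of the kind of call site this transforms (the buffer and variable names are made up, not taken from any particular driver):
/* before: open-coded allocation followed by copy_from_user() */
buf = kmalloc(len, GFP_KERNEL);
if (!buf)
	return -ENOMEM;
if (copy_from_user(buf, user_ptr, len)) {
	kfree(buf);
	return -EFAULT;
}
/* after: memdup_user() allocates and copies in one step and
 * returns an ERR_PTR() rather than NULL on failure */
buf = memdup_user(user_ptr, len);
if (IS_ERR(buf))
	return PTR_ERR(buf);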
Signed-off-by: Julia Lawall <julia@diku.dk>
Add a new parameter to ib_register_device() so that low-level device
drivers can pass in a pointer to a callback function that will be
called for each port that is registered in sysfs. This allows
low-level device drivers to create files in
/sys/class/infiniband/<hca>/ports/<N>/
without having to poke through the internals of the RDMA sysfs handling.
There is no need for an unregister function since the kobject
reference will go to zero when ib_unregister_device() is called.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use kmemdup when some other buffer is immediately copied into the
allocated region.
A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
statement S;
@@
- to = \(kmalloc\|kzalloc\)(size,flag);
+ to = kmemdup(from,size,flag);
if (to==NULL || ...) S
- memcpy(to, from, size);
// </smpl>
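For illustration, a hedged sketch of the transformation on a typical call site (variable names are made up):
/* before: allocate, then copy from another kernel buffer */
dst = kmalloc(len, GFP_KERNEL);
if (!dst)
	return -ENOMEM;
memcpy(dst, src, len);
/* after: kmemdup() combines the allocation and the copy */
dst = kmemdup(src, len, GFP_KERNEL);
if (!dst)
	return -ENOMEM;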
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Randomize local port allocation in the way sctp_get_port_local() does.
Update the rover at the end of the loop since we're likely to pick a valid port
on the first try.
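A minimal sketch of the strategy described above, not the actual cma.c code; the helper, the rover variable, and the port_in_use() callback are all illustrative:
static int next_port;	/* the rover; assumed to be protected by the caller's lock */

static int alloc_any_port(int low, int high, bool (*port_in_use)(int))
{
	int remaining = high - low + 1;
	u32 rand;
	int port;

	/* start scanning from a random point in the range */
	get_random_bytes(&rand, sizeof(rand));
	port = low + (rand % remaining);

	do {
		if (!port_in_use(port)) {
			/* update the rover only once a port is actually taken */
			next_port = (port == high) ? low : port + 1;
			return port;
		}
		port = (port == high) ? low : port + 1;
	} while (--remaining > 0);

	return -EADDRNOTAVAIL;
}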
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Several RDMA user-access drivers have file_operations structures with
no .llseek method set. None of the drivers actually do anything with
f_pos, so this means llseek is essentially a NOP, instead of returning
an error as leaving other file_operations methods unimplemented would
do. This is mostly harmless, except that a NULL .llseek means that
default_llseek() is used, and this function grabs the BKL, which we
would like to avoid.
Since llseek does nothing useful on these files, we would like it to
return an error to userspace instead of silently grabbing the BKL and
succeeding. For nearly all of the file types, we take the
belt-and-suspenders approach of setting the .llseek method to
no_llseek and also calling nonseekable_open(); the exception is the
uverbs_event files, which are created with anon_inode_getfile(), which
already sets f_mode the same way as nonseekable_open() would.
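As a sketch, the belt-and-suspenders pattern looks roughly like this (the fops and open function below are illustrative, not a specific driver's):
static int example_open(struct inode *inode, struct file *filp)
{
	/* ... driver-specific setup ... */

	/* clears FMODE_LSEEK/FMODE_PREAD/FMODE_PWRITE so seeks fail */
	return nonseekable_open(inode, filp);
}

static const struct file_operations example_fops = {
	.owner	= THIS_MODULE,
	.open	= example_open,
	.llseek	= no_llseek,	/* return an error instead of taking the BKL */
};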
This work is motivated by Arnd Bergmann's bkl-removal tree.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Check correct variable for allocation failure
RDMA/nes: Correct cap.max_inline_data assignment in nes_query_qp()
RDMA/cm: Set num_paths when manually assigning path records
IB/cm: Fix device_create() return value check
When manually assigning the path records to use for a connection, save
the number of paths that were set. Otherwise, checks against num_path
will show 0, even though path record data is available.
This was discovered by manually setting the path records from user
space, then querying the kernel to see if the correct path records
were assigned, only to discover that the kernel returned 0 path
records to the query.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming availability. As this
conversion needs to touch a large number of source files, the
following script is used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following:
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h; if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
This fixes a sysfs lockdep warning in the infiniband code.
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/nes: Fix CX4 link problem in back-to-back configuration
RDMA/nes: Clear stall bit before destroying NIC QP
RDMA/nes: Set assume_aligned_header bit
RDMA/cxgb3: Wait at least one schedule cycle during device removal
IB/mad: Ignore iWARP devices on device removal
IPoIB: Include return code in trace message for ib_post_send() failures
IPoIB: Fix TX queue lockup with mixed UD/CM traffic
When an iWARP device is unloaded, the ib_mad module logs errors. It
should be ignoring iWARP devices on device removal just like it does
on device add.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
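For illustration, a minimal constified ops table (names are illustrative):
static ssize_t example_attr_show(struct kobject *kobj, struct attribute *attr,
				 char *buf)
{
	/* dispatch to the attribute-specific show routine */
	return 0;
}

/* declaring the shared ops table const lets the linker place it in .rodata */
static const struct sysfs_ops example_sysfs_ops = {
	.show	= example_attr_show,
};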
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Convert some drivers that export a single string as a class attribute
to the new class_attr_string functions. This removes redundant
code all over.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
a separate function for every piece of data.
Drivers can also extend the attributes with their own data fields
and use them in the low level functions.
This makes the class attributes the same as sysdev_class attributes
and plain attributes.
This will allow further cleanups in drivers.
Full tree sweep converting all users.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Replace open-coded loop with for_each_set_bit().
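For illustration, a hedged before/after sketch (the bitmap, bound, and handle_port() helper are made up):
unsigned long mask = 0x5;	/* illustrative bitmap */
int port;

/* before: open-coded bit scan */
for (port = 0; port < MAX_PORTS; port++)
	if (mask & (1UL << port))
		handle_port(port);

/* after */
for_each_set_bit(port, &mask, MAX_PORTS)
	handle_port(port);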
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
init: Open /dev/console from rootfs
mqueue: fix typo "failues" -> "failures"
mqueue: only set error codes if they are really necessary
mqueue: simplify do_open() error handling
mqueue: apply mathematics distributivity on mq_bytes calculation
mqueue: remove unneeded info->messages initialization
mqueue: fix mq_open() file descriptor leak on user-space processes
fix race in d_splice_alias()
set S_DEAD on unlink() and non-directory rename() victims
vfs: add NOFOLLOW flag to umount(2)
get rid of ->mnt_parent in tomoyo/realpath
hppfs can use existing proc_mnt, no need for do_kern_mount() in there
Mirror MS_KERNMOUNT in ->mnt_flags
get rid of useless vfsmount_lock use in put_mnt_ns()
Take vfsmount_lock to fs/internal.h
get rid of insanity with namespace roots in tomoyo
take check for new events in namespace (guts of mounts_poll()) to namespace.c
Don't mess with generic_permission() under ->d_lock in hpfs
sanitize const/signedness for udf
nilfs: sanitize const/signedness in dealing with ->d_name.name
...
Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
The anon_inodes interface has been split to allow creating a bare
(non-installed) file pointer and also extended to allow specifying
O_RDONLY in the flags. This makes it a suitable replacement for the
private "infinibandeventfs" pseudo-filesystem used by uverbs, and this
replacement saves a small chunk of boilerplate code.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_ud_header_init() first clears header and then fills up the various
fields. Later on, it tests header->immediate_present, which it has
already cleared, so the condition is always false. Fix this by adding
an immediate_present parameter and setting header->immediate_present
as is done with grh_present. Also remove unused calculation of
header_len.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some large systems may support more than IB_UCM_MAX_DEVICES
(currently 32).
This change allows us to support more devices in a backwards-compatible
manner. The first IB_UCM_MAX_DEVICES keep the same major/minor device
numbers they've always had.
If there are more than IB_UCM_MAX_DEVICES, then we dynamically request
a new major device number (new minors start at 0).
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but sets us up for a future
change that allows us to support more than IB_UCM_MAX_DEVICES.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UCM_MAX_DEVICES in the system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clean errors as shown when 'let c_space_errors=1' is set in vim.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some large systems may support more than IB_UMAD_MAX_PORTS
(currently 64).
This change allows us to support more ports in a backwards-compatible
manner. The first IB_UMAD_MAX_PORTS keep the same major/minor device
numbers they've always had.
If there are more than IB_UMAD_MAX_PORTS, we then dynamically request
a new major device number (new minors start at 0).
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but sets us up for a future change
that allows us to support more than IB_UMAD_MAX_PORTS in a system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UMAD_MAX_PORTS in the system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We no longer need this data structure, as it was used to associate an
inode back to a struct ib_umad_port during ->open(). But now that
we're embedding a struct cdev in struct ib_umad_port, we can use the
container_of() macro to go from the inode back to the device instead.
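A sketch of what the open() path can look like after this change (simplified; the actual ib_umad_open() does more work):
static int ib_umad_open(struct inode *inode, struct file *filp)
{
	struct ib_umad_port *port;

	/* the cdev is embedded in ib_umad_port, so no lookup table is needed */
	port = container_of(inode->i_cdev, struct ib_umad_port, cdev);

	/* ... the rest of open() can use 'port' directly ... */
	return 0;
}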
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Instead of storing pointers to cdev and sm_cdev, embed the full
structures instead.
This change allows us to use the container_of() macro in ib_umad_open()
and ib_umad_sm_open() in a future patch.
This change increases the size of struct ib_umad_port to 320 bytes
from 128.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clean up the errors as shown when 'let c_space_errors=1' is set in vim.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eliminate some padding in the structure by rearranging the members.
sizeof(struct ib_uverbs_event_file) is now 72 bytes (from 80) and
more members now fit in the first cacheline.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some large systems may support more than IB_UVERBS_MAX_DEVICES
(currently 32).
This change allows us to support more devices in a backwards-compatible
manner. The first IB_UVERBS_MAX_DEVICES keep the same major/minor
device numbers that they've always had.
If there are more than IB_UVERBS_MAX_DEVICES, we then dynamically
request a new major device number (new minors start at 0).
This change increases the maximum number of HCAs to 64 (from 32).
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but sets us up for a future change
that allows us to support more than IB_UVERBS_MAX_DEVICES in a system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This change is not useful by itself, but it sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UVERBS_MAX_DEVICES in the system.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dev_table's raison d'etre was to associate an inode back to a struct
ib_uverbs_device.
However, now that we've converted ib_uverbs_device to contain an
embedded cdev (instead of a *cdev), we can use the container_of()
macro and cast back to the containing device.
There's no longer any need for dev_table, so get rid of it.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Instead of storing a pointer to a cdev, embed the entire struct cdev.
This change allows us to use the container_of() macro in
ib_uverbs_open() in a future patch.
This change increases the size of struct ib_uverbs_device to 168 bytes
across 3 cachelines from 80 bytes in 2 cachelines. However, we
rearrange the members so that everything fits into the first cacheline
except for the struct cdev. Finally, we don't touch the cdev in any
fastpaths, so this change shouldn't negatively affect performance.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure the compiler won't do weird things with limits by using the
rlimit helpers added in 3e10e716 ("resource: add helpers for fetching
rlimits"). E.g. fetching them twice may return 2 different values
after writable limits are implemented.
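A hedged sketch of the kind of call site this converts (the surrounding code is illustrative):
unsigned long lock_limit;

/* before: open-coded access; reading rlim_cur more than once may
 * return two different values once writable limits exist */
lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;

/* after: a single fetch through the helper added in 3e10e716 */
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;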
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Revert the following change from commit 6f8372b6 ("RDMA/cm: fix
loopback address support")
The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.
It turns out that important apps such as Open MPI depend on
rdma_bind_addr() NOT associating any RDMA device when binding to a
loopback address. Open MPI is being updated to deal with this, but at
least until a new Open MPI release is available, maintain the previous
behavior: allow rdma_bind_addr() to succeed, but do not bind to a
device.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Correct misspelled "CONFIG_IPv6" that was introduced in commit
d14714df ("IB/addr: Fix IPv6 routing lookup"). The config variable
should be all uppercase.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
[ This was my fault when I munged the original patch. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
direct I/O fallback sync simplification
ocfs: stop using do_sync_mapping_range
cleanup blockdev_direct_IO locking
make generic_acl slightly more generic
sanitize xattr handler prototypes
libfs: move EXPORT_SYMBOL for d_alloc_name
vfs: force reval of target when following LAST_BIND symlinks (try #7)
ima: limit imbalance msg
Untangling ima mess, part 3: kill dead code in ima
Untangling ima mess, part 2: deal with counters
Untangling ima mess, part 1: alloc_file()
O_TRUNC open shouldn't fail after file truncation
ima: call ima_inode_free ima_inode_free
IMA: clean up the IMA counts updating code
ima: only insert at inode creation time
ima: valid return code from ima_inode_alloc
fs: move get_empty_filp() deffinition to internal.h
Sanitize exec_permission_lite()
Kill cached_lookup() and real_lookup()
Kill path_lookup_open()
...
Trivial conflicts in fs/direct-io.c
Include link scope as part of address resolution. Combine local
and remote address resolution into a single, simpler code path.
Fix error checking in the IPv6 routing lookups.
Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
[ Fix up cma_check_linklocal() for !IPV6 case. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Merge resolve local/remote address resolution into a single
data flow to ensure consistent access and use of the local routing
tables.
Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.
The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.
If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listening across all addresses or
on the loopback address itself. The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails. The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.
Finally, clean up loopback support to be more transport neutral.
Replace separate calls to get/set the sgid and dgid from the
device address with a single call that behaves correctly depending
on the format of the device address. And support both IPv4 and
IPv6 address formats.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/ - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The struct rdma_dev_addr stores net_device address information:
the source device address, destination hardware address, and
broadcast address. For consistency, store the net_device type
rather than converting it to the rdma_node_type.
The type indicates the format of the various hardware addresses,
which is what we're concerned with, and not the RDMA node type
that the address may map to.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a source address is provided, verify that the address family matches
that of the destination address. If the source is not specified, use the
same address family as the destination.
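A minimal sketch of the check described above (not the exact cma.c code; src_addr may be NULL when no source was specified):
sa_family_t af;

if (src_addr && src_addr->sa_family != dst_addr->sa_family)
	return -EINVAL;		/* families must match when a source is given */

/* otherwise fall back to the destination's address family */
af = src_addr ? src_addr->sa_family : dst_addr->sa_family;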
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Provide the device interface when resolving route information to
ensure that the correct outbound device is used. This will also
simplify processing of sin6_scope_id for IPv6 support.
Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If joining to an AF_INET6 address, we need to map the address to a MGID
in the same way as the IP stack. The old code would just fall through to
the IPv4 case and generate garbage.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with
FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy;
fix it up.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
for_each_netdev() should be used with RTNL or dev_base_lock held,
or else we risk a crash.
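For illustration, a minimal sketch of walking the list under RTNL:
struct net_device *dev;

rtnl_lock();
for_each_netdev(&init_net, dev) {
	/* the device list cannot change while RTNL is held */
	if (dev->flags & IFF_UP)
		pr_debug("considering %s\n", dev->name);
}
rtnl_unlock();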
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Export rdma_set_ib_paths to user space to allow applications to
manually set the IB path used for connections. This allows
alternative ways for a user space application or library to obtain
path record information, including retrieving path information
from cached data, avoiding direct interaction with the IB SA.
The IB SA is a single, centralized entity that can limit scaling
on large clusters running MPI applications.
Future changes to the rdma cm can expand on this framework to
support the full range of features allowed by the IB CM, such as
separate forward and reverse paths and APM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
This patch allows a local IPv6 address to be resolved by rdma_cm.
To reproduce the problem:
$ rping -s -v -a ::0 &
$ rping -c -v -a <IPv6 address local to this system>
rdma_resolve_addr error -1
Local IPv6 address was obtained with "ip addr show ib0"
Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=1759
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In commit cb58160e ("RDMA/iwcm: Reject the connection when the cm_id
is destroyed") a call to the provider's reject handler was added to
destroy_cm_id() to fix a provider endpoint leak. This call needs to
be done with interrupts enabled. So unlock and relock around this
call. This is safe because:
1) the provider will do nothing with this endpoint until the iwcm either
accepts or rejects.
2) the lock is only released after the iwcm state is changed, so an
errant iwcm app that is destroying -and- rejecting the connection
concurrently will get a failure on one of the calls.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Holding agent->lock across cancel_delayed_work() (which does
del_timer_sync()) in ib_cancel_rmpp_recvs() leads to lockdep reports of
possible lock-timer deadlocks if a consumer ever does something that
connects agent->lock to a lock taken in IRQ context (cf
http://marc.info/?l=linux-rdma&m=125243699026045).
Fix this by changing the list items to a new state "CANCELING" while
holding the lock, and then canceling the delayed work without holding
the lock. If the delayed work runs after the lock is dropped, it will
see the state is CANCELING and return immediately, so the list will
stay stable while we traverse it with the lock not held.
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the cm_id of a connect request is destroyed prior to the ULP
accepting or rejecting the connection, then the provider never cleans
up the connection. The iwcm should explicitly reject these
connections if the cm_id is destroyed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MADs are UD and can be dropped if there are no receives posted, so
allow receive queue size to be set with a module parameter in case the
queue needs to be lengthened. Send side tuning is done for symmetry
with receive.
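A hedged sketch of the module-parameter plumbing this describes; the parameter names and the default constants below are assumptions, not necessarily the ones used by the patch:
static int mad_recvq_size = IB_MAD_QP_RECV_SIZE;	/* assumed default */
module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");

static int mad_sendq_size = IB_MAD_QP_SEND_SIZE;	/* assumed default */
module_param_named(send_queue_size, mad_sendq_size, int, 0444);
MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests");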
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does
cancel_delayed_work(&mad_agent_priv->timed_work);
while holding mad_agent_priv->lock. cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).
This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:
A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
B: holding mad_agent_priv->lock, waiting for del_timer_sync()
C: interrupt during mad_agent_priv->timed_work.timer that takes
cm_id_priv->lock
Fix this by using the new __cancel_delayed_work() interface (which
internally does del_timer() instead of del_timer_sync()) in all the
places where we are holding a lock.
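A minimal sketch of the locked path after the change (fragment only; the surrounding list handling is omitted):
unsigned long flags;

spin_lock_irqsave(&mad_agent_priv->lock, flags);
/* ... list manipulation that must stay under the lock ... */
__cancel_delayed_work(&mad_agent_priv->timed_work);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);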
Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
methods allowed for userspace"), the uverbs core returns EINVAL for
commands not implemented by a specific low-level driver.
This creates a problem that there is no way to tell the difference
between an unimplemented command and an implemented one which is
incorrectly invoked (which also returns EINVAL).
The fix is to have unimplemented commands return ENOSYS.
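Roughly, the dispatch check becomes the following (a sketch; the exact field chain is assumed from the uverbs write path):
if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << command)))
	return -ENOSYS;		/* command not implemented by this device */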
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Until now, retries were only sent when joining a multicast group. This
patch adds retries when leaving a multicast group as well.
Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rather than just defining static spinlock_t variables and then
initializing them later in init functions, simply define them with
DEFINE_SPINLOCK() and remove the calls to spin_lock_init(). This cleans
up the source a tad and also shrinks the compiled code; eg on x86-64:
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
function old new delta
ib_uverbs_init 336 326 -10
ib_mad_init_module 147 137 -10
ib_sa_init 123 103 -20
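For illustration (the lock name is made up):
/*
 * Instead of "static spinlock_t map_lock;" plus a spin_lock_init()
 * call in the module init function, the lock is initialized at
 * compile time:
 */
static DEFINE_SPINLOCK(map_lock);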
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hop count field in a directed route MAD is only allowed to be in the
range 0 to 63 (by spec). Check that this really is the case to avoid
accessing outside the bounds of the hop array.
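A hedged sketch of the bounds check; the constant and return value names are assumptions, not necessarily what the patch uses:
if (hop_cnt >= IB_SMP_MAX_PATH_HOPS)	/* spec allows 0..63 only */
	return IB_SMI_DISCARD;		/* never index hop[] out of range */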
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add __init and __exit annotations to the module_init/module_exit
functions from drivers/infiniband/core/addr.c and cma.c.
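For illustration, the general shape of such annotations (function names are illustrative):
static int __init addr_init(void)
{
	/* placed in .init.text and discarded once the module is loaded */
	return 0;
}

static void __exit addr_cleanup(void)
{
	/* omitted entirely when the code is built into the kernel */
}

module_init(addr_init);
module_exit(addr_cleanup);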
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device. Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used. These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.
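A minimal sketch of the accessor style this asks for (the attribute and field are illustrative); the matching dev_set_drvdata() call is made once when the device is created:
static ssize_t show_port(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	/* dev_get_drvdata() replaces direct access to dev->driver_data */
	struct ib_umad_port *port = dev_get_drvdata(dev);

	return sprintf(buf, "%d\n", port->port_num);
}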
Cc: general@lists.openfabrics.org
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When doing rdma_resolve_addr(), if the relevant IB port is down, the
function fails and the cm_id is not bound to the correct device.
Therefore, the application does not have a device handle and cannot wait
for the port to become active. The function fails because the
underlying IPoIB interface is not joined to the broadcast group and
therefore the SA does not have a multicast record to take a Q_Key
from.
The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
id_priv->qkey if it was not set, and will be called just before the
Q_Key is really required.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When joining an IPoIB multicast group, use the same rate as in the
broadcast group. Otherwise, if the RDMA CM creates this group before
IPoIB does, it might get a different rate. This will cause IPoIB to
fail to join the same group later on, because IPoIB uses strict
rate selection.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some attribute show functions test ibdev_is_alive() to make sure that
it's OK to access device state. However, the sysfs attributes will
not be registered until the device is fully initialized, and they'll
be unregistered before anything is torn down, so ibdev_is_alive()
doesn't do anything useful. Remove it.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Our testing uncovered a race condition in ib_sa_event():
spin_lock_irqsave(&port->ah_lock, flags);
if (port->sm_ah)
kref_put(&port->sm_ah->ref, free_sm_ah);
port->sm_ah = NULL;
spin_unlock_irqrestore(&port->ah_lock, flags);
schedule_work(&sa_dev->port[event->element.port_num -
sa_dev->start_port].update_task);
If two events occur back-to-back (e.g., client-reregister and LID
change), both may pass the spinlock-protected code above before the
scheduled work updates the port->sm_ah handle. Then if the scheduled
work ends up running twice, the second operation will then find a
non-NULL port->sm_ah, and will simply overwrite it in update_sm_ah --
resulting in an AH leak.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If ib_post_send_mad() returns 0, the API guarantees that there will be
a callback to send_buf->mad_agent->send_handler() so that the sender
can call ib_free_send_mad(). Otherwise, the ib_mad_send_buf will be
leaked and the mad_agent reference count will never go to zero and the
IB device module cannot be unloaded. The above can happen without
this patch if process_mad() returns (IB_MAD_RESULT_SUCCESS |
IB_MAD_RESULT_CONSUMED).
If process_mad() returns IB_MAD_RESULT_SUCCESS and there is no agent
registered to receive the mad being sent, handle_outgoing_dr_smp()
returns zero which causes a MAD packet which is at the end of the
directed route to be incorrectly sent on the wire but doesn't cause a
hang since the HCA generates a send completion.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a potential race in ib_register_mad_agent() where the struct
ib_mad_agent_private is not fully initialized before it is added to
the list of agents per IB port. This means the ib_mad_agent_private
could be seen before the refcount, spin locks, and linked lists are
initialized. The fix is to initialize the structure earlier.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
handle_outgoing_dr_smp() can queue a struct ib_mad_local_private
*local on the mad_agent_priv->local_work work queue with
local->mad_priv == NULL if device->process_mad() returns
IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
(!ib_response_mad(&mad_priv->mad.mad) ||
!mad_agent_priv->agent.recv_handler).
In this case, local_completions() will be called with local->mad_priv
== NULL. The code does check for this case and skips calling
recv_mad_agent->agent.recv_handler() but recv == 0 so
kmem_cache_free() is called with a NULL pointer.
Also, since recv isn't reinitialized each time through the loop, it
can cause a memory leak if recv should have been zero.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Move the ib_device_unregister_sysfs() call from ib_dealloc_device() to
ib_unregister_device(). The old code allows device unregister to
proceed even if some sysfs files are open, which leaves a window where
userspace can open a file before a device is removed but then end up
reading the file after the device is removed, which leads to various
kernel crashes either because the device data structure is freed or
because the low-level driver code is gone after module removal.
By not returning from ib_unregister_device() until after all sysfs
entries are removed, we make sure that data structures and/or module
code is not freed until after all sysfs access is done.
Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The base versions handle constant folding just fine, use them
directly. The replacements are OK in the include/ files as they are
not exported to userspace so we don't need the __ prefixed versions.
This patch does not affect code generation at all.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 38617c64 ("RDMA/addr: Add support for translating IPv6
addresses") broke the build when CONFIG_IPV6=n, because the ib_addr
module unconditionally attempted to call ipv6_chk_addr() and other
IPv6 functions that are not defined when IPv6 is disabled. Fix this
by only building IPv6 support if CONFIG_IPV6 is turned on, and
add a Kconfig dependency to prevent the ib_addr code from being built
in when IPv6 is built modular.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
Handle AF_INET6 cases where required, and use struct sockaddr_storage
wherever an IPv6 address might be stored.
Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for translating AF_INET6 addresses to the IB address
translation service. This requires using struct sockaddr_storage
instead of struct sockaddr wherever an IPv6 address might be stored,
and adding cases to handle IPv6 in addition to IPv4 to the various
translation functions.
Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/ehca: Reject dynamic memory add/remove when ehca adapter is present
IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter
IPoIB: Set netdev offload features properly for child (VLAN) interfaces
IPoIB: Clean up ethtool support
mlx4_core: Add Ethernet PCI device IDs
mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC
mlx4_core: Multiple port type support
mlx4_core: Ethernet MAC/VLAN management
mlx4_core: Get ethernet MTU and default address from firmware
mlx4_core: Support multiple pre-reserved QP regions
Update NetEffect maintainer emails to Intel emails
RDMA/cxgb3: Remove cmid reference on tid allocation failures
IB/mad: Use krealloc() to resize snoop table
IPoIB: Always initialize poll_timer to avoid crash on unload
IB/ehca: Don't allow creating UC QP with SRQ
mlx4_core: Add QP range reservation support
RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
Tejun's commit 7b595756ec made sysfs
attribute->owner unnecessary. But the field was left in the structure to
ease the merge. It's been over a year since that change and it is now
time to start killing attribute->owner along with its users - one arch at
a time!
This patch is attempt #1 to get rid of attribute->owner only for
CONFIG_X86_64 or CONFIG_X86_32 . We will deal with other arches later on
as and when possible - avr32 will be the next since that is something I
can test. Compile (make allyesconfig / make allmodconfig / custom config)
and boot tested.
akpm: the idea is that we put the declaration of attribute.owner inside
`#ifndef CONFIG_X86'. But that proved to be too ambitious for now because
new usages kept on turning up in subsystem trees.
[akpm: remove the ifdef for now]
Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that device_create() has been audited, rename things back to the
original call to be sane.
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use krealloc() instead of kmalloc() followed by memcpy() when resizing
the MAD module's snoop table.
Also put parentheses around the new table size to avoid calculating
the wrong size to allocate, which fixes a bug pointed out by Haven
Hash <haven.hash@isilon.com>.
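A hedged sketch of the resize (field names are taken from the MAD snoop code as I understand it; treat them as assumptions):
/* grow the table by one entry; note the parentheses around
 * (snoop_table_size + 1) so the whole new size is multiplied */
new_snoop_table = krealloc(qp_info->snoop_table,
			   sizeof(*new_snoop_table) *
			   (qp_info->snoop_table_size + 1),
			   GFP_ATOMIC);
if (!new_snoop_table)
	goto out;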
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of error, the function ucma_alloc_multicast() returns a NULL
pointer, but never returns an ERR pointer. So after a call to this
function, an IS_ERR test should be replaced by a NULL test.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match bad_is_err_test@
expression x, E;
@@
x = ucma_alloc_multicast(...)
... when != x = E
IS_ERR(x)
// </smpl>
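The corrected call site then looks roughly like this (the error label and error code are illustrative):
mc = ucma_alloc_multicast(ctx);
if (!mc) {		/* NULL, not an ERR_PTR, signals failure here */
	ret = -ENOMEM;
	goto err;
}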
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
commit 110cf374 ("infiniband: make cm_device use a struct device and
not a kobject.") introduced a memory leak, since it deleted
cm_release_dev_obj(), which was where cm_dev was freed. Fix this by
freeing the leaked structure after calling device_unregister().
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fixes the problem of incoming BMA responses being dropped due to
a bad "is response" check. Fix the test to use the ib_response_mad()
predicate, which correctly handles BMA MADs.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=988>.
Signed-off-by: Michael Brooks <michael.brooks@qlogic.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of error, the function ib_create_send_mad() returns an ERR
pointer, but never returns a NULL pointer. So testing the return
value for error should be done with IS_ERR, not by comparing with
NULL.
A simplified version of the semantic patch that makes this change is
as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = ib_create_send_mad(...)
<... when != x = E
if (
(
- x@p2 != NULL
+ ! IS_ERR ( x )
|
- x@p2 == NULL
+ IS_ERR( x )
)
)
S1
else S2
...>
? x = E;
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There are a few places where the RDMA CM code handles IPv6 by doing
struct sockaddr addr;
u8 pad[sizeof(struct sockaddr_in6) -
sizeof(struct sockaddr)];
This is fragile and ugly; handle this in a better way with just
struct sockaddr_storage addr;
[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
switch to struct sockaddr_storage and get rid of padding arrays in
struct rdma_addr. ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
MAINTAINERS: Remove Glenn Streiff from NetEffect entry
mlx4_core: Improve error message when not enough UAR pages are available
IB/mlx4: Add support for memory management extensions and local DMA L_Key
IB/mthca: Keep free count for MTT buddy allocator
mlx4_core: Keep free count for MTT buddy allocator
mlx4_code: Add missing FW status return code
IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
mlx4_core: Add module parameter to enable QoS support
RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
IB/ehca: Release mutex in error path of alloc_small_queue_page()
IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
IB/ehca: Filter PATH_MIG events if QP was never armed
IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
Remove IB_ACCESS_LOCAL_WRITE from qp.qp_access_flags because this
attribute is only used to set remote permissions.
Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If update_sm_ah() fails, it leaves the port's sm_ah as NULL. Then if
the device or module is removed, ib_sa_remove_one() will dereference a
NULL pointer when it calls kref_put(). Fix this by testing if sm_ah
is NULL before dropping the reference.
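A minimal sketch of the guarded put:
/* only drop the reference if update_sm_ah() ever succeeded */
if (port->sm_ah)
	kref_put(&port->sm_ah->ref, free_sm_ah);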
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state. Report the timewait
event through the rdma_cm.
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an RDMA_CM_EVENT_ADDR_CHANGE event that can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does. In the current code, this
does not happen when bonding is used and fail-over happened but the IB
link used by an already existing session is operating fine.
Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the
event or just ignore it.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This object really should be a struct device, or at least contain a
pointer to a struct device, as it is trying to create a separate device
tree outside of the main device tree. This patch fixes this problem.
It is needed for the class core rework that is being done in the driver
core.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This pointer really is a struct ib_device, not a struct device, so name
it properly to help prevent confusion.
This makes the followon patch in this series much smaller and easier to
understand as well.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.
This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable. This means that cma_disable_remove()
is now more appropriately renamed cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.
In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device. Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available. These stats are per-device
and more importantly -not- per port.
Details:
- Add union rdma_protocol_stats in ib_verbs.h. This union allows
defining transport-specific stats. Currently only iwarp stats are
defined.
- Add struct iw_protocol_stats to define the current set of iwarp
protocol stats.
- Add new ib_device method called get_proto_stats() to return protocol
statistics.
- Add logic in core/sysfs.c to create iwarp protocol stats attributes
if the device is an RNIC and has a get_proto_stats() method.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed. This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.
Fix up the qp_state_table[] to make RESET->ERR not allowed.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement). The new operations are:
- Allocate an ib_mr for use in fast register work requests.
- Allocate/free a physical buffer lists for use in fast register work
requests. This allows device drivers to allocate this memory as
needed for use in posting send requests (eg via dma_alloc_coherent).
- New send queue work requests:
* send with remote invalidate
* fast register memory region
* local invalidate memory region
* RDMA read with invalidate local memory region (iWARP only)
Consumer interface details:
- A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
to indicate device support for these features.
- New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
IB_WR_RDMA_READ_WITH_INV are added.
- A new consumer API function, ib_alloc_mr() is added to allocate
fast register memory regions.
- New consumer API functions, ib_alloc_fast_reg_page_list() and
ib_free_fast_reg_page_list() are added to allocate and free
device-specific memory for fast registration page lists.
- A new consumer API function, ib_update_fast_reg_key(), is added to
allow the key portion of the R_Key and L_Key of a fast registration
MR to be updated. Consumers call this if desired before posting
a IB_WR_FAST_REG_MR work request.
Consumers can use this as follows:
- MR is allocated with ib_alloc_mr().
- Page list memory is allocated with ib_alloc_fast_reg_page_list().
- MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
- MR made VALID and bound to a specific page list via
ib_post_send(IB_WR_FAST_REG_MR)
- MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
invalidate operation.
- MR is deallocated with ib_dereg_mr()
- page lists dealloced via ib_free_fast_reg_page_list().
Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ). For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.
The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key. The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH). When SM AH
becomes invalid and needs an update it is handled by the global
workqueue. On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins. Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.
This causes a problem because IPoIB may end up sending a request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffic.
The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.
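A minimal sketch of the behavior described above (fragment only, not the exact sa_query.c code):
unsigned long flags;

spin_lock_irqsave(&port->ah_lock, flags);
if (!port->sm_ah) {
	/* the SM address handle is being refreshed; ask the caller to retry */
	spin_unlock_irqrestore(&port->ah_lock, flags);
	return -EAGAIN;
}
kref_get(&port->sm_ah->ref);
spin_unlock_irqrestore(&port->ah_lock, flags);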
For consumers, the patch doesn't make things worse. Before the patch,
MADs are sent to the wrong SM so the request gets lost. Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.
Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The license text for several files references a third software license
that was inadvertently copied in. Update the license to what was
intended. This update was based on a request from HP.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove explicit lock_kernel() calls and document why the code is safe.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Remove explicit lock_kernel() calls and document why the code is safe.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
All of the open() functions which don't need the BKL on their face may
still depend on its acquisition to serialize opens against driver
initialization. So make those functions acquire then release the BKL to be
on the safe side.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This documents the fact that somebody looked at the relevant open()
functions and concluded that, due to their trivial nature, no locking was
needed.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code. However, after the conversion, the is_closed flag is checked
incorrectly in ib_uverbs_async_handler(). As a result, no async
events are ever passed to applications.
Found by: Ronni Zimmerman <ronniz@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely
min_t(int, npages, PAGE_SIZE / sizeof (struct page *))
will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()"). This leads to an infinite loop in ib_umem_get(),
since the code boils down to:
while (npages) {
ret = get_user_pages(...);
npages -= ret;
}
Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.
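Concretely, the truncating minimum becomes an unsigned long one
(sketch of the change, not the literal diff):
/* before: npages cast to int can go negative for huge sizes */
ret = get_user_pages(current, current->mm, cur_base,
		     min_t(int, npages, PAGE_SIZE / sizeof (struct page *)),
		     ...);
/* after: take the minimum as unsigned longs so npages is never truncated */
ret = get_user_pages(current, current->mm, cur_base,
		     min_t(unsigned long, npages,
			   PAGE_SIZE / sizeof (struct page *)),
		     ...);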
The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief. But it does
let buggy userspace code cause a kernel lock-up; for example, I hit
this with code that passes a negative value into a memory registration
function, where it is promoted to a huge u64 value.
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
IPoIB: Test for NULL broadcast object in ipoib_mcast_join_finish()
MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
IB/mthca: Fix max_sge value returned by query_device
RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
IB/ipath: Fix printk format for ipath_sdma_status
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly. The fix is to
kfree the local data and break, rather than falling through. This was
observed with the ipath driver, but could happen with any driver.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.
This patch fixes the problem by using the new function,
device_create_drvdata().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for modifying CQ parameters for controlling event
generation moderation.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions. Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated. Add this new union to struct
ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().
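The resulting layout looks roughly like this (abbreviated; see
<rdma/ib_verbs.h> for the full structure):
struct ib_send_wr {
	...
	enum ib_wr_opcode	opcode;		/* e.g. IB_WR_SEND_WITH_INV */
	int			send_flags;
	union {
		__be32		imm_data;	 /* for *_WITH_IMM opcodes */
		u32		invalidate_rkey; /* for IB_WR_SEND_WITH_INV */
	} ex;
	...
};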
Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.
Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit. The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing). Remove the flag from
all existing drivers that set it until we know which ones are OK.
The value chosen for the new flag is not consecutive with the existing
flags, to avoid clashing with flags defined in the XRC patches, which
are not merged yet but which are already in use and are likely to be
merged soon.
This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO. The create_flags member will also be useful for XRC
and ehca low-latency QP support.
Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.
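Until a driver actually implements a flag, its create_qp method simply
rejects anything non-zero, along these lines (hypothetical "foo" driver):
struct ib_qp *foo_create_qp(struct ib_pd *pd,
			    struct ib_qp_init_attr *init_attr,
			    struct ib_udata *udata)
{
	/* no create_flags handled yet: refuse anything non-zero */
	if (init_attr->create_flags)
		return ERR_PTR(-EINVAL);
	...
}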
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()
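That is, the two-call pattern collapses into one (variable names are
just for illustration):
/* before */
list_splice(&tmp_list, &dest_list);
INIT_LIST_HEAD(&tmp_list);
/* after: splices and reinitializes the source list in one call */
list_splice_init(&tmp_list, &dest_list);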
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.
The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)
//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@
E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1
@n@
position a.p;
expression E,E1;
statement S,S1;
@@
E = NULL
... when != E = E1
if@p (E) S else S1
@depends on !n@
expression E;
statement S,S1;
position a.p;
@@
* if@p (E)
S else S1
//</smpl>
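Callers therefore need the usual ERR_PTR convention, for example (sketch):
id = rdma_create_id(event_handler, context, RDMA_PS_TCP);
if (IS_ERR(id))
	return PTR_ERR(id);	/* a NULL check would never trigger */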
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface. Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not. The only thing we
ever did with the value was check if it was NULL or not. Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add __force cast of node_guid to __u64, since we are sticking it into a
structure whose definition is shared with userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed. The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference. Then if it was set, free the cm_id on this thread.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.
This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread. Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.
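Schematically, the flush path now does something like this before waking
the cleanup thread (illustrative; the real code also handles remap counts
and the request/flush serial numbers):
struct ib_pool_fmr *fmr, *next;
spin_lock_irq(&pool->pool_lock);
list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
	if (fmr->remap_count > 0)	/* used at least once */
		list_move(&fmr->list, &pool->dirty_list);
}
spin_unlock_irq(&pool->pool_lock);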
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This reverts commit a3cd7d9070.
The original commit breaks iSER reliably, making it complain:
iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11
The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up. This commit causes clean but used FMR entries also to be
purged. During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a CM MAD is received, it is queued to a CM workqueue for
processing. The queued work item references the port and device on
which the MAD was received. If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.
To fix this, flush the workqueue after unregistering for MAD reception
and before the device is freed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.
When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm. (The ib_cm will send an MRA if it receives a duplicate
REQ.) The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.
If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id. The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received. When the backlog is full, we just
want to drop the REQ so that it is retried later.
Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device. So
the reference count ended up too low, which leads to bad things. Fix up
and simplify the reference counting to avoid this.
(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allocate memory for the page_list field of struct ib_pool_fmr only
when caching is enabled for the FMR pool, since the field is not used
otherwise. This can save significant amounts of memory for large
pools with caching turned off.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets. Update the subnet local field in the CM REQ based
on the hop_limit value. In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs. This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.
This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed. It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends
up on the free_list rather than the dirty_list (because we allow a
certain number of remappings before actually requiring a flush).
However, ib_fmr_batch_release() only looks at dirty_list when flushing
out old mappings. This means that when ib_fmr_pool_flush() is used to
force a flush of the FMR pool, some dirty FMRs that have not reached
their maximum remap count will not actually be flushed.
Fix this by flushing all FMRs that have been used at least once in
ib_fmr_batch_release().
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Normally, the serial numbers for flush requests and flushes executed
for an FMR pool should be in sync.
However, if the FMR pool flushes dirty FMRs because the
dirty_watermark was reached, we wake up the cleanup thread and let it
do its stuff. As a side effect, the cleanup thread increments
pool->flush_ser, which leaves it one higher than pool->req_ser. The
next time the user calls ib_flush_fmr_pool(), the cleanup thread will
be woken up, but ib_flush_fmr_pool() won't wait for the flush to
complete because flush_ser is already past req_ser. This means the
FMRs that the user expects to be flushed may not have all been flushed
when the function returns.
Fix this by telling the cleanup thread to do work exclusively by
incrementing req_ser, and by moving the comparison of dirty_len and
dirty_watermark into ib_fmr_pool_unmap().
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
In addition to being overly complex, the locking in user_mad.c is
broken: there were multiple reports of deadlocks and lockdep warnings.
In particular it seems that a single thread may end up trying to take
the same rwsem for reading more than once, which is explicitly
forbidden in the comments in <linux/rwsem.h>.
To solve this, we change the locking to use plain mutexes instead of
rwsems. There is one mutex per open file, which protects the contents
of the struct ib_umad_file, including the array of agents and list of
queued packets; and there is one mutex per struct ib_umad_port, which
protects the contents, including the list of open files. We never
hold the file mutex across calls to functions like ib_unregister_mad_agent(),
which can call back into other ib_umad code to queue a packet, and we
always hold the port mutex as long as we need to make sure that a
device is not hot-unplugged from under us.
This even makes things nicer for users of the -rt patch, since we
remove calls to downgrade_write() (which is not implemented in -rt).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
By default, the responder_resources parameter is set to that received
in a connection request. The passive side may override this value
when accepting the connection. Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request. Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.
For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.
The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is based on user feedback from Doug Ledford at RedHat:
Events that occur on an rdma_cm_id are reported to userspace through an
event channel. Connection request events are reported on the event
channel associated with the listen. When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel. This is suboptimal where the user only wants listen events on
that channel.
Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.
Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable conn_id remove on the passive side after connection
establishment. This corrects an issue where the IB driver can't be
unloaded after running applications over RDS. The 'dev_remove' counter
does not reach 0 for established connections on the passive side.
This problem is limited to device removal, and only occurs on the
passive side if there are established connections.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing. However, the structures on these two
lists are not the same. The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private. Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private. This leads to a system
crash when requests are moved from the local_list to the cancel_list.
Fix this by leaving local_list alone. All requests on the local_list
have completed and are just awaiting processing by a queued worker thread.
Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add performance/debug counters to track sent/received messages, retries,
and duplicates. Counters are tracked per CM message type, per port.
The counters are always enabled, so intrusive state tracking is not done.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.
For RMPP MADs, report the total number of times that any portion
(send window) of the send operation was retried.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
P_key changes can invalidate multicast groups. Report errors on all
multicast groups affected by a pkey change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The local loopback of an outgoing DR SMP response is limited to those
that originate at the driver specific SMA implementation during the
driver specific process_mad() function. This patch enables a
returning DR SMP originating in userspace (or elsewhere) to be
delivered to the local management stack. In this specific case the
driver's process_mad() function does not consume or process the MAD, so
a response MAD has not been created, and the original MAD must be
manually copied to the MAD buffer that is to be handed off to the local agent.
Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In ib_mad_recv_done_handler(), the response pointer is checked for
NULL after allocating it. It is then checked again in the local
process_mad() path but there is no possibility of it changing in
between.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/fmr_pool: Stop ib_fmr threads from contributing to load average
IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument)
IB/ipath: Limit length checksummed in eeprom
IB/ipath: Fix a race where s_last is updated without lock held
IB/mlx4: Lock SQ lock in mlx4_ib_post_send()
IPoIB/cm: Fix receive QP cleanup
I noticed my machine was at a constant load average of 1. This was
because ib_create_fmr_pool calls kthread_create but does not
immediately wake the thread up.
Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set
TASK_INTERRUPTIBLE, then go to sleep.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.
Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.
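In C terms, the common open-coded triple becomes a single helper call:
/* before */
sg->page   = page;
sg->offset = offset;
sg->length = len;
/* after */
sg_set_page(sg, page, len, offset);
/* when only the page is known at this point in the code */
sg_assign_page(sg, page);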
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
IPoIB/cm: Use common CQ for CM send completions
IB/uverbs: Fix checking of userspace object ownership
IB/mlx4: Sanity check userspace send queue sizes
IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
IB/ehca: Enable large page MRs by default
IB/ehca: Change meaning of hca_cap_mr_pgsize
IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
IB/ehca: Fix masking error in {,re}reg_phys_mr()
IB/ehca: Supply QP token for SRQ base QPs
IPoIB: Use round_jiffies() for ah_reap_task
RDMA/cma: Fix deadlock destroying listen requests
RDMA/cma: Add locking around QP accesses
IB/mthca: Avoid alignment traps when writing doorbells
mlx4_core: Kill mlx4_write64_raw()
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().
Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.
Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a justifying patch for Stephen's patches. Stephen's patches
disallow using a port range of one single port and break the meaning
of the 'remaining' variable; in some places it has a different meaning.
My patch restores the sense of the 'remaining' variable: it should mean
how many ports are remaining and nothing else. It also allows using a
single port.
I am sure we must be able to use the mentioned port range; this is not
restricted by the documentation and does not break current behavior.
Useful links:
Patches posted by Stephen Hemminger:
http://marc.info/?l=linux-netdev&m=119206106218187&w=2
http://marc.info/?l=linux-netdev&m=119206109918235&w=2
Andrew Morton's comment
http://marc.info/?l=linux-kernel&m=119248225007737&w=2
1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.
Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.
A wildcard listen maintains per device listen requests for each
RDMA device in the system. The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.
When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens. It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id(). It does this while holding the device mutex.
However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex. Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.
Fix this by re-working how per device listens are destroyed. Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen(). Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.) While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.
This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:
"An incoming connection arrives and we decide to tear down the nascent
connection. The remote ends decides to do the same. We start to shut
down the connection, and call rdma_destroy_qp on our cm_id. ... Now
apparently a 'connect reject' message comes in from the other host,
and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
It calls cma_modify_qp_err, which for some odd reason tries to modify
the exact same QP we just destroyed."
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It no longer requires the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.
Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
mlx4_core: Fix section mismatches
IPoIB: Allow setting policy to ignore multicast groups
IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
IB/ipath: Remove redundant link state checks
IB/ipath: Fix IB_EVENT_PORT_ERR event
IB/ipath: Better handling of unexpected GPIO interrupts
IB/ipath: Maintain active time on all chips
IB/ipath: Fix QHT7040 serial number check
IB/ipath: Indicate a couple of chip bugs to userspace
IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
IB/ipath: Remove duplicate copy of LMC
IB/ipath: Add ability to set the LMC via the sysfs debugging interface
IB/ipath: Optimize completion queue entry insertion and polling
IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
IB/ipath: Generate flush CQE when QP is in error state
IB/ipath: Remove redundant code
IB/ipath: Future proof eeprom checksum code (contents reading)
IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
...
Expansion of original idea from Denis V. Lunev <den@openvz.org>
Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.
The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment. This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB CM provides a message received acknowledged (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages. In many cases, the application may not know how long it will
take to respond to a CM message, but the majority of the time, it will
usually respond before a retry has been sent. Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.
This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent. It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file. Also move the definition before the first use, so
a forward declaration is not needed.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The declaration of struct ib_user_mad_reg_req.method_mask[] exported
to userspace was an array of __u32, but the kernel internally treated
it as a bitmap made up of longs. This makes a difference for 64-bit
big-endian kernels, where numbering the bits in an array of __u32 gives:
|31.....0|63....32|95....64|127...96|
while numbering the bits in an array of longs gives:
|63..............0|127............64|
64-bit userspace can handle this by just treating method_mask[] as an
array of longs, but 32-bit userspace is really stuck: the meaning of
the bits in method_mask[] depends on whether the kernel is 32-bit or
64-bit, and there's no sane way for userspace to know that.
Fix this by updating <rdma/ib_user_mad.h> to make it clear that
method_mask[] is an array of longs, and using a compat_ioctl method to
convert to an array of 64-bit longs to handle the 32-on-64 problem.
This fixes the interface description to match existing behavior (so
working binaries continue to work) in almost all situations, and gives
consistent semantics in the case of 32-bit userspace that can run on
either a 32-bit or 64-bit kernel, so that the same binary can work for
both 32-on-32 and 32-on-64 systems.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for setting the P_Key index of sent MADs and getting the
P_Key index of received MADs. This requires a change to the layout of
the ABI structure struct ib_user_mad_hdr, so to avoid breaking
compatibility, we default to the old (unchanged) ABI and add a new
ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
of the new ABI to opt into using it.
We plan on switching to the new ABI by default in a year or so, and
this patch adds a warning that is printed when an application uses the
old ABI, to push people towards converting to the new ABI.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@xsigo.com>
I was looking at the code for multicast.c and noticed that
ib_sa_join_multicast() calls queue_join() which puts the
request at the front of the group->pending_list. If this
is a second request, it seems like it would interfere with
process_join_error() since group->last_join won't point
to the member at the head of the pending_list. The sequence
would thus be:
1. ib_sa_join_multicast()
puts member1 on head of pending_list and starts work thread
2. mcast_work_handler()
calls send_join() which sets group->last_join to member1
3. ib_sa_join_multicast()
puts member2 on head of pending_list
4. join operation for member1 receives failures response from SA.
5. join_handler() is called with error status
6. process_join_error() fails to process member1 since
it doesn't match the first entry in the group->pending_list.
The impact is that the failed join request is tossed. The second
request is processed, and after it completes, the original request ends
up being retried.
This change also results in join requests being processed in FIFO
order.
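Presumably the fix is simply to queue new requests at the tail rather
than the head, roughly (sketch of queue_join(); exact struct and field
names may differ):
/* add at the tail so joins complete in FIFO order and
 * group->last_join always matches the head of pending_list */
list_add_tail(&member->list, &group->pending_list);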
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol. Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all
this. Without doing full ND, RDMA address resolution fails in the
presence of dropped ARP broadcast packets.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Export the ability to set the type of service to user space. Model
the interface after setsockopt.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Provide support to specify a type of service for a communication
identifier. A new function call is used when dealing with IPv4
addresses. For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
[ The comments Eitan Zahavi and myself have made over the v1 post at
<http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
were fully addressed. ]
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The QoS annex defines new fields for path records. Add them to the
ib_sa for consumers that want to use them.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_create_send_mad() returns an error code pointer on error, not NULL.
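So callers must check the result with IS_ERR()/PTR_ERR() rather than
against NULL, e.g.:
msg = ib_create_send_mad(...);
if (IS_ERR(msg))
	return PTR_ERR(msg);	/* a NULL check would miss the error */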
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A number of printks in fmr_pool.c don't have newlines, e.g.:
fmr_create failed for FMR 0<5>FS-Cache: Loaded
Fix them up.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix sparse warning
drivers/infiniband/core/device.c:142:6: warning: incorrect type in argument 1 (different signedness)
drivers/infiniband/core/device.c:142:6: expected unsigned long const *addr
drivers/infiniband/core/device.c:142:6: got long *[assigned] inuse
by making the local variable inuse unsigned. Does not affect generated
code at all.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After moving the definition of struct ib_umem_chunk from ib_verbs.h to
ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK
to stay in ib_verbs.h. Move the macro to umem.c, the only place where
it is used.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The address handle associated with dual-sided RMPP direction switch
ACKs is never destroyed. Free the AH for ACKs which fall into this
category.
Problem was reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Nothing looks at the return value of agent_send_response(), so there's
no point in returning anything.
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If agent_send_response() returns an error, we shouldn't do anything
differently than if it succeeds; setting response to NULL just means
that the response buffer gets leaked.
Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If ib_mad_recv_done_handler() fails to allocate response, then it just
printed a warning and continued, which leads to an oops if the MAD is
being handled for a switch device, because the switch code uses
response without checking for NULL. Fix this by bailing out of the
function if the allocation fails.
Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that ib_find_pkey() ignores the membership bit of P_Keys, there's no
need for ib_sa to look for both 0x7fff and 0xffff in a port's P_Key table.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_find_pkey() is used as a replacement for ib_find_cached_pkey(), and
the original function ignored the membership bit when searching for a
P_Key, so ib_find_pkey() should ignore the bit too.
In particular, IPoIB turns on the P_Key membership bit of limited
membership P_Keys when creating a child interface and looks for the
full membership P_key. This broke if a port was a partial member of a
partition when IPoIB switched from ib_find_cached_pkey() to
ib_find_pkey(), and this change fixes things again.
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).
Here is a short excerpt of the semantic patch performing
this transformation:
@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@
x =
- kmalloc
+ kzalloc
(E1,E2)
... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);
@@
expression E1,E2,E3;
@@
- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)
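In plain C the two transformations look like this (hypothetical call sites):
/* kmalloc + memset becomes kzalloc */
buf = kmalloc(size, GFP_KERNEL);
memset(buf, 0, size);
/* ... is replaced by ... */
buf = kzalloc(size, GFP_KERNEL);
/* a zeroed array allocation becomes kcalloc */
tab = kzalloc(n * sizeof(*tab), GFP_KERNEL);
/* ... is replaced by ... */
tab = kcalloc(n, sizeof(*tab), GFP_KERNEL);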
[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Local write permission makes no sense as part of the QP access flags,
since the access flags only control what the remote end of the
connection is allowed to do. Remove the code in the RDMA CM that
initializes qp_access_flags with IB_ACCESS_LOCAL_WRITE.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (76 commits)
IB: Update MAINTAINERS with Hal's new email address
IB/mlx4: Implement query SRQ
IB/mlx4: Implement query QP
IB/cm: Send no match if a SIDR REQ does not match a listen
IB/cm: Fix handling of duplicate SIDR REQs
IB/cm: cm_msgs.h should include ib_cm.h
IB/cm: Include HCA ACK delay in local ACK timeout
IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
IB/sa: Make sure SA queries use default P_Key
IPoIB: Recycle loopback skbs instead of freeing and reallocating
IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
IPoIB/cm: Fix warning if IPV6 is not enabled
IB/core: Take sizeof the correct pointer when calling kmalloc()
IB/ehca: Improve latency by unlocking after triggering the hardware
IB/ehca: Notify consumers of LID/PKEY/SM changes after nondisruptive events
IB/ehca: Return QP pointer in poll_cq()
IB/ehca: Change idr spinlocks into rwlocks
IB/ehca: Refactor sync between completions and destroy_cq using atomic_t
IB/ehca: Lock renaming, static initializers
IB/ehca: Report RDMA atomic attributes in query_qp()
...
sysfs is now completely out of the driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners, leading to
accesses of removed modules.
This patch kills the now-unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.
For more info regarding lifetime rule cleanup, please read the
following message.
http://article.gmane.org/gmane.linux.kernel/510293
(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a SIDR REQ does not match a listen, we should reply with status
value 1 (service ID not supported), rather than dropping through to
the default case of status 2 (rejected by service provider).
Doing this also fixes a bug where the cm_id_priv is removed from the
remote_sidr_table twice.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix handling to duplicate SIDR REQs to avoid sending a reject if a
duplicate is detected. Duplicates should just be silently discarded.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cm_msgs.h uses definitions from ib_cm.h. Include it directly, rather
than depending on a specific include order.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs. If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ib_cm is a little overzealous about using spin_lock_irqsave()
when spin_lock_irq() would do.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MADs sent to the SA should use the default P_Key (0x7fff/0xffff).
There's no requirement that the default P_Key is stored at index 0 in
the local P_Key table, so add code to the sa_query module to look up
the index of the default P_Key when creating an address handle for the
SA (which is done any time the P_Key table might change), and use this
index for all SA queries.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When allocating out_mad in show_pma_counter(), take sizeof *out_mad
instead of sizeof *in_mad. It is true that today the types of in_mad
and out_mad are the same, but this patch gives us cleaner code.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
gcc correctly warned:
drivers/infiniband/core/umem.c: In function 'ib_umem_get':
drivers/infiniband/core/umem.c:78: warning: 'ret' may be used uninitialized in this function
Set ret to 0 in case npages == 0 and the loop isn't entered at all.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Extend the SMI with switch (intermediate hop) support. Care has been
taken to ensure that the CA (and router) code paths are changed as
little as possible.
Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If ib_umem_release() is called after ib_uverbs_close() sets context->closing,
then a process can get stuck in a D state, because the code boils down to
if (down_write_trylock(&mm->mmap_sem))
down_write(&mm->mmap_sem);
which is obviously a stupid instant deadlock. Fix the code so that we
only try to take the lock once.
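A sketch of the corrected logic (paraphrased; the real function defers
the accounting to a workqueue when it cannot get the lock):
if (context->closing) {
	/* called from ib_uverbs_close(): don't sleep on mmap_sem */
	if (!down_write_trylock(&mm->mmap_sem)) {
		/* punt the locked_vm accounting to a workqueue */
		...
		return;
	}
} else
	down_write(&mm->mmap_sem);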
This bug was introduced in commit f7c6a7b5 ("IB/uverbs: Export
ib_umem_get()/ib_umem_release() to modules") which fortunately never
made it into a release, and was reported by Pete Wyckoff <pw@osc.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
next_port should be between sysctl_local_port_range[0] and [1].
However, it is initially set to a random value with get_random_bytes().
If the value is negative when treated as a signed integer, next_port
can end up outside the expected range because the result of the %
operator is negative.
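A sketch of the problem and the obvious remedy (variable names are
illustrative; low/high stand for sysctl_local_port_range[0]/[1]):
/* problematic: a negative random value makes the % result negative */
int rnd;
get_random_bytes(&rnd, sizeof rnd);
next_port = (rnd % (high - low + 1)) + low;	/* may land below low */
/* fix: do the arithmetic on an unsigned value */
unsigned int urnd;
get_random_bytes(&urnd, sizeof urnd);
next_port = (urnd % (high - low + 1)) + low;	/* always in [low, high] */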
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ib_cm can incorrectly detect a stale connection (a new connection
request for a QPN that is already connected) as a duplicate connection
request. Separate the handling of potential duplicate REQs from stale
connections.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/cm: Improve local id allocation
IPoIB/cm: Fix SRQ WR leak
IB/ipoib: Fix typos in error messages
IB/mlx4: Check if SRQ is full when posting receive
IB/mlx4: Pass send queue sizes from userspace to kernel
IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
mlx4_core: Fix array overrun in dump_dev_cap_flags()
IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
IB/mthca: Fix RESET to ERROR transition
IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
IB/mthca: Set GRH:HopLimit when building MLX headers
IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
IB/mthca: Fix use-after-free on device restart
IB/ehca: Return proper error code if register_mr fails
IPoIB: Handle P_Key table reordering
IB/core: Use start_port() and end_port()
IB/core: Add helpers for uncached GID and P_Key searches
IB/ipath: Fix potential deadlock with multicast spinlocks
IB/core: Free umem when mm is already gone
The IB CM uses an idr for local id allocations, with a running counter
as start_id. This fails to generate distinct ids if
1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied
This in turn leads to an increased chance of connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.
As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The first thing mm.h does is include sched.h, solely for the
can_do_mlock() inline function, which dereferences "current". By
dealing with can_do_mlock(), mm.h can be detached from sched.h, which
is good. See below for why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up ib_query_port() and ib_modify_port() slightly by using the
just-added start_port() and end_port() helpers.
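For example, the port-number validation in those functions reduces to
(sketch):
if (port_num < start_port(device) || port_num > end_port(device))
	return -EINVAL;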
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add ib_find_gid() and ib_find_pkey() functions that use uncached device
queries. The calls might block but the returns are always up-to-date.
Cache P_Key and GID table lengths in core to avoid extra port info queries.
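The prototypes are roughly (as assumed from the description; check
<rdma/ib_verbs.h> for the authoritative versions):
int ib_find_gid(struct ib_device *device, union ib_gid *gid,
		u8 *port_num, u16 *index);
int ib_find_pkey(struct ib_device *device,
		 u8 port_num, u16 pkey, u16 *index);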
Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Free the umem when the task's mm has already been destroyed by the
time ib_umem_release() gets called.
Found by Dotan Barak at Mellanox.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport-specific CM and an RDMA device. Make the checks explicit
about what we're testing for, since we're not synchronizing against
the cm_id state.
This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id. Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm. However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.
However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation. If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.
Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.
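Conceptually, struct ib_umem now carries its own deferred-accounting
state, something like:
struct ib_umem {
	struct ib_ucontext     *context;
	...
	/* pre-allocated so ib_umem_release() can always defer the
	 * locked_vm accounting to a workqueue, even under memory pressure */
	struct work_struct	work;
	struct mm_struct       *mm;
	unsigned long		diff;
};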
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.
Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.
This has a number of advantages:
- It is better design from the standpoint of making generic code a
library that can be used or overridden by device-specific code as
the details of specific devices dictate.
- Drivers that do not need to pin userspace memory regions do not
need to take the performance hit of calling ib_umem_get(). For
example, although I have not tried to implement it in this patch,
the ipath driver should be able to avoid pinning memory and just
use copy_{to,from}_user() to access userspace memory regions.
- Buffers that need special mapping treatment can be identified by
the low-level driver. For example, it may be possible to solve
some Altix-specific memory ordering issues with mthca CQs in
userspace by mapping CQ buffers with extra flags.
- Drivers that need to pin and DMA map userspace memory for things
other than memory regions can use ib_umem_get() directly, instead
of hacks using extra parameters to their reg_phys_mr method. For
example, the mlx4 driver that is pending being merged needs to pin
and DMA map QP and CQ buffers, but it does not need to create a
memory key for these buffers. So the cleanest solution is for mlx4
to call ib_umem_get() in the create_qp and create_cq methods.
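A rough sketch of the usage pattern this enables in a low-level driver's
reg_user_mr method; the ib_umem_get() prototype, header name, and foo_*
identifiers here are assumptions for illustration:
#include <linux/slab.h>
#include <linux/err.h>
#include <rdma/ib_verbs.h>
#include <rdma/ib_umem.h>

struct foo_mr {
	struct ib_mr	 ibmr;
	struct ib_umem	*umem;
};

static struct ib_mr *foo_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
				     u64 virt_addr, int access_flags,
				     struct ib_udata *udata)
{
	struct foo_mr *mr = kmalloc(sizeof *mr, GFP_KERNEL);

	if (!mr)
		return ERR_PTR(-ENOMEM);

	/* The driver now decides if and when to pin; the core no longer does it. */
	mr->umem = ib_umem_get(pd->uobject->context, start, length, access_flags);
	if (IS_ERR(mr->umem)) {
		int err = PTR_ERR(mr->umem);

		kfree(mr);
		return ERR_PTR(err);
	}

	/* ... program the HCA's translation tables from mr->umem ... */
	return &mr->ibmr;
}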
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Convert to NAPI
IB: Return "maybe missed event" hint from ib_req_notify_cq()
IB: Add CQ comp_vector support
IB/ipath: Fix a race condition when generating ACKs
IB/ipath: Fix two more spin lock problems
IB/fmr_pool: Add prefix to all printks
IB/srp: Set proc_name
IB/srp: Add orig_dgid sysfs attribute to scsi_host
IPoIB/cm: Don't crash if remote side uses one QP for both directions
RDMA/cxgb3: Support for new abort logic
RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
RDMA/cxgb3: Fix TERM codes
IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
IB/mthca: Work around kernel QP starvation
IB/ipath: Don't put QP in timeout queue if waiting to send
IB/ipath: Don't call spin_lock_irq() from interrupt context
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API. Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value. Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.
We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity. This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.
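A sketch of what a ULP's CQ creation looks like after the change; the my_*
names are placeholders:
static void my_comp_handler(struct ib_cq *cq, void *cq_context)
{
	/* ... handle a completion notification ... */
}

static void my_event_handler(struct ib_event *event, void *context)
{
	/* ... handle an async CQ event ... */
}

static struct ib_cq *create_my_cq(struct ib_device *device, void *ctx, int cqe)
{
	/* comp_vector is the new, last argument; all ULPs currently pass 0. */
	return ib_create_cq(device, my_comp_handler, my_event_handler, ctx, cqe, 0);
}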
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.
In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurrence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.
My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:
arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c
I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.
Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
[PATCH] scatterlist.h needs types.h
http://lkml.org/lkml/2007/3/01/141
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset). dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c. This reduces
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[]. We shouldn't declare it to be twice as big.
Pointed-out-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Change the returned error code to ENOMEM if the connection event
backlog is full. This prevents the ib_cm from issuing a reject
on the connection, which can allow retries to succeed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list. Fix this by using kzalloc() to make
sure all pointers are NULL.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cm_device references an ib_device, which already contains the node_guid.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The rdma_cm requires that path records be reversible. Set the
reversible bit when issuing a path record query.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0). See 13.5.4.4 in the 1.2 IB
spec.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If no matching PD is found in ib_uverbs_reg_mr(), then the function
jumps to err_release without setting the return value ret. This means
that ret will hold the return value of the call to ib_umem_get() a few
lines earlier; if the function reaches the point where it looks for
the PD, we know that ib_umem_get() must have returned 0, so
ib_uverbs_reg_mr() ends up returning 0 for a bad PD ID. Fix this by
setting ret to -EINVAL before jumping to the exit path when no PD is
found.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Extend rdma_cm to support multicast communication. Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space. The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP. The port space determines the signature used in the
MGID when joining the group. The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.
Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant. This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP. The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.
Multicast support is also exported to userspace through the rdma_ucm.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though one user who wants to remain a
member is left. Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.
To do this, add a multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations. Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
iwcm iw_cm_id destruction race condition fixes:
- iwcm_deref_id() always wakes up if there's another reference.
- clean up race condition in cm_work_handler().
- create static void free_cm_id() which deallocs the work entries and then
kfrees the cm_id memory. This reduces code replication.
- rem_ref() if this is the last reference -and- the IWCM owns freeing the
cm_id, then free it.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/mthca: Always fill MTTs from CPU
IB/mthca: Merge MR and FMR space on 64-bit systems
IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
IB/mthca: Give reserved MTTs a separate cache line
IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
RDMA/cxgb3: Add driver for Chelsio T3 RNIC
IB: Remove redundant "_wq" from workqueue names
RDMA/cma: Increment port number after close to avoid re-use
IB/ehca: Fix memleak on module unloading
IB/mthca: Work around gcc bug on sparc64
IPoIB: Connected mode experimental support
IB/core: Use ARRAY_SIZE macro for mandatory_table
IB/mthca: Use correct structure size in call to memset()
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randomize the starting port number and avoid re-using port values
immediately after they are closed. Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.
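A minimal sketch of the scheme, with illustrative names rather than the
actual rdma_cm variables:
/* Seed the rover randomly once, then advance it after each bind so a
 * just-closed port is not immediately handed out again. */
static unsigned int rover;

static void seed_rover(void)
{
	get_random_bytes(&rover, sizeof rover);
}

static unsigned int next_port(unsigned int low, unsigned int high)
{
	unsigned int port = low + rover % (high - low + 1);

	rover++;		/* remember the last value used */
	return port;
}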
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use ARRAY_SIZE() macro already defined in kernel.h instead of open
coding equivalent code.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The iWARP connection manager uses the ib_addr services to do route
resolution (neighbour discovery in the IP world). The ib_addr
netevent callback routine, however, currently only acts on InfiniBand
neighbour updates. It needs to act on ethernet neighbour updates as
well.
This patch just removes filtering on device type altogether and will
trigger on any neighbour updates where the nud_type is valid. This
simplifies the code some.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.
This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
this change actually saves a tiny bit (extra memory read) on data path
(for ehca it would actually be expensive to find the QP pointer when
polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num
Use case:
In IPoIB connected mode code, I have a common CQ shared by multiple
QPs. To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.
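A sketch of the polling pattern this enables; the dispatch is left as a
comment since the surrounding types are ULP-specific:
	struct ib_wc wc;

	while (ib_poll_cq(cq, 1, &wc) > 0) {
		/* wc.qp replaces the old qp_num field; the number is still
		 * available as wc.qp->qp_num when needed. */
		void *conn = wc.qp->qp_context;

		/* ... look up and update per-connection state via conn ... */
	}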
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's a problem with how rdma cm events are reported to userspace
that can lead to application crashes.
When a new connection request arrives, a context for the connection is
allocated in the kernel. The connection event is then reported to
userspace. The userspace library retrieves the event and allocates
its own context for the connection. The userspace context is
associated with the kernel's context when accepting. This allows the
kernel to hand the userspace context back with later events.
A problem occurs if a second event for the same connection occurs
before the user has had a chance to call accept. The userspace
context has not yet been set, which causes the librdmacm to crash.
(This has been seen when the app takes too long to call accept,
resulting in the remote side timing out and rejecting the connection)
Fix this by ignoring events for new connections until userspace has
set their context. This can only happen if an error occurs on a new
connection before the user accepts it. This is okay, since the accept
will just fail later.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We discard new connection requests while the listen backlog is full,
but leak a struct ucma_event in the process. Free the structure in
this case.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Convert code in core/ to use the new DMA mapping functions for kernel
verbs consumers.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Export the rdma cm interfaces to userspace via a misc device.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allow the use of UD QPs through the rdma_cm, in order to provide
address translation services for resolving IB addresses for datagram
messages using SIDR.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During connection establishment, the passive side of a connection can
receive messages from the active side before the connection event has
been delivered to the user. Allow the passive side to send messages
in response to received data before the event is delivered. To handle
the case where the connection messages are lost, a new rdma_notify()
function is added that users may invoke to force a connection into the
established state.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Connection information was never given to the recipient of a
connection request or reply message. Only the event was delivered.
Report the connection data with the event to allow the user to
reject the connection based on the requested parameters, or adjust
their resources to match the request.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The qp_type parameter into the rdma_cm is unneeded, and can be
misleading. The QP type should be determined from the port space.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_flush_fmr_pool() stashes away the request generation number
properly, but then goes ahead and rereads it every time it tests
whether the flush generation number has caught up. This means that
there is a theoretical possibility of livelock, if the request
generation number keeps getting bumped and the flush generation number
never catches up. The fix is simple: use the request generation
number read at the beginning of the function.
Also, atomic_inc() followed by atomic_read() can be replaced with
atomic_inc_return(). There's no real requirement for atomicity here
but we might as well shrink the code.
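A sketch of the fixed flow (the pool field names follow the fmr_pool code
loosely and should be treated as assumptions):
	int serial = atomic_inc_return(&pool->req_ser);

	wake_up_process(pool->thread);
	/* Compare against the value read above, not a fresh read of req_ser,
	 * so a stream of new requests cannot livelock the flush. */
	if (wait_event_interruptible(pool->force_wait,
				     atomic_read(&pool->flush_ser) - serial >= 0))
		return -EINTR;
	return 0;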
This bug was discovered using David Binderman's list of "set but never
used" warnings from icc.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Conflicts:
drivers/infiniband/core/iwcm.c
drivers/net/chelsio/cxgb2.c
drivers/net/wireless/bcm43xx/bcm43xx_main.c
drivers/net/wireless/prism54/islpci_eth.c
drivers/usb/core/hub.h
drivers/usb/input/hid-core.c
net/core/netpoll.c
Fix up merge failures with Linus's head and fix new compilation failures.
Signed-Off-By: David Howells <dhowells@redhat.com>
ib_ucm_cleanup_events() holds file_mutex while calling ib_destroy_cm_id().
This can deadlock since ib_destroy_cm_id() flushes event handlers, and
ib_ucm_event_handler() needs file_mutex, too. Therefore, drop the
file_mutex during the call to ib_destroy_cm_id().
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ib_cm_establish() function is replaced with a more generic
ib_cm_notify(). This routine is used to notify the CM that failover
has occurred, so that future CM messages (LAP, DREQ) reach the remote
CM. (Currently, we continue to use the original path.) This bumps the
userspace CM ABI.
New alternate path information is captured when a LAP message is sent
or received. This allows QP attributes to be initialized for the user
when a new path is loaded after failover occurs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix following problems in process_req() relating to cancellation:
- Function is wrongly doing another addr_remote() when cancelled,
which is not required.
- Make failure reporting immediate by using time_after_eq().
- On cancellation, -ETIMEDOUT was returned to the callback routine
instead of the more appropriate -ECANCELED (users getting notified
may want to print/return this status, eg ucma_event_handler).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In iwcm_deref_id(), the comment says : "If the last reference is being
removed and iw_destroy_cm_id is waiting, wake up the waiting
thread". The second part of the comment, "and iw_destroy_cm_id is
waiting," is wrong, since this function either wakes the waiter
already waiting in iwcm_deref_id, or enables it (so that when
wait_for_completion() is performed later, it will immediately return).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove unnecessary cm_id_priv argument to copy_private_data(), and
change text to reflect the code. Fix couple of typos in comments.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If we get IW_CM_EVENT_CONNECT_REQUEST message and encounter an error
(not in the LISTEN state, cannot create an id, cannot alloc
work_entry, etc), then the memory allocated by cm_event_handler() in
the event->private_data gets leaked. Since cm_work_handler has already
put the event on the work_free_list, this allocated memory is
leaked. A high backlog value can allow DoS attacks.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Possible memory corruption scenario: after putting the work entry back
on the work_free_list, we call process_event() which dereferences
work->event, which could have been modified to another value
meanwhile.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The qp_access_flags are for remote access permissions only, so
IB_ACCESS_LOCAL_WRITE is an invalid value. Remove it from the values
set by cm_init_qp_init_attr() and cma_init_ib_qp().
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace open coded kmemdup() to save some screen space, and allow
inlining/not inlining to be triggered by gcc.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rewrite cma_req_handler error handling case to encapsulate
common code.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In queue_req(), use time_after_eq() instead of time_after()
for the following reasons:
- Improves insert time if multiple entries with same time are
present.
- set_timeout need not be called if entry with same time
is added to the list (and that happens to be the entry
with the smallest time), saving atomic/locking operations.
- Earlier entries with same time are deleted first (fifo).
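A sketch of the resulting insertion logic (names follow the ib_addr code
loosely and are assumptions):
static void queue_req(struct addr_req *req)
{
	struct addr_req *temp_req;

	mutex_lock(&lock);
	/* Walk backwards; time_after_eq() makes same-time entries FIFO and
	 * usually lets us stop on the first comparison. */
	list_for_each_entry_reverse(temp_req, &req_list, list) {
		if (time_after_eq(req->timeout, temp_req->timeout))
			break;
	}

	list_add(&req->list, &temp_req->list);

	/* Only re-arm the timer if the new entry became the earliest one. */
	if (req_list.next == &req->list)
		set_timeout(req->timeout);
	mutex_unlock(&lock);
}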
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove redundant check of node_guid in cma_add_one().
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Optimize to test for an empty list first. This ends up simplifying
the code too.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When ib_cancel_mad() is called, it puts the canceled send on a list
and schedules a "flushed" callback from process context. However,
this leaves a window where a receive completion could be processed
before the send is fully flushed.
This is fine, except that ib_find_send_mad() will find the MAD and
return it to the receive processing, which results in the sender
getting both a successful receive and a "flushed" send completion for
the same request. Understandably, this confuses the sender, which is
expecting only one of these two callbacks, and leads to grief such as
a use-after-free in IPoIB.
Fix this by changing ib_find_send_mad() to return a send struct only
if the status is still successful (and not "flushed"). The search of
the send_list already had this check, so this patch just adds the same
check to the search of the wait_list.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Return the sq_draining value back to user space for query_qp instead
of the en_sqd_async_notify value, which is valid only for
modify_qp. For query_qp, the draining status should be returned.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently a DREP is only sent in response to a DREQ if a connection
has been found matching the DREQ, and it is in the proper state. Once
a DREP is sent, the local connection moves into timewait. Duplicate
DREQs received while in this state result in re-sending the DREP.
However, it's likely that the local connection will enter and exit
timewait before the remote side times out a lost DREP and resends a DREQ.
To handle this, we send a DREP in response to a DREQ, even if a local
connection is not found. This avoids maintaining disconnected
id's in timewait states for excessively long times, just to handle a
lost DREP.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the ib_cm module is unloaded while id's are still in timewait, the
CM will destroy the work queue used to process timewait. Once the
id's exit timewait, their timers will fire, leading to a crash trying
to access the destroyed work queue.
We need to track id's that are in timewait, and cancel their deferred
work on module unload.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reorganize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eliminate remove_list by using list_del_init() instead during device
removal handling.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On reporting a route error, also include the status for the error,
rather than indicating a status of 0 when an error has occurred.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The race is as follows:
A process : cma_process_remove() calls cma_remove_id_dev(),
which sets id state to CMA_DEVICE_REMOVAL and
calls wait_event(dev_remove).
B process : cma_req_handler() had incremented dev_remove,
and calls cma_acquire_ib_dev() and on failure
calls cma_release_remove(), which does a
wake_up of cma_process_remove(). Then
cma_req_handler() calls rdma_destroy_id();
A Process : cma_remove_id_dev() gets woken and checks the
state of id, and since it is still (wrongly)
CMA_DEVICE_REMOVAL, it calls notify_user(id)
and if that fails, the caller - cma_process_remove()
calls rdma_destroy_id(id). Two processes can
call rdma_destroy_id(), resulting in one
de-referencing kfreed id_priv.
Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cma_connect_ib() and cma_connect_iw() leak cm_id's in failure cases.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
inet_confirm_addr(), inet_ifa_byprefix(), ip_dev_find(), inet_make_mask() and
inet_ifa_match() annotated, along with inferred net-endian variables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Roughly half of callers already do it by not checking the return value
* Code in drivers/acpi/osl.c does the following to be sure:
(void)kmem_cache_destroy(cache);
* Those who check it printk something, but slab_error already printed
the name of the failed cache.
* XFS BUGs on a failed kmem_cache_destroy, which is not a decision a
low-level filesystem driver should make. Converted to ignore.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
indirect chains of includes are arch-specific and can't
be relied upon... (hell, even attempt to build it for
itanic would trigger vmalloc.h ones; err.h triggers
on e.g. alpha).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do not track remote QPN in TimeWait state, since QP is not connected.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running. Update all in-tree users for the new interface.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Closes a window where address resolution can attach an rdma_cm_id to a
device during destruction of the rdma_cm_id. This can result in the
rdma_cm_id remaining in the device list after its memory has been
freed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
- Hook iWARP CM into the build system and use it in rdma_cm.
- Convert enum ib_node_type to enum rdma_node_type, which includes
the possibility of RDMA_NODE_RNIC, and update everything for this.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an iWARP Connection Manager (CM), which abstracts connection
management for iWARP devices (RNICs). It is a logical instance of the
xx_cm where xx is the transport type (ib or iw). The symbols exported
are used by the transport independent rdma_cm module, and are
available also for transport dependent ULPs.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all. Also fix a few other whitespace
bogosities.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Randomize the starting local comm ID to avoid getting a rejected
connection due to a stale connection after a system reboot or
reloading of the ib_cm.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ib_mad module does not use a kthread function, but mad_priv.h
includes <linux/kthread.h>. mad_rmpp.c does not do any DMA-related
stuff, but includes <linux/dma-mapping.h>. Remove the unused includes.
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The implementation assumes that any RMPP request that requires a
response uses DS RMPP. Based on the RMPP start-up scenarios defined
by the spec, this should be a valid assumption. That is, there is no
start-up scenario defined where an RMPP request is followed by a
non-RMPP response. By having this assumption we avoid any API
changes.
In order for a node that supports DS RMPP to communicate with one that
does not, RMPP responses assume a new window size of 1 if a DS ACK has
not been received. (By DS ACK, I'm referring to the turn-around ACK
after the final ACK of the request.) This is a slight spec deviation,
but is necessary to allow communication with nodes that do not
generate the DS ACK. It also handles the case when a response is sent
after the request state has been discarded.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set the reject code properly when rejecting a request that contains an
invalid GID. A suitable GID is returned by the IB CM in the
additional reject information (ARI). This is a spec compliancy issue.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable atomic operations along with RDMA reads if a local RDMA
read/atomic depth is provided by the user.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a ib_uverbs_resize_cq_resp.driver_data field so that low-level
drivers can return data from a resize CQ operation to userspace. Have
ib_uverbs_resize_cq() only copy the cqe field, to avoid having to bump
the userspace ABI.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Lockdep warns when userspace creates a QP that uses different CQs for
send completions and receive completions, because both CQs are locked
and their mutexes belong to the same lock class. However, we know
that the mutexes are distinct and the nesting is safe (there is no
possibility of AB-BA deadlock because the mutexes are locked with
down_read()), so annotate the situation with SINGLE_DEPTH_NESTING to
get rid of the lockdep warning.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There were two functions that open-coded idr_read_cq() in terms of
idr_read_uobj() rather than using the helper.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
3 seems like a low number of IB Communication Manager retries to set;
we see connections failing under stress, and in any case 3 just looks
like an arbitrary number. 15 is the max value allowed by the
InfiniBand spec.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After commit 12bbb2b7be, when SM LID
change or LID change MAD also has a client reregistration bit set,
only CLIENT_REREGISTER event is generated.
As a result, the sa_query module and the cache module don't update the
port information, and ULPs (e.g. IPoIB) stop working. This is the
regression we observe as compared to 2.6.17.
Rather than generate multiple events (which would have negative
performance impact), let us simply let cache and SA query respond to
reregister event in the same way as to LID and SM change events.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Wait until all users have closed their device context before allowing
device unregistration to complete. This prevents a crash caused by
referring to stale data structures.
A better solution would be to have a way to revoke contexts rather
than waiting for userspace to close the context, but that's a much
bigger change that will have to wait. For now let's at least avoid
the crash.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Report error code rather than success (0) on failure allocating
timewait_info in ib_send_cm_req().
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate MADs sent by userspace clients for spec compliance with
C13-18.1.1 (prevent duplicate requests and responses sent on the
same port). Without this, RMPP transactions get aborted because
of duplicate packets.
This patch is similar to that provided by Jack Morgenstein.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Lockdep warns because uverbs is trying to take uobj->mutex when it
already holds that lock. This is because there are really multiple
types of uobjs even though all of their locks are initialized in
common code.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_uverbs_create_ah() and ib_uverbs_create_srq() did not release the
PD's read lock in their error paths, which leads to deadlock when
destroying the PD.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask
supplied by the user to idr_pre_get.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: "Sean Hefty" <mshefty@ichips.intel.com>
Acked-by: "Roland Dreier" <rdreier@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ib_fmr_pool_map_phys gets the virtual address by pointer but never writes
there, and users (e.g. srp) seem to assume this and ignore the value
returned. This patch cleans up the API to get the VA by value, and updates
all users.
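A sketch of a caller after the cleanup, assuming the post-patch prototype
takes the I/O virtual address by value:
	struct ib_pool_fmr *fmr;

	/* io_addr is passed by value now; nothing is written back to the caller. */
	fmr = ib_fmr_pool_map_phys(pool, page_list, list_len, io_addr);
	if (IS_ERR(fmr))
		return PTR_ERR(fmr);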
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set private data length for reject messages to the correct size. Fix from
openib svn r8483.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The device address contains unsigned character arrays, which contain raw GID
addresses. The GIDs may not be naturally aligned, so do not cast them to
structures or unions.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a user of the IB CM returns -ENOMEM from their connection callback, simply
drop the incoming REQ - do not attempt to send a reject. This should allow
the sender to retry the request.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set alternate port number when initializing QP attributes. This bug
is OpenFabrics bugzilla bug #160.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Store away the user handle passed in from userspace when creating an
SRQ, so that the kernel can return the correct handle when an SRQ
asynchronous event occurs. (A 0 was incorrectly stored as the user
handle as part of the changes in 9ead190b, "IB/uverbs: Don't serialize
with ib_uverbs_idr_mutex")
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This was spotted by coverity #id 1300. Since the array has only four
elements, we should just use those four.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under drivers/.
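The transformation applied at each site is mechanical; an illustrative
instance:
	/* before */
	list_del(&entry->list);
	list_add(&entry->list, &other_list);

	/* after: one call, same effect */
	list_move(&entry->list, &other_list);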
Acked-by: Corey Minyard <minyard@mvista.com>
Cc: Ben Collins <bcollins@debian.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Alasdair Kergon <dm-devel@redhat.com>
Cc: Gerd Knorr <kraxel@bytesex.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Pavlic <fpavlic@de.ibm.com>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: iSER Kconfig and Makefile
IB/iser: iSER handling of memory for RDMA
IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
IB/iser: iSER initiator iSCSI PDU and TX/RX
IB/iser: iSCSI iSER transport provider high level code
IB/iser: iSCSI iSER transport provider header file
IB/uverbs: Remove unnecessary list_del()s
IB/uverbs: Don't free wr list when it's known to be empty
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little (see the sketch after this list).
(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
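For most filesystems the conversion is a one-liner; a sketch assuming the
single-superblock convenience helper (foo_fill_super is a placeholder):
static int foo_get_sb(struct file_system_type *fs_type, int flags,
		      const char *dev_name, void *data, struct vfsmount *mnt)
{
	/* The helper calls simple_set_mnt() to set mnt->mnt_sb and mnt->mnt_root
	 * and returns 0 or a negative error, so it can simply be tail-called. */
	return get_sb_single(fs_type, flags, data, foo_fill_super, mnt);
}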
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In ib_uverbs_cleanup_ucontext(), when iterating through the lists of
objects, there's no reason to do list_del() to remove the objects,
since both the objects and the lists that contain them are about to be
freed anyway. Since list_del() is a moderately big inline function,
getting rid of this extra work saves quite a bit of .text:
add/remove: 0/0 grow/shrink: 1/2 up/down: 3/-217 (-214)
function old new delta
ib_uverbs_comp_handler 225 228 +3
ib_uverbs_async_handler 256 255 -1
ib_uverbs_close 905 689 -216
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In ib_uverbs_post_send(), move the "out:" label after the loop that
frees the list of work requests, since the only place that jumps there
is before any work requests could possibly be added to the list.
This removes a compile warning: "is_ud might be used uninitialized in
this function".
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex. This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.
Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention. However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.
Surprisingly, these changes even shrink the object code:
add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Factor out common code for adding a userspace object to an idr into a
function idr_add_uobj(). This shrinks both the source and object code:
add/remove: 1/0 grow/shrink: 0/6 up/down: 57/-220 (-163)
function old new delta
idr_add_uobj - 57 +57
ib_uverbs_create_ah 543 512 -31
ib_uverbs_create_srq 662 630 -32
ib_uverbs_reg_mr 737 699 -38
ib_uverbs_create_cq 639 600 -39
ib_uverbs_alloc_pd 485 446 -39
ib_uverbs_create_qp 1020 979 -41
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In error paths when destroying an object, uverbs should not decrement
associated objects' usecnt, since ib_dereg_mr(), ib_destroy_qp(),
etc. already do that.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If ibdev->alloc_ucontext() fails then ib_uverbs_get_context() does not
unlock file->mutex before returning error.
Signed-off-by: Ganapathi CH <cganapathi@novell.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use new ib_init_ah_from_wc() and ib_init_ah_from_path() helper
functions to clean up the IB CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a call to initialize address handle attributes given a path record.
This is used by the CM, and would be useful for users of UD QPs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a function to initialize address handle attributes from a work
completion. This functionality is duplicated by both verbs and the CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The P_Key is specified in a SIDR REQ in two places, once as a
parameter, and again in the path record. Remove the P_Key as a
parameter and always use the one given in the path record.
This change has no practical effect on ABI functionality.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When creating a FMR pool, query the IB device and use the returned
max_map_per_fmr attribute for the max number of FMR remaps. If
the device does not support querying this attribute, use the original
IB_FMR_MAX_REMAPS (32) default.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Check GID/LID for requester side when searching for request which
matches received response. This is in order to guarantee uniqueness
if the same TID is used when requesting via multiple source LIDs (when
LMC is not zero). Use ports' cached LMC to perform the check.
Further, do not perform LID check for direct-routed packets, since
the permissive LID makes a proper check impossible.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an LMC cache to struct ib_device, and add a function
ib_get_cached_lmc() to query the cache.
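A sketch of a query against the new cache, assuming the prototype mirrors
the other ib_get_cached_*() helpers:
	u8 lmc;

	if (ib_get_cached_lmc(device, port_num, &lmc))
		return;		/* cache query failed */

	/* e.g. compare requester LIDs while ignoring the low 'lmc' bits, since
	 * a response may arrive from any of the 2^lmc source LIDs */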
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
destroy_workqueue() already does flush_workqueue().
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Kernel connection management agent over InfiniBand that connects based
on IP addresses. The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.
The agent also handles RDMA device hotplug events on behalf of clients.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an address translation service that maps IP addresses to
InfiniBand GID addresses using IPoIB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Extend matching connection requests to listens in the InfiniBand CM to
include private data checks.
This allows applications to listen on the same service identifier,
with private data directing the request to the appropriate application.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Provide common handling for marshalling data between userspace clients
and kernel InfiniBand drivers.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In ib_umem_release_on_close(), if the kmalloc() fails, then a
reference to current->mm will be leaked. Fix this by adding a mmput()
instead of just returning on kmalloc() failure.
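The fix amounts to dropping the reference in the failure branch; a sketch
with illustrative names:
	work = kmalloc(sizeof *work, GFP_KERNEL);
	if (!work) {
		mmput(mm);	/* drop the reference taken above instead of leaking it */
		return;
	}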
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix race condition during destruction calls to avoid possibility of
accessing object after it has been freed. Instead of waking up a wait
queue directly, which is susceptible to a race where the object is
freed between the reference count going to 0 and the wake_up(), use a
completion to wait in the function doing the freeing.
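A sketch of the pattern (the structure and field names are placeholders):
struct obj {
	atomic_t		refcount;	/* atomic_set(&obj->refcount, 1) at creation */
	struct completion	comp;		/* init_completion(&obj->comp) at creation */
};

static void obj_deref(struct obj *obj)
{
	if (atomic_dec_and_test(&obj->refcount))
		complete(&obj->comp);
}

static void obj_destroy(struct obj *obj)
{
	/* ... unlink obj so no new references can be taken ... */
	obj_deref(obj);				/* drop our own reference */
	wait_for_completion(&obj->comp);	/* no wake_up() on freed memory */
	kfree(obj);
}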
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code to display local_link_integrity_errors and
excessive_buffer_overrun_errors in
/sys/class/infiniband/<hca>/ports/<n>/counters/
uses the wrong shift to extract the 4 bit values.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Only check that RMPP version is not specified when MAD class does not
support RMPP. Just because a class is allowed to use RMPP doesn't
mean that rmpp_version needs to be set for the MAD agent to
register. Checking this was a recent change which was too pedantic.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When allocating gid_cache, use kmalloc(sizeof *gid_cache, ...) rather
than kmalloc(sizeof *pkey_cache, ...). It doesn't really matter which
one is used, since the size ends up the same either way, but it's much
better to say what we mean.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Push translation of static rate to HCA format into low-level drivers,
where it belongs. For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage. The changes are:
- Add enum ib_rate to midlayer includes.
- Get rid of static rate translation in IPoIB; just use static rate
directly from Path and MulticastGroup records.
- Update mthca driver to translate absolute static rate into the
format used by hardware. This also fixes mthca's static rate
handling for HCAs that are capable of 4X DDR.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We have seen the following OOPs in cancel_mads, when restarting opensm
multiple times:
Call Trace:
[<c010549b>] show_stack+0x9b/0xb0
[<c01055ec>] show_registers+0x11c/0x190
[<c01057cd>] die+0xed/0x160
[<c031b966>] do_page_fault+0x3f6/0x5d0
[<c010511f>] error_code+0x4f/0x60
[<f8ac4e38>] cancel_mads+0x128/0x150 [ib_mad]
[<f8ac2811>] unregister_mad_agent+0x11/0x130 [ib_mad]
[<f8ac2a12>] ib_unregister_mad_agent+0x12/0x20 [ib_mad]
[<f8b10f23>] ib_umad_close+0xf3/0x130 [ib_umad]
[<c0162937>] __fput+0x187/0x1c0
[<c01627a9>] fput+0x19/0x20
[<c0160f7a>] filp_close+0x3a/0x60
[<c0121ca8>] put_files_struct+0x68/0xa0
[<c0103cf7>] do_signal+0x47/0x100
[<c0103ded>] do_notify_resume+0x3d/0x40
[<c0103f9e>] work_notifysig+0x13/0x25
We traced this back to local_completions unlocking mad_agent_priv->lock
while still keeping a pointer into local_list. A later call to
list_del(&local->completion_list) would then corrupt the list.
To fix this, remove the entry from local_list after looking it up but
before releasing mad_agent_priv->lock, to prevent cancel_mads from
finding and freeing it.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add RMPP support for additional management classes that support it.
Also, validate RMPP is consistent with management class specified.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Received responses are currently matched against sent requests based
on TID only. According to the spec, responses should match based on
the combination of TID, management class, and requester LID/GID.
Without the additional qualification, an agent that is responding to
two requests, both of which have the same TID, can match RMPP ACKs
with the incorrect transaction. This problem can occur on the SM node
when responding to SA queries.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix an oopsable race debugged by Eli Cohen <eli@mellanox.co.il>:
After removing the port from port_list, ib_mad_port_close flushes
port_priv->wq before destroying the special QPs. This means that a
completion event could arrive, and queue a new work in this work queue
after flush.
This patch also removes an unnecessary flush_workqueue():
destroy_workqueue() already includes a flush.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix two bugs found by coverity:
- Memory leak in error path of alloc_group_attrs()
- Fencepost error in state_show(): the test should be < ARRAY_SIZE(),
not <= ARRAY_SIZE().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code incorrectly used the primary P_Key index as the alternate
index too.
Signed-off-by: Ami Perlmutter <amip@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for sending and receiving large RMPP transfers. The old
code supports transfers only as large as a single contiguous kernel
memory allocation. This patch uses a linked list of memory buffers when
sending and receiving data to avoid needing contiguous pages for
larger transfers.
Receive side: copy the arriving MADs in chunks instead of coalescing
to one large buffer in kernel space.
Send side: split a multipacket MAD buffer into a list of segments
(multipacket_list) and send these using a gather list of size 2.
Also, save a pointer to the last sent segment, and retrieve requested
segments by walking the list starting at that segment. Finally, save a
pointer to the last-acked segment; when retrying, retrieve segments for
resending relative to this pointer, and when updating the last ack,
start from it.
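A rough sketch of the send-side layout (the struct and field names here
are hypothetical; the real code also keeps the last-sent and last-acked
segment pointers described above):

    struct rmpp_segment {              /* hypothetical name */
        struct list_head list;         /* entry in the MAD buffer's segment list */
        u8 data[0];                    /* payload of one RMPP segment */
    };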
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move checking the state of a cm_id before modifying it when handling a
REP. This fixes a bug seen under MPI scale-up testing, where a NULL
timewait_info pointer is dereferenced if a request times out before a
REP is received.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code didn't convert from the kernel's enum correctly.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
According to the IB spec version 1.2, section 11.2.4.2, the current
table has a couple of mistakes where it allows the current QP state
(IB_QP_CUR_STATE) attribute. For the transitions:
RTS -> RTS: IB_QP_CUR_STATE should be allowed for all transports
SQD -> SQD: IB_QP_CUR_STATE should never be allowed
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pass actual capacity of created SRQ back to userspace, so that
userspace can report accurate capacities. This requires an ABI bump,
to change struct ib_uverbs_create_srq_resp.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to uverbs to handle querying userspace SRQs (shared
receive queues), including adding an ABI for marshalling requests and
responses. The kernel midlayer already has the underlying
ib_query_srq() function.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to uverbs to handle querying userspace QPs (queue pairs),
including adding an ABI for marshalling requests and responses. The
kernel midlayer already has the underlying ib_query_qp() function.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The in-kernel mthca driver contains a table of which attributes are
valid for each queue pair state transition. It turns out that both
other IB drivers -- ipath and ehca -- which are being prepared for
merging have copied this table, errors and all.
To forestall this code duplication, move this table and the code to
check parameters against it into a midlayer library function,
ib_modify_qp_is_ok().
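A sketch of how a low-level driver's modify_qp method can call the shared
helper instead of carrying a private copy of the table (the prototype is
shown roughly as added here; surrounding driver code is elided):

    int ib_modify_qp_is_ok(enum ib_qp_state cur_state,
                           enum ib_qp_state next_state,
                           enum ib_qp_type type,
                           enum ib_qp_attr_mask mask);

    if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask))
        return -EINVAL;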
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The call to ib_get_agent_port() should not be able to fail when
smi_check_local_dr_smp() is called from ib_mad_recv_done_handler().
When it is called from handle_outgoing_dr_smp(), the device and
port_num come from mad_agent_priv so I assume the call to
ib_get_agent_port() shouldn't fail either. In either case,
smi_check_local_smp() only uses the mad_agent pointer to check that
mad_agent->device->process_mad is not NULL. The device pointer would
have to be the same as the one passed to smi_check_local_dr_smp()
since that pointer is used later instead of the one checked in
smi_check_local_smp().
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
smi_check_local_dr_smp() is called from only two places in core/mad.c,
and it returns 0 or 1. It checks whether the SMP is directed route, but
this function is only called when the SMP is directed route, so the
check is a no-op.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch allows the consumer to set the page size of "pages" mapped
by the pool FMRs, a feature that already exists in the base verbs API.
On the cosmetic side, it renames the ib_fmr_attr.page_size field to
page_shift.
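A consumer of the FMR pool API then describes its mapping granularity
roughly as follows (the numeric values are illustrative):

    struct ib_fmr_attr fmr_attr = {
        .max_pages  = 64,
        .max_maps   = 32,
        .page_shift = PAGE_SHIFT,  /* was .page_size before this rename */
    };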
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Expose a writable "node_desc" sysfs attribute for InfiniBand devices.
This allows userspace to update the node description with information
such as the node's hostname, so that IB network management software
can tie its view to the real world.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to uverbs to handle resizing userspace CQs (completion
queues), including adding an ABI for marshalling requests and
responses. The kernel midlayer already has ib_resize_cq().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current IB code doesn't work with userspace programs that listen
only to the kernel event netlink socket, because it tries to create its
own dev interface. This small patch fixes the problem and removes some
unneeded code, since the driver core handles this logic automatically.
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix handling of directed route SMPs with a beginning or ending LID
routed part.
Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
sa_query schedules work on IB asynchronous events. After
unregistering the async event handler, make sure that this work has
completed before releasing the IB device (and possibly allowing the
sa_query module text to go away).
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
uverbs might schedule work to clean up when a file is closed. Make
sure that this work runs before allowing module text to go away.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore-to-mutex conversion, generated by Ingo and Arjan's script.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a node_guid field to struct ib_device. It is the responsibility
of the low-level driver to initialize this field before registering a
device with the midlayer. Convert everyone to looking at this field
instead of calling ib_query_device() when all they want is the node
GUID, and remove the node_guid field from struct ib_device_attr.
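For a low-level driver this amounts to filling in the field before
registering the device, along these lines (hw_read_node_guid() is a
hypothetical stand-in for however the driver reads the GUID from its
hardware, and the registration call is shown in its simplest form):

    ibdev->node_guid = cpu_to_be64(hw_read_node_guid(hw));
    ret = ib_register_device(ibdev);
    if (ret)
        goto err_free_dev;            /* illustrative error label */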
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_create_ah_from_wc() doesn't create the correct return address (AH)
when there is a GRH present (source & dest GIDs need to be swapped).
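Conceptually, the return AH must use the *source* GID of the received
GRH as its destination; a sketch (the real function also carries over
fields such as the flow label, hop limit and traffic class):

    if (wc->wc_flags & IB_WC_GRH) {
        ah_attr.ah_flags = IB_AH_GRH;
        ah_attr.grh.dgid = grh->sgid;  /* reply goes back to the sender's GID */
        /* the received grh->dgid (our own GID) selects the local sgid_index */
    }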
Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_uverbs_create_cq() should release the completion channel event file
if an error occurs after it looks it up. Also, if userspace asks for
a completion channel and we don't find it, an error should be returned
instead of silently creating a CQ without a completion channel.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If an operation fails after incrementing an object's reference count,
then it should decrement the reference count on the error path.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Leave the overloaded "hotplug" word to subsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't leak packet if it had a timeout, and don't leak timeout struct
if queue_packet() fails.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use an increasing local ID to avoid re-using identifiers while
messages may still be outstanding on the old ID. Without this, a
quick connect-disconnect-connect sequence can fail by matching
messages for the new connection with the old connection.
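A sketch of the idea using the idr interface of that era (names mirror
cm.c, but the surrounding code is elided): idr_get_new_above() allocates
an ID at or above a starting value, and the starting value is simply kept
increasing so a just-freed ID is not handed straight back out.

    do {
        spin_lock_irqsave(&cm.lock, flags);
        ret = idr_get_new_above(&cm.local_id_table, cm_id_priv,
                                cm.next_id++, &id);
        spin_unlock_irqrestore(&cm.lock, flags);
    } while (ret == -EAGAIN &&
             idr_pre_get(&cm.local_id_table, GFP_KERNEL));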
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change reject code from TIMEOUT to CONSUMER_REJECT when destroying a
cm_id in the process of connecting.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
uverbs needs to track which multicast groups each QP is attached to,
in order to detach properly when cleanup is performed on device file
close.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_umad_write() in user_mad.c looks at the rmpp_hdr field of the MAD
before checking that the MAD actually has an RMPP header. For a MAD
without an RMPP header, this means we end up testing a bit that
actually lies inside the M_Key field.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The Coverity checker spotted this obvious use-after-release bug caused
by a wrong order of the cleanups.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure that userspace passes in enough data when sending a MAD. We
always copy at least sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR
bytes from userspace, so anything less is definitely invalid. Also,
if the length is less than this limit, it's possible for the second
copy_from_user() to get a negative length and trigger a BUG().
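The check itself is a single early bailout at the top of ib_umad_write(),
roughly:

    if (count < sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR)
        return -EINVAL;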
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The previous umad deadlock fix left ib_umad_kill_port() still
vulnerable to deadlocking. This patch fixes that by downgrading our
lock to a read lock when we might end up trying to reacquire the lock
for reading.
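The pattern is the standard rw_semaphore downgrade; a sketch (the real
code in user_mad.c does this around tearing down the port's open files):

    down_write(&port->mutex);
    /* ... work that needs the lock held exclusively ... */
    downgrade_write(&port->mutex);
    /* Other contexts (e.g. the MAD send handler via queue_packet())
     * take this lock for reading; they would block forever against a
     * held write lock, so only a read lock may be held while waiting
     * for them, as ib_unregister_mad_agent() does. */
    /* ... unregister the MAD agents ... */
    up_read(&port->mutex);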
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the computation of QP capabilities (max scatter/gather entries,
max inline data, etc) into the kernel, and have the uverbs module
return the values as part of the create QP response. This keeps
precise knowledge of device limits in the low-level kernel driver.
This requires an ABI bump, so while we're making changes, get rid of
the max_sge parameter for the modify SRQ command -- it's not used and
shouldn't be there.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that ib_umad uses the new MAD sending interface, it no longer
needs its own L_Key. So just delete the array of MRs that it keeps.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the struct ib_device.resize_cq() method to take a plain integer
that holds the new CQ size, rather than a pointer to an integer that
it uses to return the new size. This makes the interface match the
exported ib_resize_cq() signature, and allows the low-level driver to
update the CQ size with proper locking if necessary.
No in-tree drivers are exporting this method yet.
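Roughly, the method in struct ib_device changes shape as follows (any
other parameters are elided):

    /* before */
    int (*resize_cq)(struct ib_cq *cq, int *cqe);
    /* after */
    int (*resize_cq)(struct ib_cq *cq, int cqe);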
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_unregister_mad_agent() completes all pending MAD sends and waits
for the agent's send_handler routine to return. umad's send_handler()
calls queue_packet(), which does down_read() on the port mutex to look
up the agent ID. This means that the port mutex cannot be held for
writing while calling ib_unregister_mad_agent(), or else it will
deadlock. This patch fixes all the calls to ib_unregister_mad_agent()
in the umad module to avoid this deadlock.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is the remaining misc drivers/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in misc files in
drivers/.
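The transformation is mechanical, since kfree(NULL) is defined to be a
no-op:

    /* before */
    if (obj)
        kfree(obj);
    /* after */
    kfree(obj);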
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Aristeu Sergio Rozanski Filho <aris@cathedrallabs.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Pierre Ossman <drzeus@drzeus.cx>
Acked-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: "Antonino A. Daplas" <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix more include file problems that surfaced since I submitted the
previous fix-missing-includes.patch. This should now make it possible
to stop including sched.h from module.h, which is done by a follow-up
patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Two small fixes for the umad module:
- set kobject name for issm device properly
- in ib_umad_add_one(), s is subtracted from the index i when
initializing ports, so s should be subtracted from the index when
freeing ports in the error path as well.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix hotplug of devices for ib_umad module: when a device goes away,
kill off all MAD agents for open files associated with that device,
and make sure that the device is not touched again after ib_umad
returns from its remove_one function.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.
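The conversion is the usual one-for-two replacement:

    /* before */
    buf = kmalloc(size, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;
    memset(buf, 0, size);

    /* after */
    buf = kzalloc(size, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;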
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Userspace CQs that have no completion event channel attached end up
with their cq_context set to NULL. However, asynchronous events like
"CQ overrun" can still occur on such CQs, so add a uverbs_file member
to struct ib_ucq_object that we can follow to deliver these events.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
infiniband/core/agent.c uses IS_ERR/PTR_ERR without a portable chain
of includes pulling in err.h, which breaks the build on a number of
platforms.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move ib_uverbs module to using cdev_alloc() and class_device_create()
so that we can handle device lifetime properly. Now we can make sure
we keep all of our data structures around until the last way to reach
them is gone.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move ib_umad module to using cdev_alloc() and class_device_create() so
that we can handle device lifetime properly. Now we can make sure we
keep all of our data structures around until the last way to reach
them is gone.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Simplify user_mad.c code in a few places, and convert from kmalloc() +
memset() to kzalloc(). This also fixes a theoretical race window by
not accessing packet->length after posting the send buffer (the send
could complete and packet could be freed before we get to the return
statement at the end of ib_umad_write()).
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The conversion of user_mad.c to the new MAD send API was slightly off:
in a few places, we used packet->msg instead of packet->msg->mad when
referring to the actual data buffer, which ended up corrupting the
underlying data structure and crashing when we freed an invalid pointer.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change alloc_response_msg() in mad_rmpp.c to return the struct
it allocates directly (or an error code a la ERR_PTR), rather than
returning a status and passing the struct back in a pointer param.
This simplifies the code and gets rid of warnings like
drivers/infiniband/core/mad_rmpp.c: In function 'nack_recv':
drivers/infiniband/core/mad_rmpp.c:192: warning: 'msg' may be used uninitialized in this function
with newer versions of gcc.
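The calling convention moves from a status code plus out-parameter to
the usual ERR_PTR style; a sketch with the parameter list abbreviated:

    /* before */
    ret = alloc_response_msg(agent, recv_wc, &msg);
    if (ret)
        return;

    /* after: the returned pointer itself carries any error */
    msg = alloc_response_msg(agent, recv_wc);
    if (IS_ERR(msg))
        return;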
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The MAD layer was violating the DMA API by touching data buffers used
for sends after the DMA mapping was done. This causes problems on
non-cache-coherent architectures, because the device doing DMA won't
see updates to the payload buffers that exist only in the CPU cache.
Fix this by having all MAD consumers use ib_create_send_mad() to
allocate their send buffers, and moving the DMA mapping into the MAD
layer so it can be done just before calling send (and after any
modifications of the send buffer by the MAD layer).
Tested on a non-cache-coherent PowerPC 440SPe system.
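For a MAD consumer, the resulting usage pattern is roughly the following
(argument values are illustrative, build_mad_payload() is a hypothetical
helper, and most error handling is elided):

    msg = ib_create_send_mad(mad_agent, remote_qpn, pkey_index,
                             0 /* rmpp_active */, hdr_len, data_len,
                             GFP_KERNEL);
    if (IS_ERR(msg))
        return PTR_ERR(msg);

    /* Fill in msg->mad *now*: once the buffer is posted, the MAD layer
     * owns the DMA mapping and the payload must not be touched. */
    build_mad_payload(msg->mad);

    ret = ib_post_send_mad(msg, NULL);
    if (ret)
        ib_free_send_mad(msg);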
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix cm_init_qp_init_attr(), cm_init_qp_rtr_attr() and cm_init_qp_rts_attr()
so that they correctly handle the differences between UC and RC QPs. This
fixes problems with setting up UC QPs through the CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add idr_destroy() calls to the module_exit() functions of the four IB
driver modules that use idrs, so we don't leak idr_layer_cache objects
when these modules are unloaded.
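The fix is a one-line addition to each module's exit routine, along
these lines (the function and idr names here are illustrative):

    static void __exit ib_umad_cleanup(void)
    {
        /* ... existing class/cdev teardown ... */
        idr_destroy(&umad_idr);   /* frees the cached idr_layer objects */
    }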
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add missing "break" in switch statement. Without the break, the
CM ended up always falling through and setting every connection
request to use RC transport, which meant that UC connections
didn't work.
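The shape of the bug (not the literal code) is a switch over the
transport type where one case fell through into the next:

    switch (qp_type) {
    case IB_QPT_UC:
        /* set up UC-specific parameters */
        break;              /* <-- without this break, UC falls through... */
    case IB_QPT_RC:
        /* set up RC-specific parameters */
        break;              /* ...and every request ends up configured as RC */
    default:
        break;
    }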
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We keep IB device structures around until the last sysfs reference is
gone, but we shouldn't ask the low-level driver to do anything after
the LLD unregisters the device. To handle this, check the reg_state
field and just fail sysfs show() requests if the device has already
been unregistered.
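A sketch of the guard in the sysfs show() path (the enum value follows
the reg_state field of struct ib_device):

    if (ibdev->reg_state != IB_DEV_REGISTERED)
        return -ENODEV;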
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Bind communication identifiers to a device to support device removal.
Export per HCA CM devices to userspace.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>