Commit Graph

2331 Commits

Author SHA1 Message Date
Stefani Seibold e64c026dd0 kfifo: cleanup namespace
change name of __kfifo_* functions to kfifo_*, because the prefix __kfifo
should be reserved for internal functions only.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22 14:17:56 -08:00
Stefani Seibold c1e13f2567 kfifo: move out spinlock
Move the pointer to the spinlock out of struct kfifo.  Most users in
tree do not actually use a spinlock, so the few exceptions now have to
call kfifo_{get,put}_locked, which takes an extra argument to a
spinlock.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22 14:17:56 -08:00
Stefani Seibold 4546548789 kfifo: move struct kfifo in place
This is a new generic kernel FIFO implementation.

The current kernel fifo API is not very widely used, because it has to
many constrains.  Only 17 files in the current 2.6.31-rc5 used it.
FIFO's are like list's a very basic thing and a kfifo API which handles
the most use case would save a lot of development time and memory
resources.

I think this are the reasons why kfifo is not in use:

 - The API is to simple, important functions are missing
 - A fifo can be only allocated dynamically
 - There is a requirement of a spinlock whether you need it or not
 - There is no support for data records inside a fifo

So I decided to extend the kfifo in a more generic way without blowing up
the API to much.  The new API has the following benefits:

 - Generic usage: For kernel internal use and/or device driver.
 - Provide an API for the most use case.
 - Slim API: The whole API provides 25 functions.
 - Linux style habit.
 - DECLARE_KFIFO, DEFINE_KFIFO and INIT_KFIFO Macros
 - Direct copy_to_user from the fifo and copy_from_user into the fifo.
 - The kfifo itself is an in place member of the using data structure, this save an
   indirection access and does not waste the kernel allocator.
 - Lockless access: if only one reader and one writer is active on the fifo,
   which is the common use case, no additional locking is necessary.
 - Remove spinlock - give the user the freedom of choice what kind of locking to use if
   one is required.
 - Ability to handle records. Three type of records are supported:
   - Variable length records between 0-255 bytes, with a record size
     field of 1 bytes.
   - Variable length records between 0-65535 bytes, with a record size
     field of 2 bytes.
   - Fixed size records, which no record size field.
 - Preserve memory resource.
 - Performance!
 - Easy to use!

This patch:

Since most users want to have the kfifo as part of another object,
reorganize the code to allow including struct kfifo in another data
structure.  This requires changing the kfifo_alloc and kfifo_init
prototypes so that we pass an existing kfifo pointer into them.  This
patch changes the implementation and all existing users.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22 14:17:55 -08:00
Linus Torvalds bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Linus Torvalds e69381b417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
  RDMA/cxgb3: Fix error paths in post_send and post_recv
  RDMA/nes: Fix stale ARP issue
  RDMA/nes: FIN during MPA startup causes timeout
  RDMA/nes: Free kmap() resources
  RDMA/nes: Check for zero STag
  RDMA/nes: Fix Xansation test crash on cm_node ref_count
  RDMA/nes: Abnormal listener exit causes loopback node crash
  RDMA/nes: Fix crash in nes_accept()
  RDMA/nes: Resource not freed for REJECTed connections
  RDMA/nes: MPA request/response error checking
  RDMA/nes: Fix query of ORD values
  RDMA/nes: Fix MAX_CM_BUFFER define
  RDMA/nes: Pass correct size to ioremap_nocache()
  RDMA/nes: Update copyright and branding string
  RDMA/nes: Add max_cqe check to nes_create_cq()
  RDMA/nes: Clean up struct nes_qp
  RDMA/nes: Implement IB_SIGNAL_ALL_WR as an iWARP extension
  RDMA/nes: Add additional SFP+ PHY uC status check and PHY reset
  RDMA/nes: Correct fast memory registration implementation
  IB/ehca: Fix error paths in post_send and post_recv
  ...
2009-12-16 10:32:31 -08:00
Al Viro 2c48b9c455 switch alloc_file() to passing struct path
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Roland Dreier 14f369d1d6 Merge branches 'amso1100', 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'iser', 'misc', 'mlx4' and 'nes' into for-next 2009-12-15 23:39:25 -08:00
Frank Zago 48617f862f RDMA/cxgb3: Fix error paths in post_send and post_recv
Always set bad_wr when an immediate error is detected.  Return ENOMEM
for queue full instead of EINVAL to match other drivers.

Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-15 23:39:10 -08:00
Linus Torvalds d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
Linus Torvalds 4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
Linus Torvalds 382f51fe2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (222 commits)
  [SCSI] zfcp: Remove flag ZFCP_STATUS_FSFREQ_TMFUNCNOTSUPP
  [SCSI] zfcp: Activate fc4s attributes for zfcp in FC transport class
  [SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED
  [SCSI] zfcp: Update FSF error reporting
  [SCSI] zfcp: Improve ELS ADISC handling
  [SCSI] zfcp: Simplify handling of ct and els requests
  [SCSI] zfcp: Remove ZFCP_DID_MASK
  [SCSI] zfcp: Move WKA port to zfcp FC code
  [SCSI] zfcp: Use common code definitions for FC CT structs
  [SCSI] zfcp: Use common code definitions for FC ELS structs
  [SCSI] zfcp: Update FCP protocol related code
  [SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport
  [SCSI] zfcp: Assign scheduled work to driver queue
  [SCSI] zfcp: Remove STATUS_COMMON_REMOVE flag as it is not required anymore
  [SCSI] zfcp: Implement module unloading
  [SCSI] zfcp: Merge trace code for fsf requests in one function
  [SCSI] zfcp: Access ports and units with container_of in sysfs code
  [SCSI] zfcp: Remove suspend callback
  [SCSI] zfcp: Remove global config_mutex
  [SCSI] zfcp: Replace local reference counting with common kref
  ...
2009-12-09 19:42:25 -08:00
Faisal Latif 7a576dfd9e RDMA/nes: Fix stale ARP issue
When the remote node's ethernet address changes, the connection keeps
trying to connect using the old address.  The connection wil continue
failing until the driver is unloaded and loaded again (eiter reboot or
rmmod).  Fix this by checking that the NIC has the correct address
before starting a connection.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:33 -08:00
Faisal Latif b1190d3e0d RDMA/nes: FIN during MPA startup causes timeout
A FIN that is received during an MPA start up sequence causes a
timeout in iwcm.c.  The connection has not been completely closed so
the iwcm code is waiting for resources to be cleaned up.  This closes
the connection so everything cleans up correctly.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:32 -08:00
Faisal Latif d2fa9b26e1 RDMA/nes: Free kmap() resources
We fail when creating many qps as kmap() fails for sq_vbase.
Fix this by doing kunmap() as soon as we are done with sq_vbase.
We do kunmap() in one of the locations below:

(1) nes_destroy_qp()
(2) nes_accept()
(3) nes_connect_event

We keep a flag to avoid multiple calls to kunmap().

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:28 -08:00
Faisal Latif fd000e12a5 RDMA/nes: Check for zero STag
STags are generated randomly but the driver does not correctly prevent
a zero STag.  Using STag zero is privileged and causes a user space
application to fail.  This change prevents the driver from trying to
allocate a zero STag.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:23 -08:00
Faisal Latif 886f98a315 RDMA/nes: Fix Xansation test crash on cm_node ref_count
While running a Xansation test, an active side node crashed.  The
problem started on the passive side, which generated an STtag that was
0.  The passive side sent a TERMINATE instead of an MPA REJECT msg.
The active side, receives TERMINATE and sends connect_err() and set
the cm_node state to CLOSED.  The passive side sends FIN + ACK after
TERMINATE.  Active side ends up in handle_ack_pkt() and send_reset().
send_reset() consumes 1 cm_node's ref_count.  Because the cm_node is
in CLOSED state, which means that cm_node will be destroyed after
completion of the connect_err() indication, CM will crash after
send_reset().

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:18 -08:00
Faisal Latif f9f3f1e08b RDMA/nes: Abnormal listener exit causes loopback node crash
When the listener is destroyed for a loopback connection, the listener
node gets a reset event.  This causes a crash as the listener is not
expecting a reset event.  Code review of cm_event_reset() during
debugging showed the cm_id ref count is incremented after calling its
event handler and not before.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:14 -08:00
Faisal Latif c5a7d48971 RDMA/nes: Fix crash in nes_accept()
While running IMP_EXT's window test, we saw a crash in nes_accept().
Here is the sequence of what happened:

(1) In MVAPICH2, connect request is received for port #0.

FIX:  Add a nes_connect() check to make sure local or remote tcp port
      is not 0.

(2) Remote node's (passive) TCP stack sends a reset when it gets a
    connect request because of port = 0.  Active side set the connect
    error to IW_CM_EVENT_STATUS_REJECTED when it received the RST from
    remote node.

FIX: The corect error code is -ECONNRESET.

(3) Wrong error code of IW_CM_EVENT_STATUS_REJECTED causes the core to
    destroy its listener ports.  Here there are connections that may
    have sent an MPA request up and waiting for accept or reject.  But
    the listener and its cm_nodes have been freed already causing the
    crash noticed.

FIX: The cm_node is freed only if its state is not
     NES_CM_STATE_MPAREQ_RCVD.  If cm_node's state is
     NES_CM_STATE_MPAREQ_RCVD then its new state is set to
     NES_CM_STATE_LISTENER_DESTROYED and it is not freed.  When
     nes_accept() or nes_reject() is received, its state is checked
     for NES_CM_STATE_LISTENER_DESTROYED and in this case the cm_node
     is freed and error is returned.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:08 -08:00
Faisal Latif 69524e1aff RDMA/nes: Resource not freed for REJECTed connections
During testing of REJECT connection error handling, we saw that the
cm_id resources are not released.  When the retransmit timer expires,
we need to send a reset message to remote node before issuing the
ABORTED event.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:54:03 -08:00
Faisal Latif 1cf078c995 RDMA/nes: MPA request/response error checking
During Xansation testing, we saw that error handling of MPA frame
msg/response is not handled properly.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:53:54 -08:00
Faisal Latif 8ac7f6e1af RDMA/nes: Fix query of ORD values
The ORD size needs updating as we are supporting more inbound READ
resources per connection.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:53:46 -08:00
Faisal Latif 9b84dbe7f4 RDMA/nes: Fix MAX_CM_BUFFER define
Change MAX_CM_BUFFER for MPA frames to be conformant to RFC 5044:
we need 512 + 20 instead of 512.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:53:36 -08:00
Julia Lawall d85ddd835b RDMA/nes: Pass correct size to ioremap_nocache()
The size argument to ioremap_nocache should be the size of desired
information, not the pointer to it.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@expression@
expression *x;
@@

x =
 <+...
*sizeof(x)
...+>// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:57 -08:00
Chien Tung fa6c87d510 RDMA/nes: Update copyright and branding string
Update copyright from Intel-NE, Inc. to Intel Corporation.  Use proper
branding string in Kconfig and simplify description.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:56 -08:00
Chien Tung 5924aea6e2 RDMA/nes: Add max_cqe check to nes_create_cq()
Add a check to nes_create_cq() to return -EINVAL if creating a CQ with
depth > max_cqe (32766).

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:56 -08:00
Chien Tung 75742c630e RDMA/nes: Clean up struct nes_qp
Remove unused and not really used variables.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:56 -08:00
Chien Tung d14152da13 RDMA/nes: Implement IB_SIGNAL_ALL_WR as an iWARP extension
Add IB_SINGAL_ALL_WR support as an iWARP extension.  If set, make sure
all WR for the QP are signalled.  Consolidate flags used in nesqp
structure.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:56 -08:00
Chien Tung a276510328 RDMA/nes: Add additional SFP+ PHY uC status check and PHY reset
Add additional PHY uC status check in case PHY firmware is not running
properly with heartbeat.  Add a hard PHY reset if uC status is 0x0
after initial reset.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:56 -08:00
Chien Tung e293a26fe9 RDMA/nes: Correct fast memory registration implementation
Replace alloc_fmr, unmap_fmr, dealloc_fmr and map_phys_fmr with
alloc_fast_reg_mr, alloc_fast_reg_page_list, free_fast_reg_page_list.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:21:54 -08:00
Frank Zago e147de0361 IB/ehca: Fix error paths in post_send and post_recv
Always set bad_wr when an immediate error is detected.  Do not report
success if an error occurred.

Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 15:07:25 -08:00
Frank Zago c597b0240b RDMA/amso1100: Fix error paths in post_send and post_recv
Always set bad_wr when an immediate error is detected.

Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 14:56:11 -08:00
Roel Kluin df42245a3c IB/uverbs: Fix return of PTR_ERR() of wrong pointer in ib_uverbs_get_context()
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 14:30:44 -08:00
Chien Tung 649fe4aeab RDMA/nes: Add support for IB_WR_*INV
Add support for IB_WR_SEND_WITH_INV, IB_WR_RDMA_READ_WITH_INV
and IB_WR_LOCAL_INV.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 13:51:37 -08:00
Frank Zago 4293fdc115 RDMA/nes: In nes_post_recv() always set bad_wr on error
On error, set bad_wr in nes_post_recv().  Stop processing ib_wr queue
when an error is detected.

Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 13:51:36 -08:00
Frank Zago e5dec39474 RDMA/nes: In nes_post_send() always set bad_wr on error
On error, set bad_wr in nes_post_send().  Stop processing ib_wr queue
when an error is detected.

Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 13:51:36 -08:00
Alexander Schmidt 9420269428 IB/ehca: Rework destroy_eq()
The ibmebus_free_irq() function, which might sleep, was called with
interrupts disabled.  To fix this, make sure that no interrupts are
running by killing the interrupt tasklet.  Also lock the
shca_list_lock to protect against the poll_eqs_timer running
concurrently.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 10:11:04 -08:00
Akinobu Mita 598cb6f327 IB/ipath: Use bitmap_weight()
Use bitmap_weight() instead of finding all set bits in bitmap by hand.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ralph Campbell <infinipath@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 10:05:28 -08:00
David J. Wilder 0cd4d0fd9b IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc()
IPoIB can miss a change in destination GID under some conditions.  The
problem is caused when ipoib_neigh->dgid contains a stale address.
The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc().

This can happen when a system using bonding on its IPoIB interfaces
has switched its active interface from interface A to B and back to A.
The system that fails over will not correctly processes the 2nd
address change, as described below.

When an address has changed neighbor->ha is updated with the new
address.  Each neighbor has an associated ipoib_neigh.
ipoib_neigh->dgid also holds a copy of the remote node's hardware
address.  When an address changes neighbor->ha is updated by the
network layer (arp code) with the new address.  IPoIB detects this
change in ipoib_start_xmit() by comparing neighbor->ha with
ipoib_neigh->dgid.  The bug is that ipoib_neigh->dgid may already
contain the new address (A) thus the change from B to A is missed by
ipoib.  Here is the sequence of events:

    ipoib_neigh->dgid = A  and  neighbor->ha = A

The address is switched to B (the first switch)

    neighbor->ha = B

The change is seen in ipoib_start_xmit() -- neighbor->ha !=
ipoib_neigh->dgid so ipoib_neigh is released, and a new one is
allocated.

The allocator may return the same chunk of memory that was just
released, therefore ipoib_neigh->dgid still contains A at this point.

ipoib_neigh->dgid should be updated in neigh_add_path(), but if the
following conditions are true dgid is not updated:

        1) __path_find() returns a path
        2) path->ah is NULL

The remote system now switches from address B to A, neighbor->ha is
updated to A.

Now we have again : ipoib_neigh->dgid = A  and  neighbor->ha = A

Since the addresses are the same ipoib won't process the change in
address.  Fix this by zeroing out the dgid field when allocating a new
struct ipoib_neigh.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 10:03:00 -08:00
Linus Torvalds 18821b0408 Merge branch 'bkl-drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'bkl-drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  agp: Remove the BKL from agp_open
  inifiband: Remove BKL from ipath_open()
  mips: Remove BKL from tb0219
  drivers: Remove BKL from scx200_gpio
  drivers: Remove BKL from pc8736x_gpio
  parisc: Remove BKL from eisa_eeprom
  rtc: Remove BKL from efirtc
  input: Remove BKL from hp_sdc_rtc
  hw_random: Remove BKL from core
  macintosh: Remove BKL from ans-lcd
  nvram: Drop the bkl from non-generic nvram_llseek()
  nvram: Drop the bkl from nvram_llseek()
  mem_class: Drop the bkl from memory_open()
  spi: Remove BKL from spidev_open
  drivers: Remove BKL from cs5535_gpio
  drivers: Remove BKL from misc_open
2009-12-09 08:07:38 -08:00
Mike Christie b20d038dff [SCSI] iser: set tgt and lu reset timeout
When iser enabled lu reset support it did not set the
bit to allow userspace to get/set the timeout. This
sets the tgt and lu reset timeout bits.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-04 12:01:38 -06:00
André Goddard Rosa af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Thadeu Lima de Souza Cascardo 94e2bd6888 tree-wide: fix some typos and punctuation in comments
fix some typos and punctuation in comments

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:48 +01:00
Sean Hefty d14714df61 IB/addr: Fix IPv6 routing lookup
Include link scope as part of address resolution.  Combine local
and remote address resolution into a single, simpler code path.
Fix error checking in the IPv6 routing lookups.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fix up cma_check_linklocal() for !IPV6 case.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 16:46:25 -08:00
Sean Hefty 923c100ef0 IB/addr: Simplify resolving IPv4 addresses
Merge resolve local/remote address resolution into a single
data flow to ensure consistent access and use of the local routing
tables.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 13:26:51 -08:00
Sean Hefty 6f8372b69c RDMA/cm: fix loopback address support
The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address.  (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdam_bind_addr,
no device is associated with the rdma_cm_id.  Fix this.

If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listing across all addresses or
on the loopback address itself.  The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails.  The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.

Finally, cleanup loopback support to be more transport neutral.
Replace separate calls to get/set the sgid and dgid from the
device address to a single call that behaves correctly depending
on the format of the device address.  And support both IPv4 and
IPv6 address formats.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 13:26:06 -08:00
Sean Hefty c4315d85f9 IB/addr: Store net_device type instead of translating to RDMA transport
The struct rdma_dev_addr stores net_device address information:
the source device address, destination hardware address, and
broadcast address.  For consistency, store the net_device type
rather than converting it to the rdma_node_type.

The type indicates the format of the various hardware addresses,
which is what we're concerned with, and not the RDMA node type
that the address may map to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:57:18 -08:00
Sean Hefty d2e0886245 IB/addr: Verify source and destination address families match
If a source address is provided, verify that the address family matches
that of the destination address.  If the source is not specified, use the
same address family as the destination.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Sean Hefty 6266ed6e41 RDMA/cma: Replace net_device pointer with index
Provide the device interface when resolving route information to
ensure that the correct outbound device is used.  This will also
simplify processing of sin6_scope_id for IPv6 support.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthrope@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Jason Gunthorpe e2e626972e RDMA/cma: Fix AF_INET6 support in multicast joining
If joining to an AF_INET6 address, we need to map the address to a MGID
in the same way as the IP stack.  The old code would just fall through to
the IPv4 case and generate garbage.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Jason Gunthorpe 1c9b281997 RDMA/cma: Correct detection of SA Created MGID
RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with
FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy;
fix it up.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:21 -08:00
David S. Miller 3505d1a9fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sfc/sfe4001.c
	drivers/net/wireless/libertas/cmd.c
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/rtl8187se/Kconfig
	drivers/staging/rtl8192e/Kconfig
2009-11-18 22:19:03 -08:00
Eric Dumazet 0f9ea5d2ab RDMA/addr: Use appropriate locking with for_each_netdev()
for_each_netdev() should be used with RTNL or dev_base_lock held,
or else we risk a crash.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-18 14:24:34 -08:00
Sean Hefty a7ca1f00ed RDMA/ucma: Add option to manually set IB path
Export rdma_set_ib_paths to user space to allow applications to
manually set the IB path used for connections.  This allows
alternative ways for a user space application or library to obtain
path record information, including retrieving path information
from cached data, avoiding direct interaction with the IB SA.
The IB SA is a single, centralized entity that can limit scaling
on large clusters running MPI applications.

Future changes to the rdma cm can expand on this framework to
support the full range of features allowed by the IB CM, such as
separate forward and reverse paths and APM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-16 09:30:33 -08:00
Or Gerlitz c1ccaf2478 IB/iser: Rewrite SG handling for RDMA logic
After dma-mapping an SG list provided by the SCSI midlayer, iser has
to make sure the mapped SG is "aligned for RDMA" in the sense that its
possible to produce one mapping in the HCA IOMMU which represents the
whole SG. Next, the mapped SG is formatted for registration with the HCA.

This patch re-writes the logic that does the above, to make it clearer
and simpler. It also fixes a bug in the being aligned for RDMA checks,
where a "start" check wasn't done but rather only "end" check.

Signed-off-by: Alexander Nezhinsky <alexandern@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-12 11:32:27 -08:00
Eli Cohen 417608c20a IB/mlx4: Remove limitation on LSO header size
Current code has a limitation: an LSO header is not allowed to cross a
64 byte boundary.  This patch removes this limitation by setting the
WQE RR for large headers thus allowing LSO headers of any size.  The
extra buffer reserved for MLX4_IB_QP_LSO QPs has been doubled, from 64
to 128 bytes, assuming this is reasonable upper limit for header
length.  Also, this patch will cause IB_DEVICE_UD_TSO to be set only
for HCA FW versions that set MLX4_DEV_CAP_FLAG_BLH; e.g. FW version
2.6.000 and higher.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-12 11:19:44 -08:00
Eli Cohen ecdc428e4c IB/mlx4: Remove unneeded code
There is no such flag DE - the field is reserved and should be zero.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-12 11:14:13 -08:00
Uwe Kleine-Knig 21ae2956ce tree-wide: fix typos "aquire" -> "acquire", "cumsumed" -> "consumed"
This patch was generated by

	git grep -E -i -l '[Aa]quire' | xargs -r perl -p -i -e 's/([Aa])quire/$1cquire/'

and the cumsumed was found by checking the diff for aquire.

Signed-off-by: Uwe Kleine-Knig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-09 09:40:57 +01:00
Thomas Gleixner f96d3015e9 inifiband: Remove BKL from ipath_open()
cycle_kernel_lock() got pushed down to ipath_open(). I tried hard to
understand what it might protect, but finally gave up.

Roland noted that qlogic seems to have abandoned the ipath driver and
came to the following wise conclusion: "So I guess if the BKL stuff is
blocking you in any way, we can just drop it from ipath and leave it
as yet another race condition in a rotting old driver."

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <adad44tj090.fsf@cisco.com>
Cc: Roland Dreier <rdreier@cisco.com>
2009-10-14 17:36:54 +02:00
Alexey Dobriyan d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
Linus Torvalds 69585dd69e Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (34 commits)
  [SCSI] qla2xxx: Fix NULL ptr deref bug in fail path during queue create
  [SCSI] st: fix possible memory use after free after MTSETBLK ioctl
  [SCSI] be2iscsi: Moving to pci_pools v3
  [SCSI] libiscsi: iscsi_session_setup to allow for private space
  [SCSI] be2iscsi: add 10Gbps iSCSI - BladeEngine 2 driver
  [SCSI] zfcp: Fix hang when offlining device with offline chpid
  [SCSI] zfcp: Fix lockdep warning when offlining device with offline chpid
  [SCSI] zfcp: Fix oops during shutdown of offline device
  [SCSI] zfcp: Fix initial device and cfdc for delayed adapter allocation
  [SCSI] zfcp: correctly initialize unchained requests
  [SCSI] mpt2sas: Bump version 02.100.03.00
  [SCSI] mpt2sas: Support dev remove when phy status is MPI2_EVENT_SAS_TOPO_PHYSTATUS_VACANT
  [SCSI] mpt2sas: Timeout occurred within the HANDSHAKE logic while waiting on firmware to ACK.
  [SCSI] mpt2sas: Call init_completion on a per request basis.
  [SCSI] mpt2sas: Target Reset will be issued from Interrupt context.
  [SCSI] mpt2sas: Added SCSIIO, Internal and high priority memory pools to support multiple TM
  [SCSI] mpt2sas: Copyright change to 2009.
  [SCSI] mpt2sas: Added mpi2_history.txt for MPI2 headers.
  [SCSI] mpt2sas: Update driver to MPI2 REV K headers.
  [SCSI] bfa: Brocade BFA FC SCSI driver
  ...
2009-10-11 11:12:33 -07:00
Roland Dreier 335f2d1b24 Merge branches 'cxgb3', 'misc' and 'mlx4' into for-next 2009-10-07 16:03:32 -07:00
David J. Wilder 85f20b39fd RDMA/addr: Fix resolution of local IPv6 addresses
This patch allows a local IPv6 address to be resolved by rdma_cm.

To reproduce the problem:

 $ rping -s -v -a ::0  &
 $ rping -c -v -a <IPv6 address local to this system>
 rdma_resolve_addr error -1

Local IPv6 address was obtained with "ip addr show ib0"

Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=1759
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-10-07 16:03:18 -07:00
Steve Wise e5da4ed8a4 RDMA/cxgb3: Handle NULL inetdev pointer in iwch_query_port()
in_dev_get() can return NULL.  If it does, iwch_query_port() will crash.
Handle the NULL case by mapping it to port state INIT.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-10-07 15:51:07 -07:00
Steve Wise 54e05f15cc RDMA/iwcm: Don't call provider reject func with irqs disabled
In commit cb58160e ("RDMA/iwcm: Reject the connection when the cm_id
is destroyed") a call to the provider's reject handler was added to
destroy_cm_id() to fix a provider endpoint leak.  This call needs to
be done with interrupts enabled.  So unlock and relock around this
call.  This is safe because:

1) the provider will do nothing with this endpoint until the iwcm either
   accepts or rejects.
2) the lock is only released after the iwcm state is changed, so an
   errant iwcm app that is destroying -and- rejecting the connection
   concurrently will get a failure on one of the calls.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-10-07 15:38:12 -07:00
Ben Hutchings 15f0a394c6 net: Convert ethtool {get_stats, self_test}_count() ops to get_sset_count()
These string query operations were supposed to be replaced by the
generic get_sset_count() starting in 2007.  Convert the remaining
implementations.

Also remove calls to these operations to initialise drvinfo->n_stats.
The ethtool core code already does that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:10:10 -07:00
Alexey Dobriyan a99bbaf5ee headers: remove sched.h from poll.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-04 15:05:10 -07:00
Christoph Lameter ca0c9584b1 this_cpu: Straight transformations
Use this_cpu_ptr and __this_cpu_ptr in locations where straight
transformations are possible because per_cpu_ptr is used with
either smp_processor_id() or raw_smp_processor_id().

cc: David Howells <dhowells@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
cc: Ingo Molnar <mingo@elte.hu>
cc: Rusty Russell <rusty@rustcorp.com.au>
cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-03 19:48:22 +09:00
Jayamohan Kallickal b8b9e1b812 [SCSI] libiscsi: iscsi_session_setup to allow for private space
This patch contains changes that allow iscsi_session_setup
to allocate private space for LLD's

Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-02 14:01:39 -05:00
Alexey Dobriyan f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Linus Torvalds d7757be133 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Don't turn on carrier for a non-active port
  IB/mthca: Fix access to freed memory in catastrophic event handling
  mlx4_core: Pass cache line size to device FW
  RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
  IB/mad: Fix lock-lock-timer deadlock in RMPP code
2009-09-24 17:06:01 -07:00
Roland Dreier 216c7f92b9 Merge branches 'ipoib', 'mad', 'mlx4', 'mthca' and 'nes' into for-linus 2009-09-24 12:43:08 -07:00
Moni Shoua 5ee9512084 IPoIB: Don't turn on carrier for a non-active port
Multicast joins can succeed even if the IB port is down.  This happens
when the SM runs on the same port with the requesting port.  However,
IPoIB calls netif_carrier_on() when the join of the broadcast group
succeeds, without caring about the state of the IB port.  The result
is an IPoIB interface in RUNNING state but without an active IB port
to support it.

If a bonding interface uses this IPoIB interface as a slave it might
not detect that this slave is almost useless and failover
functionality will be damaged.  The fix checks the state of the IB
port in the carrier_task before calling netif_carrier_on().

Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-24 12:01:05 -07:00
Jack Morgenstein d686159e50 IB/mthca: Fix access to freed memory in catastrophic event handling
catas_reset() uses a pointer to mthca_dev, but mthca_dev is not valid
after the call to __mthca_restart_one().

Based on a similar patch for mlx4 (634354d7, "mlx4: Fix access to
freed memory") by Vitaliy Gusev <vgusev@openvz.org>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-24 11:55:41 -07:00
Julia Lawall bdf643816a RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
The definition of nes_netdev_ops has initializations of a local function
and eth_mac_addr for its ndo_set_mac_address field.  This change uses only
the local function.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier I, s, fld;
position p0,p;
expression E;
@@

struct I s =@p0 { ... .fld@p = E, ...};

@s@
identifier I, s, r.fld;
position r.p0,p;
expression E;
@@

struct I s =@p0 { ... .fld@p = E, ...};

@script:python@
p0 << r.p0;
fld << r.fld;
ps << s.p;
pr << r.p;
@@

if int(ps[0].line)!=int(pr[0].line) or int(ps[0].column)!=int(pr[0].column):
  cocci.print_main(fld,p0)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-24 10:59:34 -07:00
Roland Dreier 0e442afd92 IB/mad: Fix lock-lock-timer deadlock in RMPP code
Holding agent->lock across cancel_delayed_work() (which does
del_timer_sync()) in ib_cancel_rmpp_recvs() leads to lockdep reports of
possible lock-timer deadlocks if a consumer ever does something that
connects agent->lock to a lock taken in IRQ context (cf
http://marc.info/?l=linux-rdma&m=125243699026045).

Fix this by changing the list items to a new state "CANCELING" while
holding the lock, and then canceling the delayed work without holding
the lock.  If the delayed work runs after the lock is dropped, it will
see the state is CANCELING and return immediately, so the list will
stay stable while we traverse it with the lock not held.

Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-23 11:10:15 -07:00
KAMEZAWA Hiroyuki 908eedc616 walk system ram range
Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range.  For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.

But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM.  This patch makes the
check strict to find out busy "System RAM".

Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton.  Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.

And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
Anand Gadiyar 411c940385 trivial: fix typo "for for" in multiple files
trivial: fix typo "for for" in multiple files

Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:54 +02:00
David Brownell a4dbd6740d driver model: constify attribute groups
Let attribute group vectors be declared "const".  We'd
like to let most attribute metadata live in read-only
sections... this is a start.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15 09:50:47 -07:00
Linus Torvalds d7e9660ad9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
  netxen: update copyright
  netxen: fix tx timeout recovery
  netxen: fix file firmware leak
  netxen: improve pci memory access
  netxen: change firmware write size
  tg3: Fix return ring size breakage
  netxen: build fix for INET=n
  cdc-phonet: autoconfigure Phonet address
  Phonet: back-end for autoconfigured addresses
  Phonet: fix netlink address dump error handling
  ipv6: Add IFA_F_DADFAILED flag
  net: Add DEVTYPE support for Ethernet based devices
  mv643xx_eth.c: remove unused txq_set_wrr()
  ucc_geth: Fix hangs after switching from full to half duplex
  ucc_geth: Rearrange some code to avoid forward declarations
  phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
  drivers/net/phy: introduce missing kfree
  drivers/net/wan: introduce missing kfree
  net: force bridge module(s) to be GPL
  Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
  ...

Fixed up trivial conflicts:

 - arch/x86/include/asm/socket.h

   converted to <asm-generic/socket.h> in the x86 tree.  The generic
   header has the same new #define's, so that works out fine.

 - drivers/net/tun.c

   fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
   switched over to using 'tun->socket.sk' instead of the redundantly
   available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
   to the TUN driver") which added a new 'tun->sk' use.

   Noted in 'next' by Stephen Rothwell.
2009-09-14 10:37:28 -07:00
Roland Dreier 73f526da02 Merge branch 'mad' into for-linus
Conflicts:
	drivers/infiniband/core/mad.c
2009-09-10 21:19:45 -07:00
Roland Dreier 45c448a1c0 Merge branches 'cxgb3', 'ehca', 'ipath', 'ipoib', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus 2009-09-10 21:18:07 -07:00
Steve Wise cb58160e72 RDMA/iwcm: Reject the connection when the cm_id is destroyed
If the cm_id of a connect request is destroyed prior to the ULP
accepting or rejecting the connection, then the provider never cleans
up the connection.  The iwcm should explicitly reject these
connections if the cm_id is destroyed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09 11:37:38 -07:00
Steve Wise ffc40c6433 RDMA/cxgb3: Clean up properly on FW mismatch failures
FW mismatches can cause a crash in the iw_cxgb3 event handler.

- NULL the t3cdev->ulp pointer on failures in cxio_rdev_open()
- Silently ignore events when the ulp ptr is NULL in iwch_err_handler()

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09 11:25:56 -07:00
Steve Wise 13a239330a RDMA/cxgb3: Don't ignore insert_handle() failures
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09 11:25:55 -07:00
Hal Rosenstock b76aabc395 IB/mad: Allow tuning of QP0 and QP1 sizes
MADs are UD and can be dropped if there are no receives posted, so
allow receive queue size to be set with a module parameter in case the
queue needs to be lengthened.  Send side tuning is done for symmetry
with receive.

Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-07 08:28:48 -07:00
Roland Dreier 6b2eef8fd7 IB/mad: Fix possible lock-lock-timer deadlock
Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does

	cancel_delayed_work(&mad_agent_priv->timed_work);

while holding mad_agent_priv->lock.  cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).

This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:

 A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 B: holding mad_agent_priv->lock, waiting for del_timer_sync()
 C: interrupt during mad_agent_priv->timed_work.timer that takes
    cm_id_priv->lock

Fix this by using the new __cancel_delayed_work() interface (which
internally does del_timer() instead of del_timer_sync()) in all the
places where we are holding a lock.

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-07 08:27:50 -07:00
Chien Tung cd1d3f7abe RDMA/nes: Map MTU to IB_MTU_* and correctly report link state
Old query_port code reports static MTU and link state values.
Instead, map actual MTU to next largest IB_MTU_* constant and
correctly report link state.

Cc: Steve Wise <swise@opengridcomputing.com>
Reported-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:39 -07:00
Don Wood b29a4fc49b RDMA/nes: Rework the disconn routine for terminate and flushing
The disconn routine has been reworked to acoomodate the terminate and
flushing changes.  The routine has been reorganized to make all the
decisions at the start then it performs all the required operations.
This simplified the lock handling and is easier to follow.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:39 -07:00
Don Wood 320cdfd21d RDMA/nes: Use the flush code to fill in cqe error
Use the flush status to fill in cqe status when a specific error has
been identified.  Subsequent flushed completions still use the flushed
value.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:39 -07:00
Don Wood 6eed5e7c8b RDMA/nes: Make poll_cq return correct number of wqes during flush
When a flush request is given to the hw, it will place one cqe marked
as flushed (unless there is nothing to flush).  An application that is
waiting for all wqe's to complete will be left hanging.  This modifies
poll_cq to return the correct number of flushes for the pending
elements on the wq.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:39 -07:00
Don Wood 4b281faec3 RDMA/nes: Use flush mechanism to set status for wqe in error
When an asynchronous event occurs that requires a terminate, it is
sometimes possible to identify the wqe in error.  This change uses
flush to get this information to the poll routine.  The flush
operation puts the status into the cqe.  If this information is not
available, it continues to use the more generic flush code as before.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:38 -07:00
Don Wood 8b1c9dc4ba RDMA/nes: Implement Terminate Packet
Implement the sending and receiving of Terminate packets.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:38 -07:00
Don Wood 3c28b4457a RDMA/nes: Add CQ error handling
CQ errors are not being handled correctly.  Put in the the upcall for
CQ errors.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:38 -07:00
Don Wood 5ee21fe0ea RDMA/nes: Clean out CQ completions when QP is destroyed
When a QP is destroyed, unprocessed CQ entries could still reference
the QP.  This change zeroes the context value at QP destroy time.  By
skipping over cqe's with a zero context, poll_cq no longer processes a
cqe for a destroyed QP.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:37 -07:00
Don Wood ba0c5d9a89 RDMA/nes: Change memory allocation for cqp request to GFP_ATOMIC
The routine to allocate a cqp request is not called from process
context code.  Since it is not OK to sleep, it needs to use GFP_ATOMIC
not GFP_KERNEL.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:37 -07:00
Don Wood 873fcdd4bf RDMA/nes: Allocate work item for disconnect event handling
The code currently has a work structure in the QP.  This requires a
lock and a pending flag to ensure there is never more than one request
active.  When two events happen quickly (such as FIN and LLP CLOSE),
it causes unnecessary timeouts since the second one is dropped.

This fix allocates memory for the work request so the second one can
be queued.  A lock is removed since it is no longer needed.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:37 -07:00
Don Wood c4c3f279cd RDMA/nes: Update refcnt during disconnect
During termination, it is possible for the refcnt to go to zero while
the worker thread is posting events upward.  This fix increments the
refcnt before the request is passed to the worker thread.  The thread
decrements the refcnt when the request is completed.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:36 -07:00
Jack Morgenstein d841064777 IB/mthca: Don't allow userspace open while recovering from catastrophic error
Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:16 -07:00
Arputham Benjamin d94a868901 IB/mthca: Distinguish multiple devices in /proc/interrupts
When the mthca driver uses the same name for interrupts for every
device in the system.  This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for.  Change the driver
to add the PCI name of the device to the interrupt name.

Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:15 -07:00
Roland Dreier ffe063f32b IB/mthca: Annotate CQ locking
mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks.  Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:15 -07:00
Roland Dreier deecb5d672 IB/mthca: Remove unnecessary include of <linux/init.h>
mthca_reset.c doesn't have any function annotations, so there's no
reason to include <linux/init.h>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:15 -07:00
Roland Dreier fc1285585f IB/mthca: Remove unnecessary include of <asm/page.h>
mthca_config_reg.h was including <asm/page.h> for no reason -- the whole
file is just defines of constants, so it's entirely self-contained.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:13 -07:00
Jack Morgenstein 3b4a8cd51e IB/mlx4: Don't allow userspace open while recovering from catastrophic error
Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:50 -07:00
Roland Dreier 338a8fad27 IB/mlx4: Annotate CQ locking
mlx4_ib_lock_cqs()/mlx4_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks.  Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:49 -07:00
Roel Kluin 1493ab4083 RDMA/amso1100: Check kmalloc() result in c2_register_device()
dev->ibdev.iwcm allocation may fail, prevent a dereference.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Jack Morgenstein b1b8afb833 IB/uverbs: Return ENOSYS for unimplemented commands (not EINVAL)
Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
methods allowed for userspace"), the uverbs core returns EINVAL for
commands not implemented by a specific low-level driver.

This creates a problem that there is no way to tell the difference
between an unimplemented command and an implemented one which is
incorrectly invoked (which also returns EINVAL).

The fix is to have unimplemented commands return ENOSYS.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Yossi Etigin e1d7806df3 IB/core: Fix send multicast group leave retry
Until now, retries were only sent when joining a multicast group. This
patch will adds retries when leaving a multicast group as well.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Marcin Slusarz f1aa78b26e IB: Use printk_once() for driver versions
Replace open-coded reimplementations with printk_once().

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Tobias Klauser 181c74e87e RDMA/amso1100: Use %pM conversion specifier
Use the %pM conversion specifier to print a MAC address.

Signed-off-by: Tobias Klauser <klto@zhaw.ch>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:23 -07:00
Roland Dreier 6276e08a9b IB: Use DEFINE_SPINLOCK() for static spinlocks
Rather than just defining static spinlock_t variables and then
initializing them later in init functions, simply define them with
DEFINE_SPINLOCK() and remove the calls to spin_lock_init().  This cleans
up the source a tad and also shrinks the compiled code; eg on x86-64:

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
function                                     old     new   delta
ib_uverbs_init                               336     326     -10
ib_mad_init_module                           147     137     -10
ib_sa_init                                   123     103     -20

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:23 -07:00
Roland Dreier 60f2b652f5 IB/mad: Check hop count field in directed route MAD to avoid array overflow
The hop count field in a directed route MAD is only allowed to be in the
range 0 to 63 (by spec).  Check that this really is the case to avoid
accessing outside the bounds of the hop array.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:10 -07:00
Jason Gunthorpe 5e47596bee IPoIB: Check multicast address format
Check that the format of multicast link addresses is correct before
taking them from dev->mc_list to priv->multicast_list.  This way we
never try to send a bogus address to the SA, which prevents badness
from erronous 'ip maddr addr add', broken bonding drivers, etc.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:40 -07:00
Roland Dreier 721d67cdca IPoIB: Drop priv->lock before calling ipoib_send()
IPoIB currently must use irqsave locking for priv->lock, since it is
taken from interrupt context in one path.  However, ipoib_send() does
skb_orphan(), and the network stack locking is not IRQ-safe.
Therefore we need to make sure we don't hold priv->lock when calling
ipoib_send() to avoid lockdep warnings (the code was almost certainly
safe in practice, since the only code path that takes priv->lock from
interrupt context would never call into the network stack).

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:40 -07:00
Roland Dreier cd0bcf4cb9 IPoIB: Remove unused <rdma/ib_cache.h> includes
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:38 -07:00
Roel Kluin 286b63d096 IB/ipath: strncpy() doesn't always NUL-terminate
strlcpy() will always null terminate the string.  node_desc is not
guaranteed to be NUL-terminated so just use memcpy().

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:21 -07:00
Joachim Fenkes 6303e74c69 IB/ehca: Fix CQE flags reporting
The driver was reporting CQE flags in the wrong bit positions, causing
consumers to miss incoming immediate data.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:55 -07:00
Joachim Fenkes d706834d99 IB/ehca: Construct MAD redirect replies from request MAD
The old code used a lot of hard-coded values, which might not be valid
in all environments (especially routed fabrics or partitioned
subnets).  Copy as much information as possible from the incoming
request to correct that.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:55 -07:00
Alexander Schmidt 50d40b8e53 IB/ehca: Make port autodetect mode the default
Make port autodetect mode the default for the ehca driver. The
autodetect code has been in the kernel for several releases now and
has proved to be stable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:54 -07:00
Steve Wise a52bf98d99 RDMA/cxgb3: Wake up any waiters on peer close/abort
A close/abort while waiting for a wr_ack during connection migration
can cause a hung process in iwch_accept_cr/iwch_reject_cr.

The fix is to set rpl_error/rpl_done and wake up the waiters when we
get a close/abort while in MPA_REQ_RCVD state.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:38 -07:00
Steve Wise 6e47fe4350 RDMA/cxgb3: Don't free endpoints early
- Keep ref on connection request endpoints until either accepted or
  rejected so it doesn't get freed early.

- Endpoint flags now need to be set via atomic bitops because they can
  be set on both the iw_cxgb3 workqueue thread and user disconnect
  threads.

- Don't move out of CLOSING too early due to multiple calls to
  iwch_ep_disconnect.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:38 -07:00
Steve Wise fa0d4c11c4 RDMA/cxgb3: Handle port events properly
Massage the err_handler upcall into an event handler upcall, pass
netdev port events to the cxgb3 ULPs and generate RDMA port events
based on LLD port events.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:38 -07:00
Steve Wise b496fe82d4 RDMA/cxgb3: Set the appropriate IO channel in rdma_init work requests
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:37 -07:00
Steve Wise 3793d2fc3e RDMA/cxgb3: iwch_unregister_device leaks memory
The iwcm struct mem is never freed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:36 -07:00
Eric Dumazet 451f144398 drivers: Kill now superfluous ->last_rx stores
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:07:36 -07:00
Stephen Hemminger 0fc0b732ea netdev: drivers should make ethtool_ops const
No need to put ethtool_ops in data, they should be const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:33 -07:00
Roland Dreier 4a7eca824c Merge branches 'ehca', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus 2009-06-23 10:38:47 -07:00
Peter Huewe 716abb1fdf RDMA: Add __init/__exit macros to addr.c and cma.c
Add __init and __exit annotations to the module_init/module_exit
functions from drivers/infiniband/core/addr.c and cma.c.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-23 10:38:42 -07:00
Alexander Schmidt 1d4d6da535 IB/ehca: Bump version number
Increment version number for DMEM toleration.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-23 10:30:04 -07:00
Roland Dreier 99987bea47 IB/mthca: Replace dma_sync_single() use with proper functions
dma_sync_single() is deprecated now, and the use in mthca is wrong:
there should be a dma_sync_single_for_cpu() before touching the memory
from the CPU, and a dma_sync_single_for_device() afterwards.  Fix
this, prompted by a kick in the pants from a patch from FUJITA
Tomonori <fujita.tomonori@lab.ntt.co.jp>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 23:04:13 -07:00
Faisal Latif 68237a0ff8 RDMA/nes: Fix FIN state handling under error conditions
During cluster testing, one QP was not closed, as FIN is not handled
properly when its rexmit count expires or in some cases when RST is is
received after sending FIN.  The reason is that the cm_id does not get
decremented under these conditions.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:53:28 -07:00
Faisal Latif 66388d67a0 RDMA/nes: Fix max_qp_init_rd_atom returned from query device
In nes_query_device(), max_qp_init_rd_atom is incorrectly set to
max_qp_wr.  This was found when a test application had a dapl async
event error.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:52:30 -07:00
Roel Kluin af04662b4d IB/ehca: Ensure that guid_entry index is not negative
This prevents the memcpy() of a guid_entries element using a negative index.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:23:48 -07:00
Hannes Hering 0cf89dcdbc IB/ehca: Tolerate dynamic memory operations before driver load
Implement toleration of dynamic memory operations and 16 GB gigantic
pages, where "toleration" means that the driver can cope with dynamic
memory operations that happen before the driver is loaded.  While the
ehca driver is loaded, dynamic memory operations are still prohibited
by returning NOTIFY_BAD from the memory notifier.

On module load the driver walks through available system memory,
checks for available memory ranges and then registers the kernel
internal memory region accordingly.  The translation of address ranges
is implemented via a 3-level busmap.

Signed-off-by: Hannes Hering <hering2@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:18:51 -07:00
Greg Kroah-Hartman f899c2ddd4 infiniband: ehca: remove driver_data direct access of struct device
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device.  Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.

Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: general@lists.openfabrics.org
Cc: Christoph Raisch <raisch@de.ibm.com>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:27 -07:00
Greg Kroah-Hartman 3f7c58a05f infiniband: remove driver_data direct access of struct device
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device.  Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.


Cc: general@lists.openfabrics.org
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:26 -07:00
David S. Miller 9cbc1cb8cd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/scsi/fcoe/fcoe.c
	net/core/drop_monitor.c
	net/core/net-traces.c
2009-06-15 03:02:23 -07:00
Linus Torvalds cf5046323e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx
  IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
  IB/mlx4: Add strong ordering to local inval and fast reg work requests
  IB/ehca: Remove superfluous bitmasks from QP control block
  RDMA/cxgb3: Limit fast register size based on T3 limitations
  RDMA/cxgb3: Report correct port state and MTU
  mlx4_core: Add module parameter for number of MTTs per segment
  IB/mthca: Add module parameter for number of MTTs per segment
  RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
  infiniband: Remove void casts
  IB/ehca: Increment version number
  IB/ehca: Remove unnecessary memory operations for userspace queue pairs
  IB/ehca: Fall back to vmalloc() for big allocations
  IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
2009-06-14 13:53:22 -07:00
Roland Dreier 8d34ff3401 Merge branches 'cxgb3', 'ehca', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus 2009-06-14 13:31:19 -07:00
Roland Dreier 9aa0a489d9 IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice.  Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time.

Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-13 15:14:09 -07:00
Jack Morgenstein 2ac6bf4ddc IB/mlx4: Add strong ordering to local inval and fast reg work requests
The ConnectX Programmer's Reference Manual states that the "SO" bit
must be set when posting Fast Register and Local Invalidate send work
requests.  When this bit is set, the work request will be executed
only after all previous work requests on the send queue have been
executed.  (If the bit is not set, Fast Register and Local Invalidate
WQEs may begin execution too early, which violates the defined
semantics for these operations)

This fixes the issue with NFS/RDMA reported in
<http://lists.openfabrics.org/pipermail/general/2009-April/059253.html>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-05 10:36:24 -07:00
Joachim Fenkes 25a5239327 IB/ehca: Remove superfluous bitmasks from QP control block
All the fields in the control block are nicely right-aligned, so no
masking is necessary.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-03 13:25:42 -07:00
Eric Dumazet adf30907d6 net: skb->dst accessors
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:04 -07:00
Eric Dumazet 86d15cd833 net: unset IFF_XMIT_DST_RELEASE for qeth and ipoib
Last two drivers that need skb->dst in their start_xmit() function

Tell dev_hard_start_xmit() to no release it by unsetting  IFF_XMIT_DST_RELEASE

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-30 23:04:46 -07:00
Steve Wise 3026c19a14 RDMA/cxgb3: Limit fast register size based on T3 limitations
T3 firmware only supports one WRs worth of page list for fast register
work requests.  The driver currently allows 2 WRs worth, which
doesn't work for T3, so reduce the limit in the driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-27 14:43:39 -07:00
Steve Wise 7ab1a2b31d RDMA/cxgb3: Report correct port state and MTU
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-27 14:42:36 -07:00
Eli Cohen c1f67a88bf IB/mthca: Add module parameter for number of MTTs per segment
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control.  As a result, the size of memory that can be
registered is limited too.  This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-27 14:36:16 -07:00
Mike Christie b3cd5050bf [SCSI] libiscsi: add task aborted state
If a task did not complete normally due to a TMF, libiscsi will
now complete the task with the state ISCSI_TASK_ABRT_TMF. Drivers
like bnx2i that need to free resources if a command did not complete normally
can then check the task state. If a driver does not need to send
a special command if we have dropped the session then they can check
for ISCSI_TASK_ABRT_SESS_RECOV.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-23 15:44:13 -05:00
Mike Christie 10eb0f013c [SCSI] iscsi: pass ep connect shost
When we create the tcp/ip connection by calling ep_connect, we currently
just go by the routing table info.

I think there are two problems with this.

1. Some drivers do not have access to a routing table. Some drivers like
qla4xxx do not even know about other ports.

2. If you have two initiator ports on the same subnet, the user may have
set things up so that session1 was supposed to be run through port1. and
session2 was supposed to be run through port2. It looks like we could
end with both sessions going through one of the ports.

Fixes for cxgb3i from Karen Xie.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-23 15:44:09 -05:00
Eric W. Biederman 26574401fe net: Fix ipoib rtnl_lock sysfs deadlock.
Network device sysfs files that grab the rtnl_lock unconditionally
will deadlock if accessed when the network device is being
unregistered.  So use trylock and syscall_restart to avoid this
deadlock.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:59 -07:00
Roel Kluin 28e43a519b RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
With a postfix increment, i is incremented one past 10K/5K before the
loop ends, so the error messages will be displayed too soon if the
test succeeds on the last iteration.  Fix the comparisons to be >
instead of >=.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-15 10:16:45 -07:00