The current work request posting code writes the LSO segment before
writing any data segments. This leaves a window where the LSO segment
overwrites the stamping in one cacheline that the HCA prefetches
before the rest of the cacheline is filled with the correct data
segments. When the HCA processes this work request, a local
protection error may result.
Fix this by saving the LSO header size field off and writing it only
after all data segments are written. This fix is a cleaned-up version
of a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1383>.
Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a deadlock between child interface creation/deletion and ipoib
start/stop. The former takes vlan_mutex, and then might take RTNL via
register_netdev()/unregister_netdev(). The latter is executed with
RTNL held, and tries to take vlan_mutex, which can lead to an AB-BA
deadlock.
Fix this by having the child interface creation/deletion code take the
RTNL first so vlan_mutex always nests inside RTNL. We can use
register_netdevice() for child interfaces because we form the
interface name from the parent interface and hence don't need the '%'
expansion of register_netdev().
Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After commit fe25c561 ("IPoIB: Don't enable NAPI when it's already
enabled"), if an interface is brought up but the corresponding P_Key
never appears, then ipoib_stop() will hang in napi_disable(), because
ipoib_open() returns before it does napi_enable().
Fix this by changing ipoib_open() to call napi_enable() even if the
P_Key isn't present.
Reported-by: Yossi Etigin <yosefe@Voltaire.COM>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: Add dependency on INFINIBAND_ADDR_TRANS
IPoIB: Do not join broadcast group if interface is brought down
RDMA/nes: Fix for NIPQUAD removal
IPoIB: Fix loss of connectivity after bonding failover on both sides
IB/mlx4: Don't register IB device for adapters with no IB ports
mlx4_core: Fix warning from min()
IB/ehca: spin_lock_irqsave() takes an unsigned long
Fix ib_iser build to depend on INFINIBAND_ADDR_TRANS; if INET=y but
IPV6=n, then the RDMA CM is not built but INFINIBAND_ISER can be
enabled, leading to:
ERROR: "rdma_destroy_id" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_connect" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_destroy_qp" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_create_id" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_create_qp" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_resolve_route" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_disconnect" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "rdma_resolve_addr" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Because the ipoib_workqueue is not flushed when ipoib interface is
brought down, ipoib_mcast_join() may trigger a join to the broadcast
group after priv->broadcast was set to NULL (during cleanup). This
will cause the system to be a member of the broadcast group when
interface is down. As a side effect, this breaks the optimization of
setting the Q_key only when joining the broadcast group.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 63779436 ("drivers: replace NIPQUAD()") accidentally replaced
some HIPQUAD()s, causing IP addresses to be printed in reverse order.
Add temporary local vars until the byteswapping can be pushed further
up the stack.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix bonding failover in the case both peers failover and the
gratuitous ARP is lost. In that case, the sender side will create an
ipoib_neigh and issue a path request with the old GID first. When
skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh
will not be added to the path->list of the path of the new GID,
because the ipoib_neigh already exists. It will not have an AH
either, because of sender-side failover. Therefore, it will not get
an AH when the path is resolved.
The solution here is to compare GIDs in ipoib_start_xmit() even if
neigh->ah is invalid. Comparing with an uninitialized value of
neigh->dgid should be fine, since a spurious match is harmless (and
astronomically unlikely too).
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the mlx4_ib driver finds an adapter that has only ethernet ports, the
current code will register an IB device with 0 ports. Nothing useful or
sensible can be done with such a device, so just skip registering it.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When I review ocfs2 code, find there are 2 typos to "successfull". After
doing grep "successfull " in kernel tree, 22 typos found totally -- great
minds always think alike :)
This patch fixes all the similar typos. Thanks for Randy's ack and comments.
Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
The flags argument to spin_lock_irqsave() should really be unsigned
long. This will also help prevent some warnings when we change u64 to
unsigned long long.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Fix reading SL field out of cqe->sl_vid
RDMA/addr: Fix build breakage when IPv6 is disabled
Commit f780a9f1 ("mlx4_core: Add ethernet fields to CQE struct")
introduced a bug in how wc->sl is set in mlx4_ib_poll_one() -- since
cqe->sl_vid is a big-endian value, the shift must be done after
converting to host endianness.
This bug was found using sparse endianness checking.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 38617c64 ("RDMA/addr: Add support for translating IPv6
addresses") broke the build when CONFIG_IPV6=n, because the ib_addr
module unconditionally attempted to call ipv6_chk_addr() and other
IPv6 functions that are not defined when IPv6 is disabled. Fix this
by only building IPv6 support if CONFIG_IPV6 is turned on, and
add a Kconfig dependency to prevent the ib_addr code from being built
in when IPv6 is built modular.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Impact: cleanup
We're moving from handing around cpumask_t's to handing around struct
cpumask *'s. cpus_*, cpumask_t and cpu_*_map are deprecated: convert
to cpumask_*, cpu_*_mask.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ralph Campbell <infinipath@qlogic.com>
Impact: cleanup
We're moving from handing around cpumask_t's to handing around struct
cpumask *'s. cpus_*, cpumask_t and cpu_*_map are deprecated: convert
to cpumask_*, cpu_*_mask.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Tested-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Impact: cleanup
In future, accessing cpu numbers beyond nr_cpu_ids (the runtime limit)
will be undefined. We can avoid future problems by using
for_each_online_cpu() here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Tested-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
We do not need to allocate a itt for data_out, so this
passes the opcode to the alloc_pdu callout.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This just converts iser to new alloc_pdu api. It still
preallocates the pdu, so there is no difference.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
When we removed the network device argument from several
NAPI interfaces in 908a7a16b8
("net: Remove unused netdev arg from some NAPI interfaces.")
several drivers now started getting unused variable warnings.
This fixes those up.
Signed-off-by: David S. Miller <davem@davemloft.net>
When resizing a CQ, when copying over unpolled CQEs from the old CQE
buffer to the new buffer, the ownership bit must be set appropriately
for the new buffer, or the ownership bit in the new buffer gets
corrupted.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is no lock protecting tx_free_list thus causing a system crash
when skb_dequeue() is called and the list is empty. Since it did not give
any performance boost under heavy load, remove it to simplify the code.
Replace get_free_pkt() with dev_alloc_skb() to allocate MAX_CM_BUFFER skb
for connection establishment/teardown as well as MPA request/response.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle AF_INET6 cases where required, and use struct sockaddr_storage
wherever an IPv6 address might be stored.
Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for translating AF_INET6 addresses to the IB address
translation service. This requires using struct sockaddr_storage
instead of struct sockaddr wherever an IPv6 address might be stored,
and adding cases to handle IPv6 in addition to IPv4 to the various
translation functions.
Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter. This patch cleans up that api by
properly removing it..
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using MSI-X mode, create a completion event queue for each CPU.
Report the number of completion EQs in a new struct mlx4_caps member,
num_comp_vectors, and extend the mlx4_cq_alloc() interface with a
vector parameter so that consumers can specify which completion EQ
should be used to report events for the CQ being created.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
iSCSI/iSER targets may send PDUs without a prior request from the
initiator. RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs
with itt=RESERVED and Asynchronous Message PDUs occupy this category.
The amount of active "unexpected" PDU's an iSER target may have at any
time is governed by the MaxOutstandingUnexpectedPDUs key, which is not
yet supported.
Currently when an iSER target sends an "unexpected" PDU, the
initiators recv buffer consumed by the PDU is not replaced. If over
initial_post_recv_bufs_num "unexpected" PDUs are received then the
receive queue will run out of receive work requests entirely.
This patch ensures recv buffers consumed by "unexpected" PDUs are
replaced in the next iser_post_receive_control() call.
Signed-off-by: David Disseldorp <ddiss@sgi.com>
Signed-off-by: Ken Sandars <ksandars@sgi.com>
Acked-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
vpage is checked not to be NULL just after it is initialized at the
beginning of each loop iteration.
A simplified version of the semantic patch that makes this change is
as follows: (http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@
if (x@p1 == NULL || ...) { ... when forall
return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
x@p2 == NULL
|
x@p2 != NULL
)
// another path to the test that is not through p1?
@s exists@
local idexpression r.x;
position r.p1,r.p2;
@@
... when != x@p1
(
x@p2 == NULL
|
x@p2 != NULL
)
@fix depends on !s@
position r.p1,r.p2;
expression x,E;
statement S1,S2;
@@
(
- if ((x@p2 != NULL) || ...)
S1
|
- if ((x@p2 == NULL) && ...) S1
|
- BUG_ON(x@p2 == NULL);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With the latest flush error completion patch we introduced modulus
operation to calculate the next index within a qmap. Based on
comments from other mailing lists we decided to optimize this
operation by using an addition and an if-statement instead of modulus,
even though this is on the error path.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fixes timing race resulting in panic. Not a performance sensitive path.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipath_piobufbase was a single value offset, but is multiple values on
newer chips, so use only the 32 bits for the 2K buffers (4K buffers
are currently used only by the driver).
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement the ignoring of ibsymbol errors and linkrecover errors while
the link is at less than INIT (long needed), to get accurate counts.
Particularly an issue when doing non-IBTA DDR negotiation with chips
from vendors that do not support IBTA mode negotiation. If the driver
is unloaded, and there is a delta, the adjusted counters are written
back to the chip, so they stay adjusted across driver reload.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fixes an obvious oversight where the return value is not checked
for error.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The PSN of the first packet after an RDMA read is based on the size of
the RDMA read request. This is calculated correctly for the WQE sent
after the first request message but not on subsequent requests if the
RDMA read is resent.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Wrap NES_DEBUG and assert macros with do while (0) to avoid ambiguous
else. No one is using sk_buff * returned from form_cm_frame(), so
drop the return. drop_packet() should not be incrementing reset
counter on receiving a FIN.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Between the first empty list check and locking the list, the list can
change. Check it again after it is locked to make sure the list is
still not empty.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ANVL testing showed we are not handling all cm_node states during
connection establishment. Add missing state handlers and fix sequence
number send reset in handle_tcp_options().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Under heavy traffic, there is a small windows when an APBVT entry is
not yet removed and a new connection is established. Packets for the
new connection are dropped until APBVT entry is removed. This patch
will forward the packets instead of dropping them.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In passive open, after indicating MPA request to rdma_cm, an incoming
RST would fire a reset event to rdma_cm causing it to crash, since the
current state is not connected. The solution is to wait for
nes_accept() or nes_reject() before firing the reset event. If
nes_accept() or nes_reject() is already done, then the reset event
will be fired when RST is processed.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
While processing connected_nodes list, we would release the lock when
we need to send reset to remote partner. That created a window where
the list can be modified. Change this into a two step process: place
nodes that need processing on a local list then process the local list.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use nes_free_cqp_request() instead of open coding. Change some
continue to break in nes_cm_timer_tick, because send_entry used to be
a list processed in a loop (so continue went to the next item). Now
it is a single item, so using break is correct.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Receive work queue entries are checked for L_Key validity, and
pointers to the memory region structure are saved in an allocated
structure. For UD loopback packets, this structure is allocated and
freed for each packet. This patch changes that to allocate/free
during QP creation and destruction.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The return from lookup_one_len() is assigned to *dentry, so that's
what we should be checking with IS_ERR().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
shca_list_lock is taken from softirq context in ehca_poll_eqs, so we
need to lock IRQ safe elsewhere. Found by lockdep.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When resizing a CQ, MTTs associated with the old CQE buffer were not
freed. As a result, if any app used resize CQ repeatedly, all MTTs
were eventually exhausted, which led to all memory registration
operations failing until the driver is reloaded.
Once the RESIZE_CQ command returns successfully from FW, FW no longer
accesses the old CQ buffer, so it is safe to deallocate the MTT
entries used by the old CQ buffer.
Finally, if the RESIZE_CQ command fails, the MTTs allocated for the
new CQEs buffer also need to be de-allocated.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1416>.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fix enables ehca device driver to generate flush work completions
even if the application doesn't request completions for all work
requests. The current implementation of ehca will generate flush work
completions for the wrong work requests if an application uses non
signaled work completions.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The error message printed when the eHCA driver prevents memory hotplug
is misleading -- the user might think that hot-removing the lhca,
hotplugging memory, then hot-adding the lhca again will work, but it
actually doesn't.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This use of netdev->priv is wrong.
The right way is:
alloc_netdev() with no memory for private data.
make netdev->ml_priv to point to c2_dev.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Fix crash in path_rec_completion()
IPoIB: Fix hang in ipoib_flush_paths()
IPoIB: Don't enable NAPI when it's already enabled
RDMA/cxgb3: Fix deadlock in iw_cxgb3 (hang when configuring interface)
IB/ehca: Remove reference to special QP in case of port activation failure
IB/mlx4: Set umem field to NULL in mlx4_ib_alloc_fast_reg_mr()
mlx4_core: Fix unused variable warning
RDMA/nes: Mitigate compatibility issue regarding PCIe write credits
RDMA/nes: Fix CQ allocation scheme for multicast receive queue apps
RDMA/nes: Correct handling of PBL resources
RDMA/nes: Reindent mis-indented spinlocks
RDMA/cxgb3: Fix too-big reserved field zeroing in iwch_post_zb_read()
IB/ipath: Fix RDMA write with immediate copy of last packet
Fix a crash in path_rec_completion() during an SM up/down loop. If
more than one path record request is issued, the first completion
releases path->done, allowing ipoib_flush_paths() to free the path,
and thus corrupting it for the second completion.
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") added the field path->valid and changed the test "if
(!path)" to "if (!path || !path->valid)". This change made it
possible for a path with an outstanding query to pass the test and
issue another query on the same path. Having two queries on the same
path leads to a crash.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib_flush_paths() can hang during an SM up/down loop: if
path_rec_start() fails (for instance, because there is no sm_ah), the
path is still added to the path list by neigh_add_path(). Then,
ipoib_flush_paths() will wait for path->done, but it will never
complete because the request was not issued at all. Fix this by
completing path->done if issuing the query fails.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1329>.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a P_Key is not present when an interface is created, ipoib_open()
will return after doing napi_enable(). ipoib_open() will be called
again from ipoib_pkey_poll() when the P_Key appears, after NAPI has
already been enabled, and try to enable it again. This triggers a
BUG_ON() in napi_enable().
Fix this by moving the call to napi_enable() to after the test for
P_Key presence.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
cxgb3 module, the iwarp driver ends up calling the ethtool ops
get_drvinfo function in cxgb3 to get the fw version and other info.
Currently the iwarp driver grabs the rtnl lock around this down call
to serialize. As of 2.6.27 or so, things changed such that the rtnl
lock is held around the call to the netdev driver open function. Also
the cxgb3_client "add" function doesn't get called if the device is
down.
So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the
device, the iw_cxgb3 add func gets called with the rtnl_lock held. If
you load cxgb3, ifconfig up the device, then load iw_cxgb3, the add
func gets called without the rtnl_lock held. The former causes the
deadlock, the latter does not.
In addition, there are iw_cxgb3 sysfs handlers that also can call down
into cxgb3 to gather the fw and hw versions. These can be called
concurrently on different processors and at any time. Thus we need to
push this serialization down in the cxgb3 driver get_drvinfo func.
The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the initialization of a special QP (e.g. AQP1) fails due to a
software timeout, we have to remove the reference to that special QP
struct from the port struct to stop the driver from accessing the QP,
since it will be/has been destroyed by the caller, eg in this case
ib_mad.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set mr->umem to NULL in mlx4_ib_alloc_fast_reg_mr(). Otherwise
ib_dereg_mr() may invoke ib_umem_release() on a random pointer value
and get an oops.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Under heavy load, there is an compatibility issue regarding PCIe write
credits with certain chipsets. It can be mitigated by limiting read
requests to 256 Bytes.
This workaround is always enabled for Tbird2 on Gladius. We also add
a module parameter to enable workaround for non-Gladius cards.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix CQ allocation for multicast receive queue applications. Before
this patch, the CQ was not lined up with the right NIC.
Signed-off-by: Vadim Makhervaks <vadim.makhervaks@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* Roll back allocated structures on failures.
* Use GFP_ATOMIC instead of GFP_KERNEL since we are holding a lock.
* Acquire nesadapter->pbl_lock when modifying PBL counters.
* Decrement PBL counters on deallocation.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The array wqe->read.reserved has only two entries, but
iwch_post_zb_read() sets [0], [1], and [2], which is one too many.
This is harmless since it runs into the next field, rem_stag, which is
initialized correctly immediately after, but we might as well get
things right, especially since it makes the code smaller.
This was spotted by the Coverity checker (CID 2475).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a bonus, removes some unnecessary byteswapping.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG()
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts pretty much everything to print_mac. There were
a few things that had conflicts which I have just dropped for
now, no harm done.
I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the last packet of a RDMA write with immediate is received, the
next receive work queue entry ID should be used to generate a completion
entry. The code was incorrectly resetting part of the state used to copy
the last packet.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/ehca: Reject dynamic memory add/remove when ehca adapter is present
IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter
IPoIB: Set netdev offload features properly for child (VLAN) interfaces
IPoIB: Clean up ethtool support
mlx4_core: Add Ethernet PCI device IDs
mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC
mlx4_core: Multiple port type support
mlx4_core: Ethernet MAC/VLAN management
mlx4_core: Get ethernet MTU and default address from firmware
mlx4_core: Support multiple pre-reserved QP regions
Update NetEffect maintainer emails to Intel emails
RDMA/cxgb3: Remove cmid reference on tid allocation failures
IB/mad: Use krealloc() to resize snoop table
IPoIB: Always initialize poll_timer to avoid crash on unload
IB/ehca: Don't allow creating UC QP with SRQ
mlx4_core: Add QP range reservation support
RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
Since the ehca device driver does not support dynamic memory add and
remove operations, the driver must explicitly reject such requests in
order to prevent unpredictable behaviors related to existing memory
regions that cover all of memory being used by InfiniBand protocols in
the kernel.
The solution (for now at least) is to add a memory notifier to the
ehca device driver and if a request for dynamic memory add or remove
comes in, ehca will always reject it. The user can add or remove
memory by hot-removing the ehca adapter, performing the memory
operation, and then hot-adding the ehca adapter back.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Because ehca adapters can differ in the maximum number of QPs and CQs
we have to save the maximum number of these ressources per adapter and
not globally per ehca driver. This fix introduces 2 new members to the
shca structure to store the maximum value for QPs and CQs per adapter.
The module parameters are now used as initial values for those
variables. If a user selects an invalid number of CQs or QPs we don't
print an error any longer, instead we will inform the user with a
warning and set the values to the respective maximum supported by the
HW.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a get_rx_csum method. Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Multi-protocol adapters support different port types. Each consumer
of mlx4_core queries for supported port types; in particular mlx4_ib
can no longer assume that all physical ports belong to it. Port type
is configured through a sysfs interface. When the type of a port is
changed, all mlx4 interfaces are unregistered, and then registered
again with the new port types.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Tejun's commit 7b595756ec made sysfs
attribute->owner unnecessary. But the field was left in the structure to
ease the merge. It's been over a year since that change and it is now
time to start killing attribute->owner along with its users - one arch at
a time!
This patch is attempt #1 to get rid of attribute->owner only for
CONFIG_X86_64 or CONFIG_X86_32 . We will deal with other arches later on
as and when possible - avr32 will be the next since that is something I
can test. Compile (make allyesconfig / make allmodconfig / custom config)
and boot tested.
akpm: the idea is that we put the declaration of sttribute.owner inside
`#ifndef CONFIG_X86'. But that proved to be too ambitious for now because
new usages kept on turning up in subsystem trees.
[akpm: remove the ifdef for now]
Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that device_create() has been audited, rename things back to the
original call to be sane.
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The error path in iwch_connect() can fail to drop the cmid reference,
which will cause the process to hang when destroying the cmid.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use krealloc() instead of kmalloc() followed by memcpy() when resizing
the MAD module's snoop table.
Also put parentheses around the new table size to avoid calculating
the wrong size to allocate, which fixes a bug pointed out by Haven
Hash <haven.hash@isilon.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I had this in my patchset to add target reset support, but
it got dropped due to patching conflicts. This initial patch
just renames the function and users. We are actually just
dropping the session, and so this does not have anything to do
with the host exactly. It does for software iscsi because
we allocate a host per session, but for cxgb3i this makes no
sense.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If the driver knows when hardware is removed like with cxgb3i,
bnx2i, qla4xxx and iser then we will want to remove the sessions/devices
that are bound to that device before removing the host.
cxgb3i and in the future bnx2i will remove the host and that will
remove all the sessions on the hba. iser can call iscsi_kill_session
when it gets an event that indicates that a hca is removed.
And when qla4xxx is hooked in to the lib (it is only hooked into
the class right now) it can call iscsi remove host like the
partial offload card drivers.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>