semaphore to mutex conversion by Ingo and Arjan's script.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
build_mlx_header() was using sqp->ud_header.grh_present before it was
initialized by mthca_read_ah(). Furthermore, header->grh_present is
set by ib_ud_header_init, so there's no need to set it again in
mthca_read_ah().
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use the ALIGN macro to simplify some rounding code.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix memory leaks in mthca_create_qp() and mthca_create_srq()
error handling.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Convert "/ (1 << lg)" to ">> lg" for a slight code size reduction.
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-24 (-24)
function old new delta
mthca_map_cmd 613 589 -24
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current handling of multicast groups in IPoIB ends up never
freeing send-only multicast groups. It turns out the logic was much
more complicated than it needed to be; we can fix this bug and
completely kill ipoib_mcast_dev_down() at the same time.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dev->mc_list accesses must be protected by dev->xmit_lock.
Found by Eli Cohen <eli@mellanox.co.il>.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Multiple ipoib_neigh structures on mcast->neigh_list may point to the
same ah. This means that ipoib_mcast_free() can't just make a list of
ah structs to free, since this might end up trying to add the same ah
to the list more than once. Handle this in ipoib_multicast.c in the
same way as it is handled in ipoib_main.c for struct ipoib_path.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't leak memory on allocation failure for broadcast mcast group.
Also, print a warning to match handling for other mcast groups.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a node_guid field to struct ib_device. It is the responsibility
of the low-level driver to initialize this field before registering a
device with the midlayer. Convert everyone to looking at this field
instead of calling ib_query_device() when all they want is the node
GUID, and remove the node_guid field from struct ib_device_attr.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Factor out common code for initializing MAD packets, which is shared
by many query routines in mthca_provider.c, into init_query_mad().
add/remove: 1/0 grow/shrink: 0/4 up/down: 16/-44 (-28)
function old new delta
init_query_mad - 16 +16
mthca_query_port 521 518 -3
mthca_query_pkey 301 294 -7
mthca_query_device 648 641 -7
mthca_query_gid 453 426 -27
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I am seeing EQ overruns in SDP stress tests: if the CQ completion
handler arms a CQ, this could generate more EQEs, so that EQ will
never get empty and consumer index will never get updated.
This is similiar to what we have with command interface:
/*
* cmd_event() may add more commands.
* The card will think the queue has overflowed if
* we don't tell it we've been processing events.
*/
However, for completion events, we *don't* want to update the consumer
index on each event. So, perform EQ doorbell coalescing: allocate EQs
with some spare EQEs, and update once we run out of them.
The value 0x80 was selected to avoid any performance impact.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For all pages except possibly the last one, the byte beyond the buffer
end must be page aligned. Therefore, when computing the page shift to
use, OR the end addresses of the buffers as well as the start
addresses into the mask we check.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Include fixes for 2.6.14-git11. Should allow to remove sched.h from
module.h on i386, x86_64, arm, ia64, ppc, ppc64, and s390. Probably more
to come since I haven't yet checked the other archs.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ib_create_ah_from_wc() doesn't create the correct return address (AH)
when there is a GRH present (source & dest GIDs need to be swapped).
Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ib_uverbs_create_cq() should release the completion channel event file
if an error occurs after it looks it up. Also, if userspace asks for
a completion channel and we don't find it, an error should be returned
instead of silently creating a CQ without a completion channel.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If an operation fails after incrementing an object's reference count,
then it should decrement the reference count on the error path.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add code to modify QP operation to handle setting alternate paths for
connected QPs.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fill vendor_err field in completion with error.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Multicast group management fixes:
. Fix leak of mailbox memory in error handling on multicast group operations.
. Free AMGM indices at detach and in attach error handling.
. Fix amount to shift for aligning next_gid_index in mailbox: it
starts at bit 6, not bit 5.
. Allocate AMGM index after end of MGM table, in the range num_mgms to
multicast table size - 1. Add some BUG_ON checks to catch cases
where the index falls in the MGM hash area.
. Initialize the list of QPs in a newly-allocated group from AMGM to 0
This is necessary since when a group is moved from AMGM to MGM (in the
case where the MGM entry has been emptied of QPs), the AMGM entry is
not reset to 0 (and we don't want an extra command to do that).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
PKEY_INDEX is not a legal parameter in the RTR->RTS transition.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fixes to SQEr->RTS transition in modify_qp:
1. The flag IB_QP_ACCESS_FLAGS is optional for UC qps
2. The SQEr state is not supported for RC qps
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a case where copying max_inline_data from a successful create_qp
capabilities output to create_qp input could cause EINVAL error:
mthca_set_qp_size must check max_inline_data directly against
max_desc_sz; checking qp->sq.max_gs is wrong since max_inline_data
depends on the qp type and does not involve max_sg.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix mthca_create_eq for when the EQ size is not a power of 2.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Modify_qp should check that the physical port number provided
is a legal value.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Check error return on call to mthca_dev_lim for Tavor
(as is done for memfree).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thinko: 64 bytes is the minimum SRQ WQE size (not the maximum).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sae and sre bits should only be set when setting sra_max. Further, in
the old code, if the caller specifies max_rd_atomic = 0, the sre and
sae bits are still set, with the result that the QP ends up with
max_rd_atomic = 1 in effect.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch corrects some corner cases in managing the RAE/RRE bits in
the mthca qp context. These bits need to be zero if the user requests
max_dest_rd_atomic of zero. The bits need to be restored to the value
implied by the qp access flags attribute in a previous (or the
current) modify-qp command if the dest_rd_atomic variable is changed
to non-zero.
In the current implementation, the following scenario will not work:
RESET-to-INIT set QP access flags to all disabled (zeroes)
INIT-to-RTR set max_dest_rd_atomic=10, AND
set qp_access_flags = IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_ATOMIC
The current code will incorrectly take the access-flags value set in
the RESET-to-INIT transition.
We can simplify, and correct, this IB_QP_ACCESS_FLAGS handling: it is
always safe to set qp access flags in the firmware command if either
of IB_QP_MAX_DEST_RD_ATOMIC or IB_QP_ACCESS_FLAGS is set, so let's
just set it to the correct value, always.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When cleaning up a CQ for a QP attached to SRQ, need to free an SRQ
WQE only if the CQE is a receive completion.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
break only escapes from the innermost loop, and we want to escape both
loops and return an answer. Noticed by Ishai Rabinovitch.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Only change the driver's copy of the QP attributes in modify QP after
checking the modify QP command completed successfully.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix thinko in rd_atomic calculation: ffs(x) - 1 does not find the next
power of 2 -- it should be fls(x - 1).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add limit checking on rd_atomic and dest_rd_atomic attributes:
especially for max_dest_rd_atomic, a value that is larger than HCA
capability can cause RDB overflow and corruption of another QP.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Free the memory allocated in mthca_init_user_db_tab() when releasing
the db_tab in mthca_cleanup_user_db_tab().
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't leak packet if it had a timeout, and don't leak timeout struct
if queue_packet() fails.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use an increasing local ID to avoid re-using identifiers while
messages may still be outstanding on the old ID. Without this, a
quick connect-disconnect-connect sequence can fail by matching
messages for the new connection with the old connection.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change reject code from TIMEOUT to CONSUMER_REJECT when destroying a
cm_id in the process of connecting.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Unlike tavor, the max work queue size is an exact power of 2 for arbel
mode, despite what the documentation (of the QUERY_DEV_LIM firmware
command) says. Without this patch, on Arbel, we can start with a QP
of a valid size and get above the reported limit after rounding to the
next power of two.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
uverbs needs to track which multicast groups is each qp
attached to, in order to properly detach when cleanup
is performed on device file close.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On mem-free HCAs, when posting a long list of send requests, a
doorbell must be rung every 255 requests. Add code to handle this.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If ipoib_ib_dev_up() fails after ipoib_ib_dev_open() is called, then
ipoib_ib_dev_stop() needs to be called to clean up.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
race condition: ipoib_ib_dev_flush is accessing child list without locks.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib_mcast_alloc() uses kzalloc(), so there's no need to zero out
members of the mcast struct after it's allocated.
Signed-off-by: Roland Dreier <rolandd@cisco.com>