Let attribute group vectors be declared "const". We'd
like to let most attribute metadata live in read-only
sections... this is a start.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The driver was reporting CQE flags in the wrong bit positions, causing
consumers to miss incoming immediate data.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code used a lot of hard-coded values, which might not be valid
in all environments (especially routed fabrics or partitioned
subnets). Copy as much information as possible from the incoming
request to correct that.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make port autodetect mode the default for the ehca driver. The
autodetect code has been in the kernel for several releases now and
has proved to be stable.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Increment version number for DMEM toleration.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This prevents the memcpy() of a guid_entries element using a negative index.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement toleration of dynamic memory operations and 16 GB gigantic
pages, where "toleration" means that the driver can cope with dynamic
memory operations that happen before the driver is loaded. While the
ehca driver is loaded, dynamic memory operations are still prohibited
by returning NOTIFY_BAD from the memory notifier.
On module load the driver walks through available system memory,
checks for available memory ranges and then registers the kernel
internal memory region accordingly. The translation of address ranges
is implemented via a 3-level busmap.
Signed-off-by: Hannes Hering <hering2@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device. Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used. These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: general@lists.openfabrics.org
Cc: Christoph Raisch <raisch@de.ibm.com>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All the fields in the control block are nicely right-aligned, so no
masking is necessary.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The queue map for flush completion circumvention is only used for
kernel space queue pairs. This patch skips the allocation of the
queue maps in case the QP is created for userspace. In addition, this
patch does not iomap the galpas for kernel usage if the queue pair is
only used in userspace. These changes will improve the performance of
creation of userspace queue pairs.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of large queue pairs there is the possibillity of allocation
failures due to memory fragmentation when using kmalloc(). To ensure
the memory is allocated even if kmalloc() can not find chunks which
are big enough, we fall back to allocating the memory with vmalloc().
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To improve performance of driver resource allocation, replace
vmalloc() calls with kmalloc().
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The base versions handle constant folding just fine, use them
directly. The replacements are OK in the include/ files as they are
not exported to userspace so we don't need the __ prefixed versions.
This patch does not affect code generation at all.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ehca_plpar_hcall9() takes an unsigned long array, so make all callers
pass that in. This fixes warnings introduced by commit fe333321
("powerpc: Change u64/s64 to a long long integer type"), which changed
u64 from unsigned long to unsigned long long.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit fe333321 ("powerpc: Change u64/s64 to a long long integer
type") changed u64 from unsigned long to unsigned long long, which
means that printk formats for printing u64 values should use "ll"
instead of "l" to avoid warnings. Fix all the places affected by this
in ehca.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: Add dependency on INFINIBAND_ADDR_TRANS
IPoIB: Do not join broadcast group if interface is brought down
RDMA/nes: Fix for NIPQUAD removal
IPoIB: Fix loss of connectivity after bonding failover on both sides
IB/mlx4: Don't register IB device for adapters with no IB ports
mlx4_core: Fix warning from min()
IB/ehca: spin_lock_irqsave() takes an unsigned long
The flags argument to spin_lock_irqsave() should really be unsigned
long. This will also help prevent some warnings when we change u64 to
unsigned long long.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Impact: cleanup
We're moving from handing around cpumask_t's to handing around struct
cpumask *'s. cpus_*, cpumask_t and cpu_*_map are deprecated: convert
to cpumask_*, cpu_*_mask.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Tested-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Impact: cleanup
In future, accessing cpu numbers beyond nr_cpu_ids (the runtime limit)
will be undefined. We can avoid future problems by using
for_each_online_cpu() here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Tested-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
vpage is checked not to be NULL just after it is initialized at the
beginning of each loop iteration.
A simplified version of the semantic patch that makes this change is
as follows: (http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@
if (x@p1 == NULL || ...) { ... when forall
return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
x@p2 == NULL
|
x@p2 != NULL
)
// another path to the test that is not through p1?
@s exists@
local idexpression r.x;
position r.p1,r.p2;
@@
... when != x@p1
(
x@p2 == NULL
|
x@p2 != NULL
)
@fix depends on !s@
position r.p1,r.p2;
expression x,E;
statement S1,S2;
@@
(
- if ((x@p2 != NULL) || ...)
S1
|
- if ((x@p2 == NULL) && ...) S1
|
- BUG_ON(x@p2 == NULL);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With the latest flush error completion patch we introduced modulus
operation to calculate the next index within a qmap. Based on
comments from other mailing lists we decided to optimize this
operation by using an addition and an if-statement instead of modulus,
even though this is on the error path.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
shca_list_lock is taken from softirq context in ehca_poll_eqs, so we
need to lock IRQ safe elsewhere. Found by lockdep.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fix enables ehca device driver to generate flush work completions
even if the application doesn't request completions for all work
requests. The current implementation of ehca will generate flush work
completions for the wrong work requests if an application uses non
signaled work completions.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The error message printed when the eHCA driver prevents memory hotplug
is misleading -- the user might think that hot-removing the lhca,
hotplugging memory, then hot-adding the lhca again will work, but it
actually doesn't.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the initialization of a special QP (e.g. AQP1) fails due to a
software timeout, we have to remove the reference to that special QP
struct from the port struct to stop the driver from accessing the QP,
since it will be/has been destroyed by the caller, eg in this case
ib_mad.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since the ehca device driver does not support dynamic memory add and
remove operations, the driver must explicitly reject such requests in
order to prevent unpredictable behaviors related to existing memory
regions that cover all of memory being used by InfiniBand protocols in
the kernel.
The solution (for now at least) is to add a memory notifier to the
ehca device driver and if a request for dynamic memory add or remove
comes in, ehca will always reject it. The user can add or remove
memory by hot-removing the ehca adapter, performing the memory
operation, and then hot-adding the ehca adapter back.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Because ehca adapters can differ in the maximum number of QPs and CQs
we have to save the maximum number of these ressources per adapter and
not globally per ehca driver. This fix introduces 2 new members to the
shca structure to store the maximum value for QPs and CQs per adapter.
The module parameters are now used as initial values for those
variables. If a user selects an invalid number of CQs or QPs we don't
print an error any longer, instead we will inform the user with a
warning and set the values to the respective maximum supported by the
HW.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch prevents a UC QP to be created attached to an SRQ, since
current firmware does not support this feature.
Signed-off-by: Michael Faath <micfaath@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a QP goes into error state, it is required that CQ entries with a
flush error status are delivered to the application for any
outstanding work requests. eHCA does not do this in hardware, so this
patch adds software flush CQE generation to the ehca driver.
Whenever a QP gets into error state, it is added to the QP error list
of its respective CQ. If the error QP list of a CQ is not empty,
poll_cq() generates flush CQEs before polling the actual CQ.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch lets the files using linux/version.h match the files that
#include it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under rare circumstances, the ehca hardware might erroneously generate
two CQEs for the same WQE, which is not compliant to the IB spec and
will cause unpredictable errors like memory being freed twice. To
avoid this problem, the driver needs to detect the second CQE and
discard it.
For this purpose, introduce an array holding as many elements as the
SQ of the QP, called sq_map. Each sq_map entry stores a "reported"
flag for one WQE in the SQ. When a work request is posted to the SQ,
the respective "reported" flag is set to zero. After the arrival of a
CQE, the flag is set to 1, which allows to detect the occurence of a
second CQE.
The mapping between WQE / CQE and the corresponding sq_map element is
implemented by replacing the lowest 16 Bits of the wr_id with the
index in the queue map. The original 16 Bits are stored in the sq_map
entry and are restored when the CQE is passed to the application.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The idr_find() function may fail when trying to get the QP that is
associated with a CQE, e.g. when a QP has been destroyed between the
generation of a CQE and the poll request for it. In consequence, the
return value of idr_find() must be checked and the CQE must be
discarded when the QP cannot be found.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the ehca driver detects an invalid opcode in a CQE, it currently
passes the CQE to the application and returns with success. This patch
changes the CQE handling to discard CQEs with invalid opcodes and to
continue reading the next CQE from the CQ.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rename the "poll_cq_one_read_cqe" goto label to what it actually does,
namely "repoll".
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since the introduction of the port auto-detect mode for ehca, calls to
modify_qp() may be cached in the device driver when the ports are not
activated yet. When a modify_qp() call is cached, the qp state remains
untouched until the port is activated, which will leave the qp in the
reset state. In the reset state, however, it is not allowed to post SQ
WQEs, which confuses applications like ib_mad.
The solution for this problem is to immediately set the qp state as
requested by modify_qp(), even when the call is cached.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
from include/asm-powerpc. This is the result of a
mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm
Followed by a few documentation/comment fixups and a couple of places
where <asm-powepc/...> was being used explicitly. Of the latter only
one was outside the arch code and it is a driver only built for powerpc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
MAINTAINERS: Remove Glenn Streiff from NetEffect entry
mlx4_core: Improve error message when not enough UAR pages are available
IB/mlx4: Add support for memory management extensions and local DMA L_Key
IB/mthca: Keep free count for MTT buddy allocator
mlx4_core: Keep free count for MTT buddy allocator
mlx4_code: Add missing FW status return code
IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
mlx4_core: Add module parameter to enable QoS support
RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
IB/ehca: Release mutex in error path of alloc_small_queue_page()
IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
IB/ehca: Filter PATH_MIG events if QP was never armed
IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
The pd->lock mutex is released on a successful return, so it should be
released on an error return as well.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression l;
@@
mutex_lock(l);
... when != mutex_unlock(l)
when any
when strict
(
if (...) { ... when != mutex_unlock(l)
+ mutex_unlock(l);
return ...;
}
|
mutex_unlock(l);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some firmware versions report a Local CA ACK Delay of 0. In that
case, return a more sensible default value of 12 (-> 16 msec) instead.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Certain firmware versions sometimes cause spurious PATH_MIG events to
occur during QP creation. Filter these events by making sure PATH_MIG
events are only handed down when they actually make sense (i.e. when
the QP has been armed at least once).
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This gives ehca an autogenerated modalias and therefore enables automatic loading.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During corner case testing, we noticed that some versions of ehca do
not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI, if
EQEs are pending.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement). The new operations are:
- Allocate an ib_mr for use in fast register work requests.
- Allocate/free a physical buffer lists for use in fast register work
requests. This allows device drivers to allocate this memory as
needed for use in posting send requests (eg via dma_alloc_coherent).
- New send queue work requests:
* send with remote invalidate
* fast register memory region
* local invalidate memory region
* RDMA read with invalidate local memory region (iWARP only)
Consumer interface details:
- A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
to indicate device support for these features.
- New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
IB_WR_RDMA_READ_WITH_INV are added.
- A new consumer API function, ib_alloc_mr() is added to allocate
fast register memory regions.
- New consumer API functions, ib_alloc_fast_reg_page_list() and
ib_free_fast_reg_page_list() are added to allocate and free
device-specific memory for fast registration page lists.
- A new consumer API function, ib_update_fast_reg_key(), is added to
allow the key portion of the R_Key and L_Key of a fast registration
MR to be updated. Consumers call this if desired before posting
a IB_WR_FAST_REG_MR work request.
Consumers can use this as follows:
- MR is allocated with ib_alloc_mr().
- Page list memory is allocated with ib_alloc_fast_reg_page_list().
- MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
- MR made VALID and bound to a specific page list via
ib_post_send(IB_WR_FAST_REG_MR)
- MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
invalidate operation.
- MR is deallocated with ib_dereg_mr()
- page lists dealloced via ib_free_fast_reg_page_list().
Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ). For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.
The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key. The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is necessary because, in a multicore environment, a race between
uverbs async handler and destroy QP could occur.
Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Also remove duplicate assignment of local_ca_ack_delay and change
min_t check for local_ca_ack_delay to u8 instead of int.
Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a lot of QPs fall into Error state at once and the EQ of the
respective HCA is too small, it might overrun, causing the eHCA driver
to stop processing completion events and calling the application's
completion handlers, effectively causing traffic to stop.
Fix this by limiting available QPs and CQs to a customizable max
count, and determining EQ size based on these counts and a worst-case
assumption.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ehca_create_eq() was assigning a signed return value to an unsiged
local variable and then checking if the variable was < 0, which meant
that errors were always ignored. Fix this by using one variable for
signed integer return values and another for u64 hcall return values.
Bug originally found by Roel Kluin <12o3l@tiscali.nl>.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Always enable large page support; didn't seem to cause problems for anyone.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Also, introduce a few inline helper functions to make the code more readable.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions. Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated. Add this new union to struct
ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().
Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.
Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit. The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing). Remove the flag from
all existing drivers that set it until we know which ones are OK.
The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.
This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO. The create_flags member will also be useful for XRC
and ehca low-latency QP support.
Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pavel Emelyanov <xemul@openvz.org> mentioned in <http://lkml.org/lkml/2008/3/17/131>
that the task_struct->tgid field is about to become deprecated, so the
uses in the ehca driver need to be fixed up.
However, all the uses in ehca are for some object ownership checking
that is not really needed, and anyway is implementing a policy that
should be in common code rather than a low-level driver. So just
remove all the checks.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
__FUNCTION__ is gcc-specific, use __func__ instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allow the compiler to optimize better and generate smaller code:
add/remove: 0/6 grow/shrink: 2/0 up/down: 1528/-1864 (-336)
function old new delta
.ehca_set_pagebuf 1344 2172 +828
.ehca_probe 2312 3012 +700
ehca_set_pagebuf_phys 24 - -24
ehca_set_pagebuf_fmr 24 - -24
ehca_init_device 24 - -24
.ehca_set_pagebuf_fmr 480 - -480
.ehca_set_pagebuf_phys 512 - -512
.ehca_init_device 800 - -800
Also this fixes warnings like:
drivers/infiniband/hw/ehca/ehca_mrmw.c:2015:5: warning: symbol 'ehca_set_pagebuf_fmr' was not declared. Should it be static?
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch enables ehca to redirect any PMA queries to the
actual PMA QP.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Reviewed-by: Joachim Fenkes <fenkes@de.ibm.com>
Reviewed-by: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB spec doesn't allow packets to QP0 sent on any other VL than VL15.
Hardware doesn't filter those packets on the send side, so we need to do
this in the driver and firmware.
As eHCA doesn't support QP0, we can just filter out all traffic going to
QP0, regardless of SL or VL.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some HW revisions of eHCA2 may cause an RC connection to break if they
received RDMA Reads over that connection before. This can be
prevented by assuring that, after the first RDMA Read, the QP receives
a new RDMA Read every few million link packets.
Include code into the driver that inserts an empty (size 0) RDMA Read
into the message stream every now and then if the consumer doesn't
post them frequently enough.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch enhances ehca with a capability to "autodetect" the ports
being connected physically. In order to utilize that function the
module option nr_ports must be set to -1 (default is 2 - two
ports). This feature is experimental and will made the default later.
More detail:
If the user connects only one port to the switch, current code requires
1) port one to be connected and
2) module option nr_ports=1 to be given.
If autodetect is enabled, ehca will not wait at creation of the GSI QP
for the respective port to become active. Since firmware does not
accept modify_qp() while the port is down at initialization, we need
to cache all calls to modify_qp() for the SMI/GSI QP and just return a
good return code.
When a port is activated and we get a PORT_ACTIVE event, we replay the
cached modify-qp() parms and re-trigger any posted recv WRs. Only then
do we forward the PORT_ACTIVE event to registered clients.
The result of this autodetect patch is that all ports will be
accessible by the users. Depending on their respective cabling only
those ports that are connected properly will become operable. If a
user tries to modify a regular QP of a non-connected port, modify_qp()
will fail. Furthermore, ibv_devinfo should show the port state
accordingly.
Note that this patch primarily improves the loading behaviour of
ehca. If the cable is removed while the driver is operating and
plugged in again, firmware will handle that properly by sending an
appropriate async event.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use round_jiffies() to align ehca's 1-second timer with other timers
and potentially save power by sleeping cores for longer.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch allows ehca to forward event client-reregister-required to
registered clients. One such event is generated by a switch eg. after
its reboot.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make the ipath driver use the new driver functions so that it does not
touch the sysfs portion of the driver structure.
We also remove the redundant symlink from the device back to the driver,
as it is already in the sysfs tree. Any userspace tools should be using
the standard symlink, not some driver specific one.
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: Arthur Jones <arthur.jones@qlogic.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Several pSeries firmware versions share a rare locking issue in the
HCA-related hCalls. Check for a feature flag that indicates the issue
being fixed and serialize all HCA hCalls if not.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Firmware would round up the number of SGEs to four, because the WQE
structure holds four SGEs. For SRQ, only three are supported, so return
a fixed value instead.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The formula would yield -1 if the path is faster than the link, which
is wrong in a bad way (max throttling). Clamp to 0, which is the
correct value.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Wrong choice of port number caused modify_qp() to fail -- fixed.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IPD (inter-packet delay) formula was a little off and assumed a
fixed physical link rate; fix the formula and query the actual
physical link rate, now that we can get it. Also, refactor the
calculation into a common function ehca_calc_ipd() and use that
instead of duplicating code.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Newer firmware versions return physical port information to the
partition, so hand that information to the consumer if it's present.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
IPoIB/cm: Use common CQ for CM send completions
IB/uverbs: Fix checking of userspace object ownership
IB/mlx4: Sanity check userspace send queue sizes
IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
IB/ehca: Enable large page MRs by default
IB/ehca: Change meaning of hca_cap_mr_pgsize
IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
IB/ehca: Fix masking error in {,re}reg_phys_mr()
IB/ehca: Supply QP token for SRQ base QPs
IPoIB: Use round_jiffies() for ah_reap_task
RDMA/cma: Fix deadlock destroying listen requests
RDMA/cma: Add locking around QP accesses
IB/mthca: Avoid alignment traps when writing doorbells
mlx4_core: Kill mlx4_write64_raw()
More fallout from sg_page changes:
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user1':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1779: error: 'struct scatterlist' has no member named 'page'
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_check_kpages_per_ate':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1835: error: 'struct scatterlist' has no member named 'page'
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user2':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1870: error: 'struct scatterlist' has no member named 'page'
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ehca_shca.hca_cap_mr_pgsize now contains all supported page sizes ORed
together. This makes some checks easier to code and understand, plus
we can return this value verbatim in query_hca(), fixing a problem
with SRP (reported by Anton Blanchard -- thanks!).
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Simplify ehca_encode_hwpage_size(), fixing an infinite loop for pgsize == 0
in the process. Fix the bug in alloc_fmr() that triggered the loop.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Because hardware reports the SRQ token in RWQEs of SRQ base QPs, supply the
base QP token as SRQ token, so we can properly find the SRQ base QP.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace struct ibmebus_dev and struct ibmebus_driver with struct of_device
and struct of_platform_driver, respectively. Match the external ibmebus
interface and drivers using it.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Doing min_t(int, foo, INT_MAX) doesn't work correctly, because if foo
is bigger than INT_MAX, then when treated as a signed integer, it will
become negative and hence such an expression is just an elaborate NOP.
Fix such cases in ehca to do min_t(unsigned, foo, INT_MAX) instead.
This fixes negative reported values for max_cqe, max_pd and max_ah:
Before:
max_cqe: -64
max_pd: -1
max_ah: -1
After:
max_cqe: 2147483647
max_pd: 2147483647
max_ah: 2147483647
Based on a bug report and fix from Anton Blanchard <anton@samba.org>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* Replace {un}register_cpu_notifier with {un}register_hotcpu_notifier
thereby losing a couple of #ifdef HOTPLUG_CPU pairs.
* Move comp_pool_callback_nb declaration to below that of callback
function so that initialization of .notifier_call and .priority can
occur at build time itself and not runtime.
* Mark the notifier_block (and callback function, and another static
function used by it) as __cpuinit{data} for the sake of consistency
and remove enclosing #ifdef. (This may increase size for modular
build of this module, however, because these are no longer dropped
unconditionally now.)
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>