* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Check correct variable for allocation failure
RDMA/nes: Correct cap.max_inline_data assignment in nes_query_qp()
RDMA/cm: Set num_paths when manually assigning path records
IB/cm: Fix device_create() return value check
cap.max_inline_data is incorrectly set in init_attr instead of attr.
Set it in attr so subsequent init_attr.cap assignment will get the
correct value.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Set assume_aligned_header bit in QP context as requested by hardware group.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the nes driver to return -ENOMEM on SQ/RQ overflow to match the
return code of other RDMA HW drivers (e.g cxgb3, ehca, mlx4, mthca).
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We fail when creating many qps as kmap() fails for sq_vbase.
Fix this by doing kunmap() as soon as we are done with sq_vbase.
We do kunmap() in one of the locations below:
(1) nes_destroy_qp()
(2) nes_accept()
(3) nes_connect_event
We keep a flag to avoid multiple calls to kunmap().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
STags are generated randomly but the driver does not correctly prevent
a zero STag. Using STag zero is privileged and causes a user space
application to fail. This change prevents the driver from trying to
allocate a zero STag.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ORD size needs updating as we are supporting more inbound READ
resources per connection.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Update copyright from Intel-NE, Inc. to Intel Corporation. Use proper
branding string in Kconfig and simplify description.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a check to nes_create_cq() to return -EINVAL if creating a CQ with
depth > max_cqe (32766).
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add IB_SINGAL_ALL_WR support as an iWARP extension. If set, make sure
all WR for the QP are signalled. Consolidate flags used in nesqp
structure.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for IB_WR_SEND_WITH_INV, IB_WR_RDMA_READ_WITH_INV
and IB_WR_LOCAL_INV.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On error, set bad_wr in nes_post_recv(). Stop processing ib_wr queue
when an error is detected.
Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On error, set bad_wr in nes_post_send(). Stop processing ib_wr queue
when an error is detected.
Signed-off-by: Frank Zago <fzago@systemfabricworks.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Old query_port code reports static MTU and link state values.
Instead, map actual MTU to next largest IB_MTU_* constant and
correctly report link state.
Cc: Steve Wise <swise@opengridcomputing.com>
Reported-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use the flush status to fill in cqe status when a specific error has
been identified. Subsequent flushed completions still use the flushed
value.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a flush request is given to the hw, it will place one cqe marked
as flushed (unless there is nothing to flush). An application that is
waiting for all wqe's to complete will be left hanging. This modifies
poll_cq to return the correct number of flushes for the pending
elements on the wq.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement the sending and receiving of Terminate packets.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a QP is destroyed, unprocessed CQ entries could still reference
the QP. This change zeroes the context value at QP destroy time. By
skipping over cqe's with a zero context, poll_cq no longer processes a
cqe for a destroyed QP.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In nes_query_device(), max_qp_init_rd_atom is incorrectly set to
max_qp_wr. This was found when a test application had a dapl async
event error.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
/sys/class/infiniband/nes?/fw_ver is not displaying firmware version
properly (it shows 0.0.0 with the current code). Fill in the correct
firmware version number.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In error paths where a CQ is not created, pbl is not freeed properly.
In nes_destroy_cq(), add the corresponding check for nescq->mcrqf to
not call nes_free_resource() when it is already done in nes_create_cq().
Signed-off-by: Miroslaw Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code incorrectly failed memory registration if the buffer was not
page aligned. Also, the length field is mangled causing the hardware
to think the registration is much larger than it really is.
The fix is to remove the page alignment restriction as well the
incorrect length adjustment. Also make sure that all buffers after
the first start at a page boundary, and all buffers except the last
end on a page boundary.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Initialize pbl_count_256 to 0 to get rid of the warning:
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_mr':
drivers/infiniband/hw/nes/nes_verbs.c:1955: warning: 'pbl_count_256' may be used uninitialized in this function
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
RDMA/cxgb3: Enforce required firmware
IB/mlx4: Unregister IB device prior to CLOSE PORT command
mlx4_core: Add link type autosensing
mlx4_core: Don't perform SET_PORT command for Ethernet ports
RDMA/nes: Handle MPA Reject message properly
RDMA/nes: Improve use of PBLs
RDMA/nes: Remove LLTX
RDMA/nes: Inform hardware that asynchronous event has been handled
RDMA/nes: Fix tmp_addr compilation warning
RDMA/nes: Report correct vendor_id and vendor_part_id
RDMA/nes: Update copyright to new legal entity and year
RDMA/nes: Account for freed PBL after HW operation
IB: Remove useless ibdev_is_alive() tests from sysfs code
IB/sa_query: Fix AH leak due to update_sm_ah() race
IB/mad: Fix ib_post_send_mad() returning 0 with no generate send comp
IB/mad: initialize mad_agent_priv before putting on lists
IB/mad: Fix null pointer dereference in local_completions()
IB/mad: Fix RMPP header RRespTime manipulation
IB/iser: Remove hard setting of path MTU
mlx4_core: Add device IDs for MT25458 10GigE devices
...
STag zero is a special STag that allows consumers to access any bus
address without registering memory. The nes driver unfortunately
allows STag zero to be used even with QPs created by unprivileged
userspace consumers, which means that any process with direct verbs
access to the nes device can read and write any memory accessible to
the underlying PCI device (usually any memory in the system). Such
access is usually given for cluster software such as MPI to use, so
this is a local privilege escalation bug on most systems running this
driver.
The driver was using STag zero to receive the last streaming mode
data; to allow STag zero to be disabled for unprivileged QPs, the
driver now registers a special MR for this data.
Cc: <stable@kernel.org>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two level 256 byte PBLs was not implemented so the driver could report
out of memory when in fact there were PBLs still available.
This solution prefers to use 4KB PBLs over two level 256B PBLs until
the number of 4KB PBLs falls below a threshold. At this point the 4KB
PBL structure is converted to use 256B PBLs which prevents the driver
from running out of 4KB PBLs too quickly.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Update copyright to the new legal entity, Intel-NE, Inc., an Intel
company. Update copyright for the new year.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix occurrences where the software PBL counts were changed before the
hardware was updated. This bug allowed another thread to overallocate
the hardware resources.
Add proper PBL accounting in case nes_reg_mr() fails.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use nes_free_cqp_request() instead of open coding. Change some
continue to break in nes_cm_timer_tick, because send_entry used to be
a list processed in a loop (so continue went to the next item). Now
it is a single item, so using break is correct.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix CQ allocation for multicast receive queue applications. Before
this patch, the CQ was not lined up with the right NIC.
Signed-off-by: Vadim Makhervaks <vadim.makhervaks@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* Roll back allocated structures on failures.
* Use GFP_ATOMIC instead of GFP_KERNEL since we are holding a lock.
* Acquire nesadapter->pbl_lock when modifying PBL counters.
* Decrement PBL counters on deallocation.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A break after a return serves no purpose, remove it.
Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Major rework of CM connection setup/teardown. We had a number of issues
with MPI applications not starting/terminating properly over time.
With these changes we were able to run longer on larger clusters.
* Remove memory allocation from nes_connect() and nes_cm_connect().
* Fix mini_cm_dec_refcnt_listen() when destroying listener.
* Remove unnecessary code from schedule_nes_timer() and nes_cm_timer_tick().
* Functionalize mini_cm_recv_pkt() and process_packet().
* Clean up cm_node->ref_count usage.
* Reuse skbs if available.
Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL,
so just remove that parameter and always ring the doorbell.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Faisal Latif <flatif@neteffect.com>
nes_reg_user_mr() should fail if page_count becomes >= 1024 * 512
rather than just testing for strict >, because page_count is
essentially used as an index into an array with 1024 * 512 entries, so
allowing the loop to continue with page_count == 1024 * 512 means that
memory after the end of the array is corrupted. This leads to a crash
triggerable by a userspace application that requests registration of a
too-big region.
Also get rid of the call to pci_free_consistent() here to avoid
corrupting state with a double free, since the same memory will be
freed in the code jumped to at reg_user_mr_err.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Various cleanups:
- Change // to /* .. */
- Place whitespace around binary operators.
- Trim down a few long lines.
- Some minor alignment formatting for better readability.
- Remove some silly tabs.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the volatile qualifier from the cq_vbase member of struct
nes_hw_cq, and add an rmb() in the one place where it looks like
access order might make a difference. As usual, removing a volatile
qualifier in a declaration is actually a bug fix, since a volatile
qualifier is not sufficient to make sure that aggressively
out-of-order CPUs don't reorder things and cause incorrect results.
For example, a CPU might speculatively execute reads of other cqe
fields before the NIC hardware has written those fields and before it
has set the NES_CQE_VALID bit (even though those reads come after the
test of the NES_CQE_VALID bit in program order), but then when the CPU
actually executes the conditional test of the NES_CQE_VALID, the bit
has been set, and the CPU will proceed with the results of the earlier
speculative execution and end up using bogus data.
This also gets rid of the warning:
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_destroy_cq':
drivers/infiniband/hw/nes/nes_verbs.c:1978: warning: passing argument 3 of 'pci_free_consistent' discards qualifiers from pointer target type
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO. The create_flags member will also be useful for XRC
and ehca low-latency QP support.
Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
__FUNCTION__ is gcc-specific, use __func__ instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On some platforms, eg sparc64, dma_addr_t is not the same size as a
pointer, so printing dma_addr_t values by casting to void * and using
a %p format generates warnings. Fix this by casting to unsigned long
and using %lx instead. This fixes the warnings:
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_setup_virt_qp':
drivers/infiniband/hw/nes/nes_verbs.c:1047: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_user_mr':
drivers/infiniband/hw/nes/nes_verbs.c:2657: warning: cast to pointer from integer of different size
Reported by Andrew Morton <akpm@linux-foundation.org>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
nes_unregister_ofa_device() dereferences the nesibdev pointer before
testing if it's NULL. Also, the test is doubly redundant because the
only caller of nes_unregister_ofa_device() is nes_destroy_ofa_device(),
which already tests if nesibdev is NULL. Remove the unnecessary test.
This was spotted by the Coverity checker (CID 2190).
Signed-off-by: Roland Dreier <rolandd@cisco.com>