Commit Graph

421 Commits

Author SHA1 Message Date
Linus Torvalds b8ba452683 Round two of 4.6 merge window patches
- A few minor core fixups needed for the next patch series
 - The IB SRIOV series.  This has bounced around for several versions.
   Of note is the fact that the first patch in this series effects
   the net core.  It was directed to netdev and DaveM for each iteration
   of the series (three versions total).  Dave did not object, but did
   not respond either.  I've taken this as permission to move forward
   with the series.
 - The new Intel X722 iWARP driver
 - A huge set of updates to the Intel hfi1 driver.  Of particular interest
   here is that we have left the driver in staging since it still has an
   API that people object to.  Intel is working on a fix, but getting
   these patches in now helps keep me sane as the upstream and Intel's
   trees were over 300 patches apart.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8HR9AAoJELgmozMOVy/dDYMP+wSBALhIdV/pqVzdLCGfIUbK
 H5agonm/3b/Oj74W30w2JYqXBFfZC2LGVJy6OwocJ3wK04v/KfZbA9G+QsOuh2hQ
 Db+tFn1eoltvzrcx3k/a7x6zHGC4YyxyH9OX2B3QfRsNHeE7PG9KGp5dfEs2OH1r
 WGp3jMLAsHf7o8uKpa0jyTEUEErATaTlG+YoaJ+BGHwurgCNy8ni+wAn+EAFiJ3w
 iEJhcXB6KY69vkLsrLYuT9xxJn4udFJ3QEk8xdPkpLKsu+6Ue5i/eNQ19VfbpZgR
 c6fTc8genfIv5S+fis+0P44u1oA7Kl2JT6IZYLi35gJ60ZmxTD+7GruWP3xX/wJ2
 zuR3sTj5fjcFWenk087RSIU/EK87ONPD4g9QPdZpf3FtgleTVKk3YDlqwjqf8pgv
 cO6gQ1BcOBnixJvhjNFiX1c2hvNhb3CkgObly1JBwhcCzZhLkV7BNFPbZuDHAeAx
 VqzNEUse4hupkgiiuiGgudcJ4fsSxMW37kyfX9QC/qyk6YVuUDbrekcWI+MAKot7
 5e5dHqFExpbn1Zgvc8yfvh88H2MUQAgaYwjanWF/qpppOPRd01nTisVQIOJn7s5C
 arcWzvocpQe0GL2UsvDoWwAABXznL3bnnAoCyTWOES2RhOOcw0Ibw46Jl8FQ8gnl
 2IRxQ+ltNEscb2cwi5wE
 =t2Ko
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "Round two of 4.6 merge window patches.

  This is a monster pull request.  I held off on the hfi1 driver updates
  (the hfi1 driver is intimately tied to the qib driver and the new
  rdmavt software library that was created to help both of them) in my
  first pull request.  The hfi1/qib/rdmavt update is probably 90% of
  this pull request.  The hfi1 driver is being left in staging so that
  it can be fixed up in regards to the API that Al and yourself didn't
  like.  Intel has agreed to do the work, but in the meantime, this
  clears out 300+ patches in the backlog queue and brings my tree and
  their tree closer to sync.

  This also includes about 10 patches to the core and a few to mlx5 to
  create an infrastructure for configuring SRIOV ports on IB devices.
  That series includes one patch to the net core that we sent to netdev@
  and Dave Miller with each of the three revisions to the series.  We
  didn't get any response to the patch, so we took that as implicit
  approval.

  Finally, this series includes Intel's new iWARP driver for their x722
  cards.  It's not nearly the beast as the hfi1 driver.  It also has a
  linux-next merge issue, but that has been resolved and it now passes
  just fine.

  Summary:

   - A few minor core fixups needed for the next patch series

   - The IB SRIOV series.  This has bounced around for several versions.
     Of note is the fact that the first patch in this series effects the
     net core.  It was directed to netdev and DaveM for each iteration
     of the series (three versions total).  Dave did not object, but did
     not respond either.  I've taken this as permission to move forward
     with the series.

   - The new Intel X722 iWARP driver

   - A huge set of updates to the Intel hfi1 driver.  Of particular
     interest here is that we have left the driver in staging since it
     still has an API that people object to.  Intel is working on a fix,
     but getting these patches in now helps keep me sane as the upstream
     and Intel's trees were over 300 patches apart"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
  IB/ipoib: Allow mcast packets from other VFs
  IB/mlx5: Implement callbacks for manipulating VFs
  net/mlx5_core: Implement modify HCA vport command
  net/mlx5_core: Add VF param when querying vport counter
  IB/ipoib: Add ndo operations for configuring VFs
  IB/core: Add interfaces to control VF attributes
  IB/core: Support accessing SA in virtualized environment
  IB/core: Add subnet prefix to port info
  IB/mlx5: Fix decision on using MAD_IFC
  net/core: Add support for configuring VF GUIDs
  IB/{core, ulp} Support above 32 possible device capability flags
  IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
  net/mlx5_core: Introduce offload arithmetic hardware capabilities
  net/mlx5_core: Refactor device capability function
  net/mlx5_core: Fix caching ATOMIC endian mode capability
  ib_srpt: fix a WARN_ON() message
  i40iw: Replace the obsolete crypto hash interface with shash
  IB/hfi1: Add SDMA cache eviction algorithm
  IB/hfi1: Switch to using the pin query function
  IB/hfi1: Specify mm when releasing pages
  ...
2016-03-22 15:48:44 -07:00
Mitko Haralanov 5511d78107 IB/hfi1: Add SDMA cache eviction algorithm
This commit adds a cache eviction algorithm for the SDMA
user buffer cache.

Besides the interval RB tree used for node lookup, the cache
nodes are also arranged in a doubly-linked list. When a node is
used, it is put at the beginning of the list. Less frequently
used nodes naturally move to the tail of the list.

When the cache limit is reached, the eviction code starts
traversing the linked list in reverse, freeing buffers until
enough space has been freed to fit the new user buffer. This
guarantees that only the least used cache nodes will be removed
from the cache.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:25 -04:00
Mitko Haralanov a7922f7ddf IB/hfi1: Switch to using the pin query function
Use the new function to query whether the expected receive
user buffer can be pinned successfully. This requires that
a new variable be added to the hfi1_filedata structure used
to hold the number of pages pinned by the expected receive
code.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:25 -04:00
Mitko Haralanov bd3a8947de IB/hfi1: Specify mm when releasing pages
This change adds a pointer to the process mm_struct when
calling hfi1_release_user_pages().

Previously, the function used the mm_struct of the current
process to adjust the number of pinned pages. However, is some
cases, namely when unpinning pages due to a MMU notifier call,
we want to drop into that code block as it will cause a deadlock
(the MMU notifiers take the process' mmap_sem prior to calling
the callbacks).

By allowing to caller to specify the pointer to the mm_struct,
the caller has finer control over that part of hfi1_release_user_pages().

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:25 -04:00
Mitko Haralanov 2c97ce4f3c IB/hfi1: Add pin query function
System administrators can use the locked memory
ulimit setting to set the maximum amount of memory
a user can lock/pin. However, this setting alone is not
enough to guarantee good operation of the hfi1 driver
due to the fact that the setting does not have fine
enough granularity to account for the limit being used
by multiple user processes and caches.

Therefore, a better limiting algorithm is needed. This
is where the new hfi1_can_pin_pages() function and the
cache_size module parameter come in.

The function works by looking at the ulimit and cache_size
value to compute a cache size. The algorithm examines the
ulimit value and, if it is not "unlimited", computes a
per-cache limit based on the number of configured user
contexts.

After that, the lower of the two - cache_size and computed
per-cache limit - is used.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:24 -04:00
Mitko Haralanov 5cd3a88d7f IB/hfi1: Implement SDMA-side buffer caching
Add support for caching of user buffers used for SDMA
transfers. This change improves performance by
avoiding repeatedly pinning the pages of buffers, which
are being re-used by the application.

While the cost of the pinning operation has been made
heavier by adding the extra code to search the cache tree,
re-allocate pages arrays, and future cache evictions,
that cost will be amortized against the savings when the
same buffer is re-used. It is also worth noting that in
most cases, the cost of pinning should be much lower due
to the buffer already being in the cache.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:24 -04:00
Mitko Haralanov a489876010 IB/hfi1: Adjust last address values for intervals
Last address values for intervals in the interval RB tree
nodes should be non-inclusive in order to avoid confusing
ranges.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:23 -04:00
Mitko Haralanov 0f310a00e0 IB/hfi1: Add filter callback
This commit adds a filter callback, which can be used to filter
out interval RB nodes matching a certain interval down to a
single one.

This is needed for the upcoming SDMA-side caching where buffers
will need to be filtered by their virtual address.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:23 -04:00
Mitko Haralanov b8718e2e2e IB/hfi1: Remove compare callback
Interval RB trees provide their own searching function,
which also takes care of determining the path through
the tree that should be taken.

This make the compare callback unnecessary.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:23 -04:00
Mitko Haralanov 353b71c7c0 IB/hfi1: Add MMU tracing
Add a new tracepoint type for the MMU functions and calls
to that tracepoint to allow tracing of MMU functionality.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:22 -04:00
Mitko Haralanov df5a00f81d IB/hfi1: Use interval RB trees
The interval RB trees can handle RB nodes which
hold ranged information. This is exactly the usage
for the buffer cache implemented in the expected
receive code path.

Convert the MMU/RB functions to use the interval RB
tree API. This will help with future users of the
caching API, as well.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:22 -04:00
Mitko Haralanov 909e2cd004 IB/hfi1: Notify remove MMU/RB callback of calling context
Tell the remove MMU/RB callback if it's being called as
part of a memory invalidation or not. This can be important
in preventing a deadlock if the remove callback attempts to
take the map_sem semaphore because the kernel's MMU
invalidation functions have already taken it.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:22 -04:00
Mitko Haralanov 368f2b59d0 IB/hfi1: Remove the use of add/remove RB function pointers
The usage of function pointers for RB node insertion
and removal in the expected receive code path was
meant to be a small performance optimization. However,
maintaining it, especially with the new MMU API, would
become more troublesome as the API is extended.

Since the performance optimization is minor, remove the
function pointers and replace with direct calls.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:21 -04:00
Mitko Haralanov eef9c896a9 IB/hfi1: Allow remove MMU callbacks to free nodes
In order to allow the remove MMU callbacks to free the
RB nodes, it is necessary to prevent any references to
the nodes after the remove callback has been called.

Therefore, remove the node from the tree prior to calling
the callback. In other words, the MMU/RB API now guarantees
that all RB node operations it performs will be done prior
to calling the remove callback and that the RB node will
not be touched afterwards.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:21 -04:00
Mitko Haralanov 4b00d9490f IB/hfi1: Prevent NULL pointer dereference
Prevent a potential NULL pointer dereference (found
by code inspection) when unregistering an MMU handler.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:20 -04:00
Mitko Haralanov c81e1f6452 IB/hfi1: Allow MMU function execution in IRQ context
Future users of the MMU/RB functions might be searching or
manipulating the MMU RB trees in interrupt context. Therefore,
the MMU/RB functions need to be able to run in interrupt
context. This requires that we use the IRQ-aware API for
spin locks.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:20 -04:00
Mitko Haralanov 06e0ffa693 IB/hfi1: Re-factor MMU notification code
The MMU notification code added to the
expected receive side has been re-factored and
split into it's own file. This was done in
order to make the code more general and, therefore,
usable by other parts of the driver.

The caching behavior remains the same. However,
the handling of the RB tree (insertion, deletions,
and searching) as well as the MMU invalidation
processing is now handled by functions in the
mmu_rb.[ch] files.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:54:59 -04:00
Mike Marciniszyn d0e859c328 IB/hfi1: Enable adaptive pio by default
Set the piothreshold to the agreed upon default of 256B.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:21 -04:00
Mike Marciniszyn 47177f1bac IB/hfi1: Fix adaptive pio packet corruption
The adaptive pio heuristic missed a case that causes a corrupted
packet on the wire.

The case is if SDMA egress had been chosen for a pio-able packet and
then encountered a ring space wait, the packet is queued.   The sge
cursor had been incremented as part of the packet build out for SDMA.

After the send engine restart, the heuristic might now chose pio based
on the sdma count being zero and start the mmio copy using the already
incremented sge cursor.

Fix this by forcing SDMA egress when the SDMA descriptor has already
been built.

Additionally, the code to wait for a QPs pio count to zero when
switching to SDMA was missing.  Add it.

There is also an issue with UD QPs, in that the different SLs can pick
a different egress send context.  For now, just insure the UD/GSI
always go through SDMA.

Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:21 -04:00
Mike Marciniszyn cef504c5c0 IB/hfi1: Fix panic in adaptive pio
The following panic occurs while running ib_send_bw -a with
adaptive pio turned on:

[ 8551.143596] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 8551.152986] IP: [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.160926] PGD 80db21067 PUD 80bb45067 PMD 0
[ 8551.166431] Oops: 0000 [#1] SMP
[ 8551.276725] task: ffff880816bf15c0 ti: ffff880812ac0000 task.ti: ffff880812ac0000
[ 8551.285705] RIP: 0010:[<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.296462] RSP: 0018:ffff880812ac3b58  EFLAGS: 00010282
[ 8551.303029] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000800
[ 8551.311633] RDX: ffff880812ac3c08 RSI: 0000000000000000 RDI: ffff8800b6665e40
[ 8551.320228] RBP: ffff880812ac3ba0 R08: 0000000000001000 R09: ffffffffa09039a0
[ 8551.328820] R10: ffff880817a0c000 R11: 0000000000000000 R12: ffff8800b6665e40
[ 8551.337406] R13: ffff880817a0c000 R14: ffff8800b6665800 R15: ffff8800b6665e40
[ 8551.355640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8551.362674] CR2: 0000000000000000 CR3: 000000080abe8000 CR4: 00000000001406e0
[ 8551.371262] Stack:
[ 8551.374119]  ffff880812ac3bf0 ffff88080cf54010 ffff880800000800 ffff880812ac3c08
[ 8551.383036]  ffff8800b6665800 ffff8800b6665e40 0000000000000202 ffffffffa08e7b80
[ 8551.391941]  00000001007de431 ffff880812ac3bc8 ffffffffa0904645 ffff8800b6665800
[ 8551.400859] Call Trace:
[ 8551.404214]  [<ffffffffa08e7b80>] ? hfi1_del_timers_sync+0x30/0x30 [hfi1]
[ 8551.412417]  [<ffffffffa0904645>] hfi1_verbs_send+0x215/0x330 [hfi1]
[ 8551.420154]  [<ffffffffa08ec126>] hfi1_do_send+0x166/0x350 [hfi1]
[ 8551.427618]  [<ffffffffa055a533>] rvt_post_send+0x533/0x6a0 [rdmavt]
[ 8551.435367]  [<ffffffffa050760f>] ib_uverbs_post_send+0x30f/0x530 [ib_uverbs]
[ 8551.443999]  [<ffffffffa0501367>] ib_uverbs_write+0x117/0x380 [ib_uverbs]
[ 8551.452269]  [<ffffffff815810ab>] ? sock_recvmsg+0x3b/0x50
[ 8551.459071]  [<ffffffff81581152>] ? sock_read_iter+0x92/0xe0
[ 8551.466068]  [<ffffffff81212857>] __vfs_write+0x37/0x100
[ 8551.472692]  [<ffffffff81213532>] ? rw_verify_area+0x52/0xd0
[ 8551.479682]  [<ffffffff81213782>] vfs_write+0xa2/0x1a0
[ 8551.486089]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[ 8551.493891]  [<ffffffff812146c5>] SyS_write+0x55/0xc0
[ 8551.500220]  [<ffffffff816ae0ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 8551.531284] RIP  [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.539508]  RSP <ffff880812ac3b58>
[ 8551.544110] CR2: 0000000000000000

The priv s_sendcontext pointer was not setup properly.  Fix with this
patch by using the s_sendcontext and eliminating its send engine use.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:21 -04:00
Mike Marciniszyn 60df29581f IB/hfi1: Fix PIO wakeup timing hole
There is a timing hole if there had been greater than
PIO_WAIT_BATCH_SIZE waiters.  This code will dispatch the first
batch but leave the others in the queue.   If the restarted waiters
don't in turn wait on a buffer, there is a hang.

Fix by forcing a return when the QP queue is non-empty.

Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:20 -04:00
Mike Marciniszyn 5326dfbf00 IB/hfi1: Fix ordering of trace for accuracy
The postitioning of the sdma ibhdr trace was
causing an extra trace message when the tx send
returned -EBUSY.

Move the trace to just before the return
and handle negative return values to avoid
any trace.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:20 -04:00
Mike Marciniszyn 1db78eeebe IB/hfi1: Add unique trace point for pio and sdma send
This allows for separately enabling pio and sdma
tracepoints to cut the volume of trace information.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Mike Marciniszyn ef6d8c4ec8 IB/hfi1: Fix issues with qp_stats print
The changes are to aid in coorelating trace information
with QPs between the trace and qp_stats information

Such changes include adds a space after QP and clarifying that the second
QP is actually the remote QP.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Mike Marciniszyn ef086c0d5d IB/hfi1: Report pid in qp_stats to aid debug
Tracking user/QP ownership is needed to debug issues with
user ULPs like OpenMPI.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Easwar Hariharan 2243472e9d IB/hfi1: Improve LED beaconing
The current LED beaconing code is unclear and uses the timer handler to
turn off the timer. This patch simplifies the code by removing the
special semantics of timeon = timeoff = 0 being interpreted as a request
to turn off the beaconing.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:18 -04:00
Kaike Wan 831464ce4b IB/hfi1: Don't call cond_resched in atomic mode when sending packets
This patch fixed the problem where the driver might reschedule in atomic
mode when sending packets. This is due to the fact that the call to
cond_resched() in hfi1_do_send() might occur in atomic mode and a check is
required to avoid the warning message:
    "kernel: BUG: scheduling while atomic: swapper/2/0/0x10000100."

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:18 -04:00
Dean Luick 528ee9fbf0 IB/hfi1: Add adaptive cacheless verbs copy
The kernel memcpy is faster than a cacheless copy.  However,
if too much of the L3 cache is overwritten by one-time copies
then overall bandwidth suffers.  Implement an adaptive scheme
where full page copies are tracked and if the number of unique
entries are larger than a threshold, verbs will use a cacheless
copy.  Tracked entries are gradually cleaned, allowing memcpy to
resume once the larger copies have stopped.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:17 -04:00
Jubin John 8fefef125e IB/hfi1: Handle host handshake timeout
Host handshake timeout can occur during the verify capability
state. This is a LNI related failure and should be
handled in the same way as other LNI failures.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:17 -04:00
Dean Luick c9c8ea3d47 IB/hfi1: Add ASIC flag view/clear
Different OSes using parts of the same hardware may leave
cross-device flags set.  Export a debugfs file to view and
clear these flags if needed.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:17 -04:00
Dean Luick ae993e7fba IB/hfi1: Hold i2c resource across debugfs open/close
External i2c firmware updates are done in multiple steps and
cannot have other things done in between.  For debugfs files,
acquire the resource on open and release it on close.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:16 -04:00
Dean Luick b0506f4c56 IB/hfi1: Reduce hardware mutex timeout
The hardware mutex is now held only long enough to set
or clear flags.  Reduce the timeout to something more
reasonable.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:16 -04:00
Dean Luick 7a8f28ca3d IB/hfi1: Remove unused HFI1_DO_INIT_ASIC flag
The flag HFI1_DO_INIT_ASIC flag is no longer used.  Remove
the flag and the code that sets it.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:15 -04:00
Dean Luick a453698b52 IB/hfi1: Change thermal init to use resource reservation
Use the resource reservation system to flag that the ASIC
thermal has been initialized.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:15 -04:00
Dean Luick 765a6fac91 IB/hfi1: Change QSFP functions to use resource reservation
Remove the mutex guarding each operation in favor the ASIC
resource acquire/release.  Push the resource acquire/release,
above each operation call to allow exclusive access across
multiple operations.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:15 -04:00
Dean Luick 576531fde8 IB/hfi1: Change SBus handling to use resource reservation
The SBus resource includes SBUS, PCIE, and THERM registers.
Change SBus handling to use the new ASIC resource reservation system.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:14 -04:00
Dean Luick 60c708285c IB/hfi1: Change EPROM handling to use resource reservation
Change EPROM handling to use the new ASIC resource reservation system.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:14 -04:00
Dean Luick a2ee27a455 IB/hfi1: Add ASIC resource reservation functions
The ASIC block is a shared hardware resource between two devices
on the chip.  Add functions to acquire and release these resources
in a way that is safe for both multiple users on the same OS
and multiple users on different OSes, while holding the hardware
mutex as little as possible.

Reservations are noted in a scratch register in the shared region.
There are two types of reservations: per-HFI dynamic and permanent.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:13 -04:00
Dean Luick 78eb129d47 IB/hfi1: Add shared ASIC structure
Create a shared structure to exist between devices that share the
same ASIC.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:13 -04:00
Dean Luick 3afb6f637e IB/hfi1: Remove ASIC block clear
The ASIC block is shared between two HFIs. Individual devices
should not initialize registers there. Retain the power-on values.
Individual users set registers as needed with one exception.
Clear sbus fast mode on "slow" calls.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:12 -04:00
Harish Chegondi 2b8b34a948 IB/hfi1: Replace kmalloc and memcpy with a kmemdup
This change was recommended by Coccinelle tool when I ran the command:
-bash-4.2$ make coccicheck MODE=patch M=drivers/infiniband/hw/hfi1/

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:12 -04:00
Harish Chegondi bf640096e6 IB/hfi1: Move constant to the right in bitwise operations
Implement changes recommended by the Coccinelle tool to move constant to
the right in bitwise operations

-bash-4.2$ make coccicheck MODE=report M=drivers/infiniband/hw/hfi1/

drivers/infiniband/hw/hfi1/pio.c:765:4-16: Move constant to right.
drivers/infiniband/hw/hfi1/rc.c:2503:19-29: Move constant to right.
drivers/infiniband/hw/hfi1/chip.c:9813:11-22: Move constant to right.
drivers/infiniband/hw/hfi1/chip.c:14468:29-40: Move constant to right.

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:12 -04:00
Harish Chegondi 42d6ec19c9 IB/hfi1: Add the break statement that was removed in an earlier patch
The break statement was unintentionally removed in this patch
commit 41ca419abc0ca7ee65d765408cdc1a7fed2897a3
("staging/rdma/hfi1: Remove hfi1 MR and hfi1 specific qp type")

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:11 -04:00
Amitoj Kaur Chawla 437b29d115 staging: rdma: hfi1: file_ops: Replace ALIGN with PAGE_ALIGN
mm.h contains a helper function PAGE_ALIGN which aligns the pointer
to the page boundary instead of using ALIGN(expression, PAGE_SIZE)

This change was made with the help of the following Coccinelle
semantic patch:
//<smpl>
@@
expression e;
symbol PAGE_SIZE;
@@
(
- ALIGN(e, PAGE_SIZE)
+ PAGE_ALIGN(e)
|
- IS_ALIGNED(e, PAGE_SIZE)
+ PAGE_ALIGNED(e)
)
//</smpl>

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00
Amitoj Kaur Chawla f538966081 staging: rdma: hfi1: driver: Replace IS_ALIGNED with PAGE_ALIGNED
mm.h contains a helper function PAGE_ALIGNED which tests whether
an address is aligned to PAGE_SIZE instead of using
IS_ALIGNED(expression, PAGE_SIZE)

This change was made with the help of the following Coccinelle
semantic patch:
//<smpl>
@@
expression e;
symbol PAGE_SIZE;
@@
(
- ALIGN(e, PAGE_SIZE)
+ PAGE_ALIGN(e)
|
- IS_ALIGNED(e, PAGE_SIZE)
+ PAGE_ALIGNED(e)
)
//</smpl>

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00
Amitoj Kaur Chawla 8444991755 staging: rdma: hfi1: Replace ALIGN with PAGE_ALIGN
mm.h contains a helper function PAGE_ALIGN which aligns the pointer
to the page boundary instead of using ALIGN(expression, PAGE_SIZE)

This change was made with the help of the following Coccinelle
semantic patch:
//<smpl>
@@
expression e;
symbol PAGE_SIZE;
@@
(
- ALIGN(e, PAGE_SIZE)
+ PAGE_ALIGN(e)
|
- IS_ALIGNED(e, PAGE_SIZE)
+ PAGE_ALIGNED(e)
)
//</smpl>

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00
Bhumika Goyal c754db403d Staging: rdma: Use min macro instead of ternary operator
This patch replaces ternary operator with macro min as it shorter and
thus increases code readability. Macro min return the minimum of the
two compared values.
Made a semantic patch for changes:

@@
type T;
T x;
T y;
@@
(
- x < y ? x : y
+ min(x,y)
|
- x > y ? x : y
+ max(x,y)
)

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00
Janani Ravichandran 16ccad0475 staging: rdma: hfi1: user_sdma.c: Drop void pointer cast
Void pointers need not be cast to other pointer types.
Semantic patch used:

@r@
expression x;
void *e;
type T;
identifier f;
@@

(
  *((T *)e)
|
  ((T *)x) [...]
|
  ((T *)x)->f
|
- (T *)
  e
)

Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00
Bhaktipriya Shridhar acc17d671c staging: rdma: hfi1: Remove unnecessary parantheses
Removed parantheses on the right hand side of assignments as they are not
needed. Coccinelle patch used:

@@ expression a, b; @@

a = &
-(
b
-)

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00
Bhaktipriya Shridhar a4fe1bc164 staging: rdma: hfi1: Remove casts of pointer to same type
Casting a pointer to a pointer of the same type is unnecessary, so
remove these unnecessary casts.

This was done with Coccinelle:

@@
type T;
T *ptr;
@@
- (T *)ptr
+ ptr

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-11 22:09:09 -08:00