In preparation to support the new OPA MAD Base version, add a base version
parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for current
users.
Definition of the new base version and it's processing will occur in later
patches.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This driver already makes use of ioremap_wc() on PIO buffers,
so convert it to use arch_phys_wc_add().
The qib driver uses a mmap() special case for when PAT is
not used, this behaviour used to be determined with a
module parameter but since we have been asked to just
remove that module parameter this checks for the WC cookie,
if not set we can assume PAT was used. If its set we do
what we used to do for the mmap for when MTRR was enabled.
The removal of the module parameter is OK given that Andy
notes that even if users of module parameter are still around
it will not prevent loading of the module on recent kernels.
Cc: Doug Ledford <dledford@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: infinipath@intel.com
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Upstream checkpatch now requires this.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add support to recognize another board variation named QMH7360.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This changeset removes all the code that allows the driver to write to
the EEPROM and update the recorded error counters and power on hours.
These two stats are unused and writing them exposes a timing risk
which could leave the EEPROM in a bad state preventing further normal
operation of the HCA.
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.
The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase with appropriate adjustments in the reset,
read logic to maintain the z_ latch.
This patch also corrects the fact the unitcast_xmit wasn't handled
at all for UC and RC QPs.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch replaces the dd->int_counter with a percpu counter.
The maintanance of qib_stats.sps_ints and int_counter are
combined into the new counter.
There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)
A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occured since z_int_counter
was "reset".
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Research has shown that commit a77fcf8950 ("IB/qib: Use a single
txselect module parameter for serdes tuning") missed a key serdes init
sequence.
This patch add that sequence.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 7fac33014f54("IB/qib: checkpatch fixes") was overzealous in
removing a simple_strtoul for a parse routine, setup_txselect(). That
routine is required to handle a multi-value string.
Unwind that aspect of the fix.
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 0b3ddf380c ("Log all SDMA errors unconditionally") missed
part of the patch.
This also corrects a format warning when dma_addr_t is 32 bits
on a 64 bit system.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds code to log SDMA errors for supportability purposes.
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The Dell blade chassis got an updated backplane which requires new
transmitter tuning settings.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds DCA cache warming for systems that support DCA.
The code uses cpu affinity notification to react to an affinity change
from a user mode program like irqbalance and (re-)program the chip
accordingly. This notification avoids reading the current cpu on every
interrupt.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to
build due to undefined struct irq_affinity_notify. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Convert a 0 error return code to a negative one, as returned elsewhere
in the function.
A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@
(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
when != ret = e3
*if (x == NULL || ...)
{
... when != ret = e4
* return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Elminate some simple_strto* usage.
checkpatch also noted pr_ conversations, which have been done as
recommended. The pr_fmt() define is used to shorten line length.
Other multi-line string warnings are also elmininated.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit af061a644a ("IB/qib: Use RCU for qpn lookup") introduced sparse
warnings.
This patch corrects those issues.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The SERDES was using the incorrect Frequency Loop Bandwidth setting
causing the link to cycle through the Physical link negotiation state
machine. Fixing the Frequency Loop Bandwidth setting in the SERDES
helps the link come up faster and more reliably.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch optimizes pio buffer allocation in the kernel.
For qib, kernel pio buffers are used for sending acks. The code to
allocate the buffer would always start at 0 until it found a buffer.
This means that an average of 64 comparisions were done on each
allocate, since the busy bit won't be cleared until the bits are
refreshed when buffers are exhausted.
This patch adds two new fields in the devdata struct, last_pio and
min_kernel_pio. last_pio is the last buffer that was allocated.
min_kernel_pio is the lowest potential available buffer.
min_kernel_pio is modifed as contexts are allocated and deallocted.
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Call irq_set_affinity_hint() to give userspace programs such as
irqbalance the information to be able to distribute qib interrupts
appropriately.
The logic allocates all non-receive interrupts to the first CPU local
to the HCA. Receive interrupts are allocated round robin starting
with the second CPU local to the HCA with potential wrap back to the
second CPU.
This patch also adds a refinement to the name registered for MSI-X
interrupts so that user level scripts can determine the device
associated with the IRQs when there are multiple HCAs with a
potentially different set of local CPUs.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Minimize the need for users to have to set module parameters to get
good performance.
The following two parameters are changed:
- rcvhdrcnt to twice the rcvegrcnt
- pcie_caps=0x51
The rcvhdrcnt at twice the egrcount allows the preemptive NAK code
during reception to function in 100% of the cases rather than a sender
jiffies-based timeout.
The pcie_caps default of 0x51 will set the proposed MaxPayload and
MaxReceiveReqest to 256 and 4096 respectively. The capabilities on
the root complex will be used to limit those values.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Prevent a receive data corruption by ensuring that the write to update
the rcvhdrheadn register to generate an interrupt is at the very end
of the receive processing.
Signed-off-by: Ramkrishna Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The qib driver makes use of the the 64-bit jiffies API.
Code inspection reveals that that version of the API is not really
required. This patch converts to use the "normal" jiffies.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Don't over-schedule QSFP work on driver initialization. It could end
up being run simultaneously on two different CPUs resulting in bad
EEPROM reads. In combination with setting the physical IB link state
prior to the IBC being brought out of reset, this can cause the link
state machine to start training early with wrong settings.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
It was mistakenly introduced by dde05cbdf8 ("IB/qib: Hold links
until tuning data is available").
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fix an issue where the link would come up after replugging a cable
even if it has been DISABLED manually.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Hold the link state machine until the tuning data is read from the
QSFP EEPROM so correct tuning settings are applied before the state
machine attempts to bring the link up. Link is also held on cable
unplug in case a different cable is used.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This was probably present from initial submission.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The context init now saves a shift from rcvegrbufs_perchunk
rcvegrbufs_perchunk_shift using ilog2. A BUG_ON() protects the
power of 2 assumption.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
With ib_qib options:
options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4
a run of ib_write_bw -a yields the following:
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
1048576 5000 2910.64 229.80
------------------------------------------------------------------
The top cpu use in a profile is:
CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (No unit mask) count 1002300
Counted LLC_MISSES events (Last level cache demand requests from this core that
missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % samples % app name symbol name
15237 29.2642 964 17.1195 ib_qib.ko qib_7322intr
12320 23.6618 1040 18.4692 ib_qib.ko handle_7322_errors
4106 7.8860 0 0 vmlinux vsnprintf
Analysis of the stats, profile, the code, and the annotated profile indicate:
- All of the overflow interrupts (one per packet overflow) are
serviced on CPU0 with no mitigation on the frequency.
- All of the receive interrupts are being serviced by CPU0. (That is
the way truescale.cmds statically allocates the kctx IRQs to CPU)
- The code is spending all of its time servicing QIB_I_C_ERROR
RcvEgrFullErr interrupts on CPU0, starving the packet receive
processing.
- The decode_err routine is very inefficient, using a printf variant
to format a "%s" and continues to loop when the errs mask has been
cleared.
- Both qib_7322intr and handle_7322_errors read pci registers, which
is very inefficient.
The fix does the following:
- Adds a tasklet to service QIB_I_C_ERROR
- Replaces the very inefficient scnprintf() with a memcpy(). A field
is added to qib_hwerror_msgs to save the sizeof("string") at
compile time so that a strlen is not needed during err_decode().
- The most frequent errors (Overflows) are serviced first to exit the
loop as early as possible.
- The loop now exits as soon as the errs mask is clear rather than
fruitlessly looping through the msp array.
With this fix the performance changes to:
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
1048576 5000 2990.64 2941.35
------------------------------------------------------------------
During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.
This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the cost of the interrupt enter/exit is more costly
than then the additional PCI read in the default handler.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Due to timing, it is possible for the LOS and DFE to remain on. This
is due to the link progressing to LinkUP prior to the driver getting
the first Status Changed interrupt. By expanding the conditions under
which LOS is turned off and DFE timeout is being set, timing is no
longer an issue.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The time limit test now correctly checks against current jiffies to
avoid the hang.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For active and far-EQ cables use an LE2 value of 0 for improved SI.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Hold the IB link at DISABLED until we get the correct TX settings
on mezz boards.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA: Update workqueue usage
RDMA/nes: Fix incorrect SFP+ link status detection on driver init
RDMA/nes: Fix SFP+ link down detection issue with switch port disable
RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events
RDMA/nes: Fix bonding on iw_nes
IB/srp: Test only once whether iu allocation succeeded
IB/mlx4: Handle protocol field in multicast table
RDMA: Use vzalloc() to replace vmalloc()+memset(0)
mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long)
IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)
* ib_wq is added, which is used as the common workqueue for infiniband
instead of the system workqueue. All system workqueue usages
including flush_scheduled_work() callers are converted to use and
flush ib_wq.
* cancel_delayed_work() + flush_scheduled_work() converted to
cancel_delayed_work_sync().
* qib_wq is removed and ib_wq is used instead.
This is to prepare for deprecation of flush_scheduled_work().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Improve the QMH SERDES tunning on initial driver load by having the
driver go through a link state change.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The basic idea is that on SusieQ, the difficult part of mapping QPN to
context is handled by the mapping registers so the generic QPN
allocation doesn't need to worry about chip specifics. For Monty and
Linda, there is no mapping table so the qpt->mask (same as
dd->qpn_mask), is used to see if the QPN to context falls within
[zero..dd->n_krcv_queues).
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For SusieQ we need to write to the interrupt timer register before
updating the header queue head with interrupt count. This is to
ensure that the timer is enabled properly and a receive available
interrupt is delivered. Otherwise this interrupt can be lost if the
receiver header/eager queues are full before the timer is enabled.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add new SERDES tuning to aid manufacturing.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement new SERDES initialization routine and improvements to signal
integrity -- disable LE1 adaptation, disable LOS after link-up, set
better SERDES parameters.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to recognize another board variation named QME7362.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The receive header queue sizes need to modified for performance
tuning. Three module parameters are added to support this.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>