Commit Graph

1986 Commits

Author SHA1 Message Date
Jack Morgenstein a0c64a17ab mlx4: Add alias_guid mechanism
For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
GUID at index 0 seen by a slave is the actual GUID occupying the GUID
table at the slave-id index.

The driver, by default, requests at startup time that subnet manager
populate its entire guid table with GUIDs. These guids are then mapped
(paravirtualized) to the slaves, and appear for each slave as its GUID
at index 0.

Until each slave has such a guid, its port status is DOWN.

The guid table is cached to support special QP paravirtualization, and
event propagation to slaves on guid change (we test to see if the guid
really changed before propagating an event to the slave).

To support this caching, add capability to __mlx4_ib_query_gid() to
obtain the network view (i.e., physical view) gid at index X, not just
the host (paravirtualized) view.

Based on a patch from Erez Shitrit <erezsh@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:37 -07:00
Amir Vadai 3cf69cc8db IB/mlx4: Add CM paravirtualization
In CM para-virtualization:

1. Incoming requests are steered to the correct vHCA according to the
   embedded GID.
2. Communication IDs on outgoing requests are replaced by a globally
   unique ID, generated by the PPF, since there is no synchronization
   of ID generation between guests (and so these IDs are not
   guaranteed to be globally unique).  The guest's comm ID is stored,
   and is returned to the response MAD when it arrives.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:36 -07:00
Oren Duer b9c5d6a643 IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV
MCG paravirtualization support includes:
- Creating multicast groups by VFs, and keeping accounting of them
- Leaving multicast groups by VFs
- Updating SM only with real changes in the overall picture of MCGs status
- Creation of MGID=0 groups (let SM choose MGID)

Note that the MCG module maintains its own internal MCG object
reference counts.  The reason for this is that the IB core is used to
track only the multicast groups joins generated by the PF it runs
over.  The PF IB core layer is unaware of slaves, so it cannot be used
to keep track of MCG joins they generate.

Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:35 -07:00
Jack Morgenstein 0a9a01884d mlx4: MAD_IFC paravirtualization
The MAD_IFC firmware command fulfills two functions.

First, it is used in the QP0/QP1 MAD-handling flow to obtain
information from the FW (for answering queries), and for setting
variables in the HCA (MAD SET packets).

For this, MAD_IFC should provide the FW (physical) view of the data.
This is the view that OpenSM needs.  We call this the "network view".

In the second case, MAD_IFC is used by various verbs to obtain data
regarding the local HCA (e.g., ib_query_device()).  We call this the
"host view".

This data needs to be paravirtualized.

MAD_IFC therefore needs a wrapper function, and also needs another
flag indicating whether it should provide the network view (when it is
called by ib_process_mad in special-qp packet handling), or the host
view (when it is called while implementing a verb).

There are currently 2 flag parameters in mlx4_MAD_IFC already:
ignore_bkey and ignore_mkey.  These two parameters are replaced by a
single "mad_ifc_flags" parameter, with different bits set for each
flag.  A third flag is added: "network-view/host-view".

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:34 -07:00
Jack Morgenstein 37bfc7c1e8 IB/mlx4: SR-IOV multiplex and demultiplex MADs
Special QPs are paravirtualized.

vHCAs are not given direct access to QP0/1. Rather, these QPs are
operated by a special context hosted by the PF, which mediates access
to/from vHCAs.  This is done by opening a "tunnel" per vHCA port per
QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
PF-context and a "Proxy QP" in the vHCA.  All vHCA MAD traffic must
pass through the corresponding tunnel.  vHCA QPs cannot be assigned to
VL15 and are denied of the well-known QKey.

Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
the real special QP).

Incoming messages are "multiplexed" (i.e. steered by the PPF to the
correct VF or to the PF)

QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
QP0s, but they never receive any SMPs and all SMPs sent are discarded.
QP1 traffic is allowed for all vHCAs, but special care is required to
bridge the gap between the host and network views.

Specifically:
- Transaction IDs are mapped to guarantee uniqueness among vHCAs
- CM para-virtualization
  o   Incoming requests are steered to the correct vHCA according to the embedded GID
  o   Local communication IDs are mapped to ensure uniqueness among vHCAs
  (see the patch that adds CM paravirtualization.)
- Multicast para-virtualization
  o   The PF context aggregates membership state from all vHCAs
  o   The SA is contacted only when the aggregate membership changes
  o   If the aggregate does not change, the PF context will provide the
      requesting vHCA with the proper response.
  (see the patch that adds multicast group paravirtualization)

Incoming MADs are steered according to:
- the DGID If a GRH is present
- the mapped transaction ID for response MADs
- the embedded GID in CM requests
- the remote communication ID in other CM messages

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:34 -07:00
Jack Morgenstein 54679e1482 mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop
This requires:

1. Replacing the paravirtualized P_Key index (inserted by the guest)
   with the real P_Key index.

2. For UD QPs, placing the guest's true source GID index in the
   address path structure mgid field, and setting the ud_force_mgid
   bit so that the mgid is taken from the QP context and not from the
   WQE when posting sends.

3. For UC and RC QPs, placing the guest's true source GID index in the
   address path structure mgid field.

4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
   proxy/tunnel pair.

Since not all the above adjustments occur in all the QP transitions,
the QP transitions require separate wrapper functions.

Secondly, initialize the P_Key virtualization table to its default
values: Master virtualized table is 1-1 with the real P_Key table,
guest virtualized table has P_Key index 0 mapped to the real P_Key
index 0, and all the other P_Key indices mapped to the reserved
(invalid) P_Key at index 127.

Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
and generating events on the master only if a P_Key actually changed.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:33 -07:00
Jack Morgenstein fc06573dfa IB/mlx4: Initialize SR-IOV IB support for slaves in master context
Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
on the master.

This has two parts.  The first part is to initialize the structures to
contain the contexts.  This is done at master startup time in
mlx4_ib_init_sriov().

The second part is to actually create the tunneling resources required
on the master to support a slave.  This is performed the master
detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
generated when a slave initializes its comm channel).

For the master, there is no such startup event, so it creates its own
tunneling resources when it starts up.  In addition, the master also
creates the real special QPs.  The ib_core layer on the master causes
creation of proxy special QPs, since the master is also
paravirtualized at the ib_core layer.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:32 -07:00
Jack Morgenstein 1ffeb2eb8b IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support
1. Introduce the basic SR-IOV parvirtualization context objects for
   multiplexing and demultiplexing MADs.
2. Introduce support for the new proxy and tunnel QP types.

This patch introduces the objects required by the master for managing
QP paravirtualization for guests.

struct mlx4_ib_sriov is created by the master only.
It is a container for the following:

1. All the info required by the PPF to multiplex and de-multiplex MADs
   (including those from the PF). (struct mlx4_ib_demux_ctx demux)
2. All the info required to manage alias GUIDs (i.e., the GUID at
   index 0 that each guest perceives.  In fact, this is not the GUID
   which is actually at index 0, but is, in fact, the GUID which is at
   index[<VF number>] in the physical table.
3. structures which are used to manage CM paravirtualization
4. structures for managing the real special QPs when running in SR-IOV
   mode.  The real SQPs are controlled by the PPF in this case.  All
   SQPs created and controlled by the ib core layer are proxy SQP.

struct mlx4_ib_demux_ctx contains the information per port needed
to manage paravirtualization:

1. All multicast paravirt info
2. All tunnel-qp paravirt info for the port.
3. GUID-table and GUID-prefix for the port
4. work queues.

struct mlx4_ib_demux_pv_ctx contains all the info for managing the
paravirtualized QPs for one slave/port.

struct mlx4_ib_demux_pv_qp contains the info need to run an individual
QP (either tunnel qp or real SQP).

Note:  We made use of the 2 most significant bits in enum
mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h).
We need these bits in the low-level driver for internal purposes.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:30 -07:00
Roland Dreier 2116fe4e8b Merge branches 'cxgb4', 'ipoib', 'mlx4', 'ocrdma' and 'qib' into for-next 2012-09-14 10:42:52 -07:00
Mike Marciniszyn 4c3550057b IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum
Commit 3236b2d469 ("IB/qib: MADs with misset M_Keys should return
failure") introduced a return code assignment that unfortunately
introduced an unconditional exit for the routine due to missing braces.

This patch adds the braces to correct the original patch.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-14 10:42:32 -07:00
Parav Pandit ae3bca90e9 RDMA/ocrdma: Fix CQE expansion of unsignaled WQE
Fix CQE expansion of unsignaled WQE -- don't expand the CQE when the
WQE index of the completed CQE matches with last pending WQE (tail) in
the queue.

Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-14 10:40:58 -07:00
Wei Yongjun 92dd6c3d4d RDMA/cxgb4: Move dereference below NULL test
spatch with a semantic match is used to found this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-07 16:19:03 -07:00
Roland Dreier c0369b296e Merge branches 'cma', 'ipoib', 'misc', 'mlx4', 'ocrdma', 'qib' and 'srp' into for-next 2012-08-16 09:38:39 -07:00
Kleber Sacilotto de Souza a0675a386a IB/mlx4: Check iboe netdev pointer before dereferencing it
Unlike other parts of the mlx4_ib code, the function build_mlx_header()
doesn't check if the iboe netdev of the given port is valid before
dereferencing it, which can cause a crash if the ethernet interface
has already been taken down.

Fix this by checking for a valid netdev pointer before using it to get
the port MAC address.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-16 09:38:19 -07:00
Julia Lawall 51fa3ca37e IB/qib: Fix error return code in qib_init_7322_variables()
Convert a 0 error return code to a negative one, as returned elsewhere
in the function.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 11:58:21 -07:00
Masanari Iida 142ad5db2b IB: Fix typos in infiniband drivers
Correct spelling typos in comments in drivers/infiniband.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 11:56:19 -07:00
Roland Dreier d549f55f2e RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs
If CONFIG_VLAN_8021Q is not set, then vlan_dev_real_dev() just goes BUG(),
so we shouldn't call it unless we're actually dealing with a VLAN netdev.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-10 16:52:13 -07:00
Jack Morgenstein df7fba6647 IB/mlx4: Fix possible deadlock on sm_lock spinlock
The sm_lock spinlock is taken in the process context by
mlx4_ib_modify_device, and in the interrupt context by update_sm_ah,
so we need to take that spinlock with irqsave, and release it with
irqrestore.

Lockdeps reports this as follows:

    [ INFO: inconsistent lock state ]
    3.5.0+ #20 Not tainted
    inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
    swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
    (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
    {HARDIRQ-ON-W} state was registered at:
      [<ffffffff810b84a0>] mark_irqflags+0x120/0x190
      [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
      [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
      [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
      [<ffffffffa028d563>] mlx4_ib_modify_device+0x63/0x240 [mlx4_ib]
      [<ffffffffa026d1fc>] ib_modify_device+0x1c/0x20 [ib_core]
      [<ffffffffa026c353>] set_node_desc+0x83/0xc0 [ib_core]
      [<ffffffff8136a150>] dev_attr_store+0x20/0x30
      [<ffffffff81201fd6>] sysfs_write_file+0xe6/0x170
      [<ffffffff8118da38>] vfs_write+0xc8/0x190
      [<ffffffff8118dc01>] sys_write+0x51/0x90
      [<ffffffff8155b869>] system_call_fastpath+0x16/0x1b

    ...
    *** DEADLOCK ***

    1 lock held by swapper/0/0:

    stack backtrace:
    Pid: 0, comm: swapper/0 Not tainted 3.5.0+ #20
    Call Trace:
    <IRQ>  [<ffffffff810b7bea>] print_usage_bug+0x18a/0x190
    [<ffffffff810b7370>] ? print_irq_inversion_bug+0x210/0x210
    [<ffffffff810b7fb2>] mark_lock_irq+0xf2/0x280
    [<ffffffff810b8290>] mark_lock+0x150/0x240
    [<ffffffff810b84ef>] mark_irqflags+0x16f/0x190
    [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
    [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
    [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
    [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
    [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
    [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
    [<ffffffffa026b2fa>] ? ib_create_ah+0x1a/0x40 [ib_core]
    [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
    [<ffffffff810c27c3>] ? is_module_address+0x23/0x30
    [<ffffffffa028b05b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib]
    [<ffffffffa028c177>] mlx4_ib_event+0x117/0x160 [mlx4_ib]
    [<ffffffff81552501>] ? _raw_spin_lock_irqsave+0x61/0x70
    [<ffffffffa022718c>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core]
    [<ffffffffa0221b40>] mlx4_eq_int+0x500/0x950 [mlx4_core]

Reported by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-10 13:02:24 -07:00
Roland Dreier 1da9b6b43e Merge branches 'cma', 'ipoib', 'ocrdma' and 'qib' into for-next 2012-07-30 07:47:27 -07:00
Mike Marciniszyn 5d7fe4efbf IB/qib: Fix size of cc_supported_table_entries
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") tries to store the value 1984 in a u8, which leads to
truncation.  Fix this by making the member big enough.

This bug was detected by a smatch warning.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-29 20:26:10 -07:00
Roland Dreier 9e8fa040cb RDMA/ocrdma: Fix check of GSI CQs
It looks like one check was accidentally duplicated, and the other 3
checks were left out.  This was detected by scripts/coccinelle/tests/doubletest.cocci:

    drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:895:6-54: duplicated argument to && or ||

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-27 13:14:44 -07:00
Linus Torvalds 5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Roland Dreier 089117e1ad Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus 2012-07-22 23:26:17 -07:00
Mike Marciniszyn 7fac33014f IB/qib: checkpatch fixes
Elminate some simple_strto* usage.

checkpatch also noted pr_ conversations, which have been done as
recommended.  The pr_fmt() define is used to shorten line length.

Other multi-line string warnings are also elmininated.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:20:04 -07:00
Mike Marciniszyn 36a8f01cd2 IB/qib: Add congestion control agent implementation
Add a congestion control agent in the driver that handles gets and
sets from the congestion control manager in the fabric for the
Performance Scale Messaging (PSM) library.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:20:04 -07:00
Mike Marciniszyn 551ace124d IB/qib: Reduce sdma_lock contention
Profiling has shown that sdma_lock is proving a bottleneck for
performance. The situations include:
 - RDMA reads when krcvqs > 1
 - post sends from multiple threads

For RDMA read the current global qib_wq mechanism runs on all CPUs
and contends for the sdma_lock when multiple RMDA read requests are
fielded on differenct CPUs. For post sends, the direct call to
qib_do_send() from multiple threads causes the contention.

Since the sdma mechanism is per port, this fix converts the existing
workqueue to a per port single thread workqueue to reduce the lock
contention in the RDMA read case, and for any other case where the QP
is scheduled via the workqueue mechanism from more than 1 CPU.

For the post send case, This patch modifies the post send code to test
for a non empty sdma engine.  If the sdma is not idle the (now single
thread) workqueue will be used to trigger the send engine instead of
the direct call to qib_do_send().

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:19:58 -07:00
Betty Dall f3331f88a4 IB/qib: Fix an incorrect log message
There is a cut-and-paste typo in the function qib_pci_slot_reset()
where it prints that the "link_reset" function is called rather than
the "slot_reset" function.  This makes the message misleading.

Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:19:08 -07:00
Amir Vadai d9236c3f10 {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Mike Marciniszyn 1fb9fed6d4 IB/qib: Fix QP RCU sparse warnings
Commit af061a644a ("IB/qib: Use RCU for qpn lookup") introduced sparse
warnings.

This patch corrects those issues.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-17 10:18:37 -07:00
Jack Morgenstein 6634961c14 mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.

In addition, in SR-IOV mode, do the following:

1. Reduce reported P_Key table size by 1.
   This is done to reserve the highest P_Key index for internal use,
   for declaring an invalid P_Key in P_Key paravirtualization.
   We require a P_Key index which always contain an invalid P_Key
   value for this purpose (i.e., one which cannot be modified by
   the subnet manager).  The way to do this is to reduce the
   P_Key table size reported to the subnet manager by 1, so that
   it will not attempt to access the P_Key at index #127.

2. Paravirtualize the GID table size to 1. Thus, each guest sees
   only a single GID (at its paravirtualized index 0).

In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:52:23 -07:00
Dotan Barak 47e956b2a6 IB/mlx4: Fill the masked_atomic_cap attribute in query device
When the user queries for device capabilities, fill in the
masked_atomic_cap attribute with the real support level of atomic
capabilities instead of using a hard coded value.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:57 -07:00
Dotan Barak 16551d4501 IB/mthca: Fill in sq_sig_type in query QP
The query QP code was didn't fill that attribute, do that.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:57 -07:00
Dotan Barak 9bbeb6663e IB/mthca: Warning about event for non-existent QPs should show event type
Events received for non-existent QPs should generate a warning that includes
the event type that was received.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:57 -07:00
Mike Marciniszyn 7e23017704 IB/qib: Fix sparse RCU warnings in qib_keys.c
Commit 8aac4cc3a9 ("IB/qib: RCU locking for MR validation") introduced
new sparse warnings in qib_keys.c.

Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 10:01:56 -07:00
Jack Morgenstein 00f5ce99dc mlx4: Use port management change event instead of smp_snoop
The port management change event can replace smp_snoop.  If the
capability bit for this event is set in dev-caps, the event is used
(by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
mask in the MAP_EQ fw command).  In this case, when the driver passes
incoming SMP PORT_INFO SET mads to the FW, the FW generates port
management change events to signal any changes to the driver.

If the FW generates these events, smp_snoop shouldn't be invoked in
ib_process_mad(), or duplicate events will occur (once from the
FW-generated event, and once from smp_snoop).

In the case where the FW does not generate port management change
events smp_snoop needs to be invoked to create these events.  The flow
in smp_snoop has been modified to make use of the same procedures as
in the fw-generated-event event case to generate the port management
events (LID change, Client-rereg, Pkey change, and/or GID change).

Port management change event handling required changing the
mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
(last argument) had to be changed to unsigned long in order to
accomodate passing the EQE pointer.

We also needed to move the definition of struct mlx4_eqe from
net/mlx4.h to file device.h -- to make it available to the IB driver,
to handle port management change events.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:47:10 -07:00
Mike Marciniszyn 8aac4cc3a9 IB/qib: RCU locking for MR validation
Profiling indicates that MR validation locking is expensive.  The MR
table is largely read-only and is a suitable candidate for RCU locking.

The patch uses RCU locking during validation to eliminate one
lock/unlock during that validation.

Reviewed-by: Mike Heinz <michael.william.heinz@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:19 -07:00
Mike Marciniszyn 6a82649f21 IB/qib: Avoid returning EBUSY from MR deregister
A timing issue can occur where qib_mr_dereg can return -EBUSY if the
MR use count is not zero.

This can occur if the MR is de-registered while RDMA read response
packets are being progressed from the SDMA ring.  The suspicion is
that the peer sent an RDMA read request, which has already been copied
across to the peer.  The peer sees the completion of his request and
then communicates to the responder that the MR is not needed any
longer.  The responder tries to de-register the MR, catching some
responses remaining in the SDMA ring holding the MR use count.

The code now uses a get/put paradigm to track MR use counts and
coordinates with the MR de-registration process using a completion
when the count has reached zero.  A timeout on the delay is in place
to catch other EBUSY issues.

The reference count protocol is as follows:
- The return to the user counts as 1
- A reference from the lk_table or the qib_ibdev counts as 1.
- Transient I/O operations increase/decrease as necessary

A lot of code duplication has been folded into the new routines
init_qib_mregion() and deinit_qib_mregion().  Additionally, explicit
initialization of fields to zero is now handled by kzalloc().

Also, duplicated code 'while.*num_sge' that decrements reference
counts have been consolidated in qib_put_ss().

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:19 -07:00
Mike Marciniszyn 354dff1bd8 IB/qib: Fix UC MR refs for immediate operations
An MR reference leak exists when handling UC RDMA writes with
immediate data because we manipulate the reference counts as if the
operation had been a send.

This patch moves the last_imm label so that the RDMA write operations
with immediate data converge at the cq building code.  The copy/mr
deref code is now done correctly prior to the branch to last_imm.

Reviewed-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:19 -07:00
Jack Morgenstein b1d8eb5a21 IB/mlx4: Add debug prints
Define pr_fmt and add some pr_debug prints.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:06 -07:00
Roland Dreier d90f9b3591 IB: Use IS_ENABLED(CONFIG_IPV6)
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:04:32 -07:00
Roland Dreier f747c34af4 RDMA/cxgb4: Fix endianness of addition to mpa->private_data_size
sparse correctly warns that if mpa->private_data_size is __be16, then
doing += on it is wrong, even if we do += htons(<something>) -- on a
little endian system, carries will go the wrong way.  Fix this up by
doing the addition in native byte order.

Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:02:33 -07:00
Hadar Hen Zion 0ff1fb654b {NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Roland Dreier d1e09ebf42 RDMA/ocrdma: Fix assignment of max_srq_sge in device query
We want to set attr->max_srq_sge to dev->attr.max_srq_sge, not to itself.

This was detected by Coverity (CID 709210).

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-07 15:13:47 -07:00
David S. Miller 534cb283ef cxgb3: Convert t3_l2t_get() over to dst_neigh_lookup().
This means passing in a suitable destination address.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 02:29:40 -07:00
Dan Carpenter 7b33dc2b05 RDMA/ocrdma: Fix off by one in ocrdma_query_gid()
The dev->sgid_tbl[] array is allocated in ocrdma_alloc_resources().
It has OCRDMA_MAX_SGID elements so the test here is off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-14 13:30:41 -07:00
Parav Pandit a3698a9b91 RDMA/ocrdma: Fixed RQ error CQE polling
Fix RQ/SRQ error CQE polling.  Return error CQE to consumer for error
case which was not returned previously.

Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11 09:38:36 -07:00
Mahesh Vardhamanaiah 634c5796a5 RDMA/ocrdma: Correct queue SGE calculation
Fix max sge calculation for sq, rq, srq for all hardware types.

Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11 09:38:35 -07:00
Mahesh Vardhamanaiah 07bb54244e RDMA/ocrdma: Correct reported max queue sizes
Fix code to read the max wqe and max rqe values from mailbox response.

Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11 09:38:34 -07:00
Parav Pandit 6ab6827ee9 RDMA/ocrdma: Fixed GID table for vlan and events
1. Fix reporting GID table addition events.
2. Enable vlan based GID entries only when VLAN is enabled at compile
   time (test CONFIG_VLAN_8021Q / CONFIG_VLAN_8021Q_MODULE).

Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-11 09:38:32 -07:00
Roland Dreier 20952cdd8e Merge branches 'cxgb4', 'mlx4' and 'ocrdma' into for-linus 2012-06-06 10:08:11 -07:00