Commit Graph

5054 Commits

Author SHA1 Message Date
Moni Shoua 7e57b85c44 IB/mlx4: Add support for setting RoCEv2 gids in hardware
To tell hardware about a gid with type RoCEv2, software needs a new
modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can
replace the old method, MLX4_SET_PORT_GID_TABLE, for  RoCEv1 gids.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:35:00 -05:00
Moni Shoua b699a859d1 IB/mlx4: Add gid_type to GID properties
IB core driver adds a property of type to struct ib_gid_attr.
The mlx4 driver should take that in consideration when modifying or
querying the hardware gid table.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:35:00 -05:00
Matan Barak c3efe7500a IB/core: Use hop-limit from IP stack for RoCE
Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:56 -05:00
Matan Barak f7f4b23e27 IB/core: Rename rdma_addr_find_dmac_by_grh
rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index and
downsteram patch will also add hop_limit as an output parameter,
thus we rename it to rdma_addr_find_l2_eth_by_grh.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:55 -05:00
Bart Van Assche 4bfdf635c6 IB/cm: Fix a recently introduced deadlock
ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.

This patch fixes e.g. the following deadlock:
Acked-by: Erez Shitrit <erezsh@mellanox.com>

=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G            E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x
74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
  [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
  [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
  [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
  [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
  [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
  [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
  [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
  [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
  [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
  [<ffffffff8107184d>] process_one_work+0x1bd/0x460
  [<ffffffff81073148>] worker_thread+0x118/0x420
  [<ffffffff81078454>] kthread+0xe4/0x100
  [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last  enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last  enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cm_id_priv->lock)->rlock);
  <Interrupt>
    lock(&(&cm_id_priv->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/8/0.

stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E   4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff81251c1b>] dump_stack+0x4f/0x74
 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
 [<ffffffff810a3995>] mark_lock+0x115/0x1b0
 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
 [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
 [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
 [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
 [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
 [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
 [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
 [<ffffffff81007f3d>] handle_irq+0x5d/0x130
 [<ffffffff810071fd>] do_IRQ+0x6d/0x130
 [<ffffffff8151d309>] common_interrupt+0x89/0x89
 <EOI>  [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
 [<ffffffff810990d6>] call_cpuidle+0x36/0x60
 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
 [<ffffffff8103c443>] start_secondary+0x83/0x90

Fixes: commit be4b499323 ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:55 -05:00
Bart Van Assche 19f57298f0 IB/srpt: Fix the RDMA completion handlers
Avoid that the following kernel crash is triggered when processing
an RDMA completion:

BUG: unable to handle kernel paging request at 0000000100000198
IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560
Call Trace:
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt]
 [<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core]
 [<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core]
 [<ffffffff8107184d>] process_one_work+0x1bd/0x460
 [<ffffffff81073148>] worker_thread+0x118/0x420
 [<ffffffff81078454>] kthread+0xe4/0x100
 [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70

Fixes: commit 59fae4deaa ("IB/srpt: chain RDMA READ/WRITE requests").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:55 -05:00
Matan Barak 9506902b7b IB/core: Fix dereference before check
Sparse complains about dereference before check. Fixing this by
moving the check before the dereference.

Fixes: 200298326b ('IB/core: Validate route when we init ah')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:54 -05:00
Matan Barak 2e2cdace5a IB/core: Eliminate sparse false context imbalance warning
When write_gid function needs to do a sleep-able operation, it unlocks
table->rwlock and then relocks it. Sparse complains about context
imbalance.

This is safe as write_gid is always called with table->rwlock.
write_gid protects from simultaneous writes to this GID entry
by setting the GID_TABLE_ENTRY_INVALID flag.

Fixes: 9c584f0495 ('IB/core: Change per-entry lock in RoCE GID table to
		     one lock')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:21 -05:00
Hal Rosenstock 6e2a51a0f7 IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling
Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.

To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.

This handles issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>

Fixes: 145d9c5410 ('IB/core: Display extended counter set if available')

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:26:20 -05:00
Bart Van Assche b6aeb980f1 IB/core: Remove set-but-not-used variable from ib_sg_to_pages()
Detected this by building the IB core with W=1. See also patch
"IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed4).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:25:45 -05:00
Leon Romanovsky c876a1b7dd IB/mlx5: Fix passing casted pointer in mlx5_query_port_roce
Fix static checker warning:
        drivers/infiniband/hw/mlx5/main.c:149 mlx5_query_port_roce()
        warn: passing casted pointer '&props->qkey_viol_cntr' to
	'mlx5_query_nic_vport_qkey_viol_cntr()' 32 vs 16.

Fixes: 3f89a643eb ("IB/mlx5: Extend query_device/port to support RoCE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:25:45 -05:00
Christoph Hellwig d53e11fdf0 IB/mad: use CQ abstraction
Remove the local workqueue to process mad completions and use the CQ API
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:25:45 -05:00
Christoph Hellwig ca281265c0 IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
Stop abusing wr_id and just pass the parameter explicitly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:25:36 -05:00
Lucas Tanure 39f426553e infiniband: Replace memset with eth_zero_addr
Use eth_zero_addr to assign the zero address to the given address
array instead of memset when second argument is address of zero.

Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:24:54 -05:00
Leon Romanovsky 50ca6ed21e IB/mlx5: Delete locally redefined variable
Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows
an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here

Fixes: d69e3bcf79 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:24:54 -05:00
Moni Shoua 1049f13816 IB/mlx4: Take source mac from AH instead from the port
In commit dbf727de74 ("IB/core: Use GID table in AH creation and dmac
resolution") we copy source mac to mlx4_ah from the attributes of
gid at ib_ah_attr.grh.sgid_index. Now we can use it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:24:53 -05:00
Matan Barak 4e40816734 IB/mlx4: Initialize hop_limit when creating address handle
Hop limit value wasn't copied from attributes  when ah was created.
This may influence packets for unconnected services to get dropped in
routers when endpoints are not in the same subnet.

Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:24:53 -05:00
Leon Romanovsky 9f17768611 IB/mlx5: Expose correct maximum number of CQE capacity
Maximum number of EQE capacity per CQ was mistakenly exposed
as CQE. Fix that.

Fixes: 938fe83c8d ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:24:53 -05:00
Hariprasad S 28de1f7437 iw_cxgb4: Take clip reference before starting IPv6 listen
The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take clip reference
before creating IPv6 listening servers, and then if we fail to
create server, release the clip entry.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:23:40 -05:00
Hariprasad S 4275a5b200 iw_cxgb4: Fixes GW-Basic labels to meaningful error names
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:17:40 -05:00
Hariprasad S 82b1df1b08 iw_cxgb4: Fixes static checker warning in c4iw_rdev_open()
Commit c5dfb000b9 ("iw_cxgb4: Pass qid range to user space driver")
from Dec 11, 2015, leads to the following static checker warning:

	drivers/infiniband/hw/cxgb4/device.c:857 c4iw_rdev_open()
        warn: variable dereferenced before check 'rdev->status_page'

Also we weren't deallocating ocqp pool in error path when failed to
allocate status page. Fixing it too.

Fixes: c5dfb000b9 ("iw_cxgb4: Pass qid range to user space driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:17:40 -05:00
Dan Carpenter a7d0e959fa IB/cma: allocating too much memory in make_cma_ports()
The issue here is that there is a cut and paste bug.  When we allocate
cma_dev_group->default_ports_group we use "sizeof(*cma_dev_group->ports)"
instead of "sizeof(*cma_dev_group->default_ports_group)".

We're bumping up against the 80 character limit so I introduced a new
local pointer "ports_group" to get around that.

Fixes: 045959db65 ('IB/cma: Add configfs for rdma_cm')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:17:40 -05:00
Dan Carpenter bc1251e6d9 RDMA/nes: checking for NULL instead of IS_ERR
nes_reg_phys_mr() returns ERR_PTRs on error.  It doesn't return NULL.

This bug has been there for a while, but we recently changed from
calling a function pointer to calling nes_reg_phys_mr() directly so now
Smatch is able to detect the bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:17:40 -05:00
Vinit Agnihotri fbbeb8632b IB/qib: Support creating qps with GFP_NOIO flag
The current code is problematic when the QP creation and ipoib is used to
support NFS and NFS desires to do IO for paging purposes. In that case, the
GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
situations.

This fix adds support to create queue pair with GFP_NOIO flag for connected
mode only to cleanly fail the create queue pair in those situations.

Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:17:40 -05:00
Ira Weiny 65487fdc0c IB/sysfs: Fix sparse warning on attr_id
Attributed ID was declared as an int while the value should really be big
endian 16.

Fixes: 35c4cbb178 ("IB/core: Create get_perf_mad function in sysfs.c")

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 14:12:56 -05:00
Devesh Sharma 3b1ea43009 RDMA/ocrdma: Depend on async link events from CNA
Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

With this patch we stop using administrative open and close events
injected by be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements a logic
to receive async-link-events generated from CNA whenever link-state-change
is detected. Now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to IB-stack.

Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 14:00:47 -05:00
Devesh Sharma d310a344e1 RDMA/ocrdma: Dispatch only port event when port state changes
Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 14:00:47 -05:00
Devesh Sharma a2addf94a8 RDMA/ocrdma: Fix vlan-id assignment in qp parameters
vlan-id is wrongly getting as 0 when PFC is enabled.
Set vlan-id configured by user in QP parameters.
In case vlan interface is not used, flash a warning to
user to configure vlan and assign vlan-id as 0 in qp params.

Fixes: dbf727de74 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 14:00:47 -05:00
Matan Barak 649367735e IB/cma: Fix RDMA port validation for iWarp
cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWarp support. Fixing that matching the ndev only if
we work on a RoCE port.

Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71dd ('IB/cma: cma_validate_port should verify the port
		     and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 13:33:47 -05:00
Mike Marciniszyn 09dc9cd652 IB/qib: fix mcast detach when qp not attached
The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found.   If none is found, then
return EINVAL indicating the error.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 13:09:44 -05:00
Erez Shitrit 50be28de6f IB/IPoIB: Fix kernel panic on multicast flow
ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary (used as an iterator)
variable that may be uninitialized.
There is no need to send the variable dev to the function, as each mcast
has its dev as a member in the mcast struct.

This causes the next panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
	ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
	ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
	process_one_work+0x164/0x470
	worker_thread+0x11d/0x420
	...

Fixes: 5a0e81f6f4 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 12:59:54 -05:00
Doron Tsur 0b6e26ce89 net/mlx5_core: Fix trimming down IRQ number
With several ConnectX-4 cards installed on a server, one may receive
irqn > 255 from the kernel API, which we mistakenly trim to 8bit.

This causes EQ creation failure with the following stack trace:
[<ffffffff812a11f4>] dump_stack+0x48/0x64
[<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0
[<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180
[<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core]
[<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core]
[<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core]
[<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core]
[<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core]
[<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0

Fixing it by changing of the irqn type from u8 to unsigned int to
support values > 255

Fixes: 61d0e73e0a ('net/mlx5_core: Use the the real irqn in eq->irqn')
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-17 12:08:04 -05:00
Linus Torvalds 7d1fc01afc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  floppy: make local variable non-static
  exynos: fixes an incorrect header guard
  dt-bindings: fixes some incorrect header guards
  cpufreq-dt: correct dead link in documentation
  cpufreq: ARM big LITTLE: correct dead link in documentation
  treewide: Fix typos in printk
  Documentation: filesystem: Fix typo in fs/eventfd.c
  fs/super.c: use && instead of & for warn_on condition
  Documentation: fix sysfs-ptp
  lib: scatterlist: fix Kconfig description
2016-01-14 17:04:19 -08:00
Nicholas Bellinger f246c94154 ib_srpt: Convert acl lookup to modern get_initiator_node_acl usage
This patch does a simple conversion of ib_srpt code to use
proper modern core_tpg_get_initiator_node_acl() lookup using
se_node_acl->acl_kref, and drops the legacy internal list
usage from srpt_lookup_acl().

This involves doing transport_init_session() earlier, and
making sure transport_free_session() is called during
a se_node_acl lookup failure to drop the last ->acl_kref.

Also, it adds a minor backwards-compat hack to avoid the
potential for user-space wrt node-acl WWPN formatting by
simply stripping off '0x' prefix from ch->sess_name, and
retrying once if core_tpg_get_initiator_node_acl() fails.

Finally, go ahead and drop port_acl_list port_acl_lock
since they are no longer used.

Cc: Vu Pham <vu@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2016-01-12 23:44:32 -08:00
Maor Gottlieb 038d2ef875 IB/mlx5: Add flow steering support
Adding flow steering support by creating a flow-table per
priority (if rules exist in the priority). mlx5_ib uses
autogrouping and thus only creates the required destinations.

Also includes adding of these flow steering utilities

1. Parsing verbs flow attributes hardware steering specs.

2. Check if flow is multicast - this is required in order to decide
to which flow table will we add the steering rule.

3. Set outer headers in flow match criteria to zeros.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11 17:48:53 -05:00
Nicholas Bellinger 373a4cd737 iser-target: Fix non negative ERR_PTR isert_device_get usage
As reported by Dan, isert_create_device_ib_res() failure within
isert_device_get() can potentially return a postive value,
resulting in ERR_PTR() triggering a NULL pointer dereference.

Caught by the static checker:

     drivers/infiniband/ulp/isert/ib_isert.c:423 isert_device_get()
     error: passing non negative 1 to ERR_PTR

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2016-01-07 13:57:51 -08:00
David S. Miller c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Devesh Sharma 10a214dc99 RDMA/ocrdma: Depend on async link events from CNA
Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

With this patch we stop using administrative open and close events
injected by be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements a logic
to receive async-link-events generated from CNA whenever link-state-change
is detected. Now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to IB-stack.

Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:54 -05:00
Devesh Sharma 36ac0db0db RDMA/ocrdma: Dispatch only port event when port state changes
Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:54 -05:00
Devesh Sharma c6002d5602 RDMA/ocrdma: Fix vlan-id assignment in qp parameters
vlan-id is wrongly getting as 0 when PFC is enabled.
Set vlan-id configured by user in QP parameters.
In case vlan interface is not used, flash a warning to
user to configure vlan and assign vlan-id as 0 in qp params.

Fixes: dbf727de74 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:54 -05:00
Jenny Derzhavetz 59caaed7a7 IB/iser: Support the remote invalidation exception
Declare that we support remote invalidation in case we are:
1. using fastreg method
2. always registering memory

Detect the invalidated rkey from the work completion info so we
won't invalidate it locally. The spec mandates that we must not rely
on the target remote invalidate our rkey so we must check it upon
a receive (scsi response) completion.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-26 19:27:10 -05:00
Sagi Grimberg e26d2d21ff IB/iser: Change the increment rkey flow logic
When we enable remote invalidate support we won't want to perform
local invalidates at the same time we do today, but we still need
to get new rkeys.  So, decouple the rkey update from the local
invalidate and tie it to memory reg instead.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:36 -05:00
Jenny Derzhavetz 422bd0acb0 IB/isert: Support the remote invalidation exception
We'll use remote invalidate, according to negotiation result
during connection establishment. If the initiator declared that
it supports the remote invalidate exception and the local HCA
supports IB_DEVICE_MEM_MGT_EXTENSIONS then the target will
use IB_WR_SEND_WITH_INV with the correct rkey for the response.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:36 -05:00
Jenny Derzhavetz 13bce4821f IB/isert: Declare correct flags when accepting a connection
iser target does not support zero based virtual addresses and
send with invalidate, so it should declare that it doesn't.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:35 -05:00
Sagi Grimberg c64941533f IB/isert: Remove unused file iser_proto.h
We don't need iser_proto.h anymore, remove it and
move (non-protocol) declarations to ib_isert.h

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:35 -05:00
Sagi Grimberg d3cf81f9c8 IB/iser,isert: Create and use new shared header
The iser RDMA_CM negotiation protocol is shared by
the initiator and the target, so have a shared header
for the defines and structure. Move relevant items from
the initiator and target headers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:35 -05:00
Jenny Derzhavetz 1caa70d8a7 IB/iser: set intuitive values for mr_valid
This parameter is described as "is mr valid indicator".
In other words, it indicates whether memory registration
is valid or not. So intuitive values would be:
mr_valid=True, when memory registration is valid and
mr_valid=False otherwise.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:34 -05:00
Jenny Derzhavetz b5f04b00f7 IB/iser: Don't register memory for all immediate data writes
When all the task data is sent as immediate data, we are
allowed to use the local_dma_lkey as it is not sent to
the wire.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:34 -05:00
Sagi Grimberg bfe066e256 IB/iser: Reuse ib_sg_to_pages
We have in iser iser_sg_to_page_vec which has exactly
the same role as ib_sg_to_pages. Customize the page_vec
to hold a fake MR so we can reuse ib_sg_to_pages.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:34 -05:00
Roi Dayan 08ff089b12 IB/iser: Fix module init not cleaning up on error flow
Destroy workqueue on transport register error, also
release kmem cache on workqueue allocation error.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:33 -05:00
Julia Lawall 46e741f410 IB/core: constify mmu_notifier_ops structures
This mmu_notifier_ops structure is never modified, so declare it as
const, like the other mmu_notifier_ops structures.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:33 -05:00
Julia Lawall 2392a4cdcb IB/iser: constify iser_reg_ops structure
The iser_reg_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:33 -05:00
Julia Lawall c419874c44 RDMA/nes: constify nes_cm_ops structure
The nes_cm_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:32 -05:00
Bodong Wang 88115fe7a0 IB/mlx5: report tx/rx checksum cap in query results
This patch will report the tx/rx checksum cap for raw qp via the
query device results.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:32 -05:00
Leon Romanovsky ee37095044 IB/mlx4: Convert kmalloc to kmalloc_array for checkpatch
Convert kmalloc to be kmalloc_array to fix warnings below:

WARNING: Prefer kmalloc_array over kmalloc with multiply
+               qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64),

WARNING: Prefer kmalloc_array over kmalloc with multiply
+               qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof(u64),

WARNING: Prefer kmalloc_array over kmalloc with multiply
+               srq->wrid = kmalloc(srq->msrq.max * sizeof(u64),

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:32 -05:00
Leon Romanovsky 9afc60dcc4 IB/mlx4: Suppress non-fatal memory allocations
Failure in kmalloc memory allocations will throw a warning about it.
Such warnings are not needed anymore, since in commit 0ef2f05c7e
("IB/mlx4: Use vmalloc for WR buffers when needed"), fallback mechanism
from kmalloc() to __vmalloc() was added.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:31 -05:00
Eran Ben Elisha da7525d2a9 IB/mlx5: Advertise atomic capabilities in query device
In order to ensure IB spec atomic correctness in atomic operations, if
HW is configured to host endianness, advertise IB_ATOMIC_HCA.  if not,
advertise IB_ATOMIC_NONE.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:31 -05:00
Hariprasad S 67f1aee6f4 iw_cxgb3: Fix incorrectly returning error on success
The cxgb3_*_send() functions return NET_XMIT_ values, which are
positive integers values. So don't treat positive return values
as an error.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:30 -05:00
Hariprasad S c5dfb000b9 iw_cxgb4: Pass qid range to user space driver
Enhances the t4_dev_status_page to pass the qid start and size
attributes from iw_cxgb4 to libcxgb4.
Bump the ABI Version to 3 -> To allow libcxgb4 to detect old drivers and
revert to the old way of computing the qid ranges.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:30 -05:00
Dean Luick 0d6ed314de IB/mad: Ensure fairness in ib_mad_completion_handler
It was found that when a process was rapidly sending MADs other processes could
be hung in their unregister calls.

This would happen when process A was injecting packets fast enough that the
single threaded workqueue was never exiting ib_mad_completion_handler.
Therefore when process B called flush_workqueue via the unregister call it
would hang until process A stopped sending MADs.

The fix is to periodically reschedule ib_mad_completion_handler after
processing a large number of completions.  The number of completions chosen was
decided based on the defaults for the recv queue size.  However, it was kept
fixed such that increasing those queue sizes would not adversely affect
fairness in the future.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:30 -05:00
Leon Romanovsky 051f263098 IB/mlx5: Add driver cross-channel support
Add support of cross-channel functionality to mlx5
driver. This includes ability to ignore overrun for CQ
which intended for cross-channel, export device capability and
configure the QP to be sync master/slave queues.

The cross-channel enabled QP supports combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQE are supported on the send queue

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:33:14 -05:00
Leon Romanovsky 8a06ce59a4 IB/core: Add cross-channel support
The cross-channel feature allows to execute WQEs that involve
synchronization of I/O operations’ on different QPs.

This capability enables to program complex flows with a single
function call, hereby significantly reducing overhead associated
with I/O processing.

Cross-channel operations support is indicated by HCA capability
information.

The queue pairs can be configured to work as a “sync master queue”
or “sync slave queues”.

The added flags are:

1. Device capability flag IB_DEVICE_CROSS_CHANNEL for the
   devices that can perform cross-channel operations.

2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable CQ overrun
   check. This check is useless in cross-channel scenario.

3. QP property flags to indicate if queues are slave or master:
   * IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests
     will not be executed immediately and requires enabling.
   * IB_QP_CREATE_MANAGED_RECV indicates that posted receive work
     requests will not be executed immediately and requires enabling.
   * IB_QP_CREATE_CROSS_CHANNEL declares the QP to work in cross-channel
     mode. If IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV are
     not provided, this QP will be sync master queue, else it will be sync
     slave.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:33:14 -05:00
Matan Barak d69e3bcf79 IB/mlx5: Mmap the HCA's core clock register to user-space
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:25:59 -05:00
Matan Barak b368d7cb8c IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Pass hca_core_clock_offset to user-space is mandatory in order to
let the user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:25:59 -05:00
Matan Barak 7c60bcbb68 IB/mlx5: Add support for hca_core_clock and timestamp_mask
Reporting the hca_core_clock (in kHZ) and the timestamp_mask in
query_device extended verb. timestamp_mask is used by users in order
to know what is the valid range of the raw timestamps, while
hca_core_clock reports the clock frequency that is used for
timestamps.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:25:59 -05:00
Matan Barak 972ecb8213 IB/mlx5: Add create_cq extended command
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:23:02 -05:00
Christoph Lameter 145d9c5410 IB/core: Display extended counter set if available
Check if the extended counters are available and if so
create the proper extended and additional counters.

Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 15:58:30 -05:00
Christoph Lameter b2788ce575 IB/core: Specify attribute_id in port_table_attribute
Add the attr_id on port_table_attribute since we will have to add
a different port_table_attribute for the extended attribute soon.

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 15:58:30 -05:00
Christoph Lameter 35c4cbb178 IB/core: Create get_perf_mad function in sysfs.c
Create a new function to retrieve performance management
data from the existing code in get_pma_counter().

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 15:58:30 -05:00
Christoph Hellwig ab67ed8de0 IB: remove the write-only usecnt field from struct ib_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:06 -05:00
Christoph Hellwig f64054d850 nes: simplify nes_reg_phys_mr calling conventions
Just pass and address/size pair instead of an ib_phys_buf array.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:05 -05:00
Christoph Hellwig 35cb3fc026 cxgb3: simplify iwch_get_dma_wr
Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_wr now that no other callers
are left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:04 -05:00
Christoph Hellwig feb7c1e38b IB: remove in-kernel support for memory windows
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:04 -05:00
Christoph Hellwig b7d3e0a94f IB: remove support for phys MRs
We have stopped using phys MRs in the kernel a while ago, so let's
remove all the cruft used to implement them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:04 -05:00
Christoph Hellwig a4d825a01e IB: remove ib_query_mr
This functionality has no users and was only supported by the staged out
EHCA driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:03 -05:00
Christoph Lameter 432c55fff4 IB/IPoIB: Move multicast specific code out of ipoib_main.c
Code cleanup to move multicast specific code that checks for
a sendonly join to ipoib_multicast.c. This allows the removal
of the export of __ipoib_mcast_find().

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:28:52 -05:00
Christoph Lameter 5a0e81f6f4 IB/IPoIB: factor out common multicast list removal code
Code cleanup to remove multicast specific code from ipoib_main.c

The removal of a list of multicast groups occurs in three places.
Create a new function ipoib_mcast_remove_list(). Use this new
function in ipoib_main.c too.
That in turn allows the dropping of two functions that were
exported from ipoib_multicast.c for expiration of mc groups.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:28:00 -05:00
Achiad Shochat e53505a802 IB/mlx5: Support RoCE
Advertise RoCE support for IB/core layer and set the hardware to
work in RoCE mode.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat 2811ba51b0 IB/mlx5: Add RoCE fields to Address Vector
Set the address handle and QP address path fields according to the
link layer type (IB/Eth).

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat 3cca26069a IB/mlx5: Support IB device's callbacks for adding/deleting GIDs
These callbacks write into the mlx5 RoCE address table.
Upon del_gid we write a zero'd GID.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat cb34be6da2 IB/mlx5: Set network_hdr_type upon RoCE responder completion
When handling a responder completion, if the link layer is Ethernet,
set the work completion network_hdr_type field according to CQE's
info and the IB_WC_WITH_NETWORK_HDR_TYPE flag.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat 3f89a643eb IB/mlx5: Extend query_device/port to support RoCE
Using the vport access functions to retrieve the Ethernet
specific information and return this information in
ib_query_device and ib_query_port.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat fc24fc5e95 IB/mlx5: Support IB device's callback for getting its netdev
For Eth ports only:
Maintain a net device pointer in mlx5_ib_device and update it
upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the
net-device and IB device have the same PCI parent device.
Implement the get_netdev callback to return this net device.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:36 -05:00
Achiad Shochat ebd61f68e1 IB/mlx5: Support IB device's callback for getting the link layer
Make the existing mlx5_ib_port_link_layer() signature match
the ib device callback signature (add port_num parameter).
Refactor it to use a sub function so that the link layer could
be queried also before the ibdev is created.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:36 -05:00
Julia Lawall 59f65f495d IB/usnic: delete unneeded IS_ERR test
kzalloc doesn't return ERR_PTR, so there is no need to test for it.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,e;
@@

* x = kzalloc(...)
... when != x = e
* IS_ERR_OR_NULL(x)
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:58 -05:00
Nelson Escobar 96390f62aa IB/usnic: Handle 0 counts in resource allocation
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:58 -05:00
Nelson Escobar dc92d14684 IB/usnic: Fix resource leak in error case
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:58 -05:00
Nelson Escobar 89e5323c64 IB/usnic: Support more QP state transitions
They were already implemented at a lower layer, but the upper level
routine placed arbitrary restrictions on which transitions were
permitted.  Simplify the state machine logic to live wholly in
usnic_ib_qp_grp_modify.

Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:58 -05:00
Nelson Escobar 2547a3663c IB/usnic: Fix message typo
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:58 -05:00
Nelson Escobar 3ea7286139 IB/usnic: Fix incorrect cast in usnic_ib_fw_string_to_u64
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:58 -05:00
Nelson Escobar 1e67a64e9c IB/usnic: Improve a failure message
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:57 -05:00
Nelson Escobar 638e970747 IB/usnic: Remove unused prototype
query_protocol() was added in commit 6b90a6d66b ("IB/Verbs:
Implement new callback query_protocol()") and then removed in
commit f9b22e355d ("IB/core: Convert core to use bitfield
for caps").

This left behind an unused prototype.

Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:50:57 -05:00
Moni Shoua bee3c3c918 IB/cma: Join and leave multicast groups with IGMP
Since RoCEv2 is a protocol over IP header it is required to send IGMP
join and leave requests to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:39:53 -05:00
Moni Shoua 25f40220e5 IB/core: Initialize UD header structure with IP and UDP headers
ib_ud_header_init() is used to format InfiniBand headers
in a buffer up to (but not with) BTH. For RoCE UDP ENCAP it is
required that this function would be able to build also IP and UDP
headers.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:39:53 -05:00
Matan Barak 045959db65 IB/cma: Add configfs for rdma_cm
Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.

In order to use the configfs, one needs to mount it and
mkdir <IB device name> inside rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can either be "IB/RoCE v1" or
"RoCE v2".

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:39:52 -05:00
Matan Barak 218a773f76 IB/rdma_cm: Add wrapper for cma reference count
Currently, cma users can't increase or decrease the cma reference
count. This is necassary when setting cma attributes (like the
default GID type) in order to avoid use-after-free errors.
Adding cma_ref_dev and cma_deref_dev APIs.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:39:52 -05:00
Matan Barak 200298326b IB/core: Validate route when we init ah
In order to make sure API users don't try to use SGIDs which don't
conform to the routing table, validate the route before searching
the RoCE GID table.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:35:12 -05:00
Matan Barak 6020d7e500 IB/core: Move rdma_is_upper_dev_rcu to header file
In order to validate the route, we need an easy way to check if a
net-device belongs to our RDMA device. Move this helper function
to a header file in order to make this check easier.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:35:12 -05:00
Somnath Kotur c865f24628 IB/core: Add rdma_network_type to wc
Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to deep examine the packet and extract
the GID type by itself.

We choose sgid_index and type from all the matching entries in
RDMA-CM based on hint from the IP stack and we set hop_limit for
the IP packet based on above hint from IP stack.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Somnath Kotur <Somnath.Kotur@Avagotech.Com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:35:11 -05:00
Matan Barak 7766a99fdc IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
Adding RoCE v2 GID type and port type. Vendors
which support this type will get their GID table
populated with RoCE v2 GIDs automatically.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 10:35:11 -05:00