linux

Commit Graph

Author	SHA1	Message	Date
Kaike Wan	2deeb47729	IB/sa: Fix netlink local service GFP crash The rdma netlink local service registers a handler to handle RESOLVE response and another handler to handle SET_TIMEOUT request. The first thing these handlers do is to call netlink_capable() to check the access right of the received skb to make sure that the sender has root access. Under normal conditions, such responses and requests will be directly forwarded to the handlers without going through the netlink_dump pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c). However, a user application could send a RESOLVE request (not response) to the local service, which will fall into the netlink_dump pathway, where a new skb will be created without initializing the control block. This new skb will be eventually forwarded to the local service RESOLVE response handler. Unfortunately, netlink_capable() will cause general protection fault if the skb's control block is not initialized. This patch will address the problem by checking the skb first. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 11:59:19 -05:00
Linus Torvalds	71e4634e00	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The highlights this round include: - Introduce configfs support for unlocked configfs_depend_item() (krzysztof + andrezej) - Conversion of usb-gadget target driver to new function registration interface (andrzej + sebastian) - Enable qla2xxx FC target mode support for Extended Logins (himansu + giridhar) - Enable qla2xxx FC target mode support for Exchange Offload (himansu + giridhar) - Add qla2xxx FC target mode irq affinity notification + selective command queuing. (quinn + himanshu) - Fix iscsi-target deadlock in se_node_acl configfs deletion (sagi + nab) - Convert se_node_acl configfs deletion + se_node_acl->queue_depth to proper se_session->sess_kref + target_get_session() usage. (hch + sagi + nab) - Fix long-standing race between se_node_acl->acl_kref get and get_initiator_node_acl() lookup. (hch + nab) - Fix target/user block-size handling, and make sure netlink reaches all network namespaces (sheng + andy) Note there is an outstanding bug-fix series for remote I_T nexus port TMR LUN_RESET has been posted and still being tested, and will likely become post -rc1 material at this point" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (56 commits) scsi: qla2xxxx: avoid type mismatch in comparison target/user: Make sure netlink would reach all network namespaces target: Obtain se_node_acl->acl_kref during get_initiator_node_acl target: Convert ACL change queue_depth se_session reference usage iscsi-target: Fix potential dead-lock during node acl delete ib_srpt: Convert acl lookup to modern get_initiator_node_acl usage tcm_fc: Convert acl lookup to modern get_initiator_node_acl usage tcm_fc: Wait for command completion before freeing a session target: Fix a memory leak in target_dev_lba_map_store() target: Support aborting tasks with a 64-bit tag usb/gadget: Remove set-but-not-used variables target: Remove an unused variable target: Fix indentation in target_core_configfs.c target/user: Allow user to set block size before enabling device iser-target: Fix non negative ERR_PTR isert_device_get usage target/fcoe: Add tag support to tcm_fc qla2xxx: Check for online flag instead of active reset when transmitting responses qla2xxx: Set all queues to 4k qla2xxx: Disable ZIO at start time. qla2xxx: Move atioq to a different lock to reduce lock contention ...	2016-01-20 17:20:53 -08:00
Sagi Grimberg	f9a6ed62c4	IB/srpt: Remove redundant wc array No usage after the conversion to the new CQ API. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 16:40:31 -05:00
Mike Marciniszyn	967bcfc0f5	IB/qib: Improve ipoib UD performance Based on profiling, UD performance drops in case of processes in a single client due to excess context switches when the progress workqueue is scheduled. This is solved by modifying the heuristic to select the direct progress instead of the scheduling progress via the workqueue when UD-like situations are detected in the heuristic. Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:41:16 -05:00
Matan Barak	4ed088e6c2	IB/mlx4: Advertise RoCE v2 support Advertise RoCE v2 support in port_immutable attributes according to the hardware's capabilities. This enables the verbs stack to use RoCE v2 mode. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:01 -05:00
Moni Shoua	e1b866c677	IB/mlx4: Create and use another QP1 for RoCEv2 The mlx4 driver uses a special QP to implement the GSI QP. This kind of QP allows to build the InfiniBand headers in software. When mlx4 hardware builds the packet, it calculates the ICRC and puts it at the end of the payload. However, this ICRC calculation depends on the QP configuration, which is determined when the QP is modified (roce_mode during INIT->RTR). When receiving a packet, the ICRC verification doesn't depend on this configuration. Therefore, using two GSI QPs for send (one for each RoCE version) and one GSI QP for receive are required. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:01 -05:00
Moni Shoua	3ef967a4af	IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers RoCEv2 packets are sent over IP/UDP protocols. The mlx4 driver uses a type of RAW QP to send packets for QP1 and therefore needs to build the network headers below BTH in software. This patch adds option to build QP1 packets with IP and UDP headers if RoCEv2 is requested. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:01 -05:00
Moni Shoua	71a39bbbfc	IB/mlx4: Enable RoCE v2 when the IB device is added If the hardware supports RoCE v2, we configure the hardware UDP port according to the RoCE v2 Annex when mlx4_ib device is added. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:01 -05:00
Moni Shoua	3b5daf28ac	IB/mlx4: Support modify_qp for RoCE v2 In order to support modify_qp for RoCE v2, we need to set the gid_type in the QP context. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:01 -05:00
Moni Shoua	7e57b85c44	IB/mlx4: Add support for setting RoCEv2 gids in hardware To tell hardware about a gid with type RoCEv2, software needs a new modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:00 -05:00
Moni Shoua	b699a859d1	IB/mlx4: Add gid_type to GID properties IB core driver adds a property of type to struct ib_gid_attr. The mlx4 driver should take that in consideration when modifying or querying the hardware gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:00 -05:00
Matan Barak	c3efe7500a	IB/core: Use hop-limit from IP stack for RoCE Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as hop limit values. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:56 -05:00
Matan Barak	f7f4b23e27	IB/core: Rename rdma_addr_find_dmac_by_grh rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index and downsteram patch will also add hop_limit as an output parameter, thus we rename it to rdma_addr_find_l2_eth_by_grh. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:55 -05:00
Bart Van Assche	4bfdf635c6	IB/cm: Fix a recently introduced deadlock ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock that can be locked from inside an interrupt handler. Hence do not enable interrupts inside cm_enter_timewait() if called with interrupts disabled. This patch fixes e.g. the following deadlock: Acked-by: Erez Shitrit <erezsh@mellanox.com> ================================= [ INFO: inconsistent lock state ] 4.4.0-rc7+ #1 Tainted: G E --------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x 74/0x1b0 [ib_cm] {HARDIRQ-ON-W} state was registered at: [<ffffffff810a3c11>] mark_held_locks+0x71/0x90 [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0 [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10 [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40 [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm] [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm] [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp] [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm] [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm] [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm] [<ffffffff8107184d>] process_one_work+0x1bd/0x460 [<ffffffff81073148>] worker_thread+0x118/0x420 [<ffffffff81078454>] kthread+0xe4/0x100 [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70 irq event stamp: 1672286 hardirqs last enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80 hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89 softirqs last enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50 softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&cm_id_priv->lock)->rlock); <Interrupt> lock(&(&cm_id_priv->lock)->rlock); * DEADLOCK * no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001 Call Trace: <IRQ> [<ffffffff81251c1b>] dump_stack+0x4f/0x74 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290 [<ffffffff810a3995>] mark_lock+0x115/0x1b0 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560 [<ffffffff810a53c2>] lock_acquire+0x62/0x80 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm] [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm] [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt] [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib] [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core] [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core] [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core] [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120 [<ffffffff81007f3d>] handle_irq+0x5d/0x130 [<ffffffff810071fd>] do_IRQ+0x6d/0x130 [<ffffffff8151d309>] common_interrupt+0x89/0x89 <EOI> [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20 [<ffffffff810990d6>] call_cpuidle+0x36/0x60 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10 [<ffffffff8103c443>] start_secondary+0x83/0x90 Fixes: commit `be4b499323` ("IB/cm: Do not queue work to a device that's going away") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:55 -05:00
Bart Van Assche	19f57298f0	IB/srpt: Fix the RDMA completion handlers Avoid that the following kernel crash is triggered when processing an RDMA completion: BUG: unable to handle kernel paging request at 0000000100000198 IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560 Call Trace: [<ffffffff810a53c2>] lock_acquire+0x62/0x80 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60 [<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt] [<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core] [<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core] [<ffffffff8107184d>] process_one_work+0x1bd/0x460 [<ffffffff81073148>] worker_thread+0x118/0x420 [<ffffffff81078454>] kthread+0xe4/0x100 [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70 Fixes: commit `59fae4deaa` ("IB/srpt: chain RDMA READ/WRITE requests"). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:55 -05:00
Matan Barak	9506902b7b	IB/core: Fix dereference before check Sparse complains about dereference before check. Fixing this by moving the check before the dereference. Fixes: `200298326b` ('IB/core: Validate route when we init ah') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:54 -05:00
Matan Barak	2e2cdace5a	IB/core: Eliminate sparse false context imbalance warning When write_gid function needs to do a sleep-able operation, it unlocks table->rwlock and then relocks it. Sparse complains about context imbalance. This is safe as write_gid is always called with table->rwlock. write_gid protects from simultaneous writes to this GID entry by setting the GID_TABLE_ENTRY_INVALID flag. Fixes: `9c584f0495` ('IB/core: Change per-entry lock in RoCE GID table to one lock') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:21 -05:00
Hal Rosenstock	6e2a51a0f7	IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling Port number is not part of ClassPortInfo attribute but is still needed as a parameter when invoking process_mad. To properly handle this attribute, port_num is added as a parameter to get_counter_table and get_perf_mad was changed not to store port_num in the attribute itself when it's querying the ClassPortInfo attribute. This handles issue pointed out by Matan Barak <matanb@dev.mellanox.co.il> Fixes: `145d9c5410` ('IB/core: Display extended counter set if available') Signed-off-by: Hal Rosenstock <hal@mellanox.com> Acked-by: Matan Barak <matanb@mellanox.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:26:20 -05:00
Bart Van Assche	b6aeb980f1	IB/core: Remove set-but-not-used variable from ib_sg_to_pages() Detected this by building the IB core with W=1. See also patch "IB core: Fix ib_sg_to_pages()" (commit `8f5ba10ed4`). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:25:45 -05:00
Leon Romanovsky	c876a1b7dd	IB/mlx5: Fix passing casted pointer in mlx5_query_port_roce Fix static checker warning: drivers/infiniband/hw/mlx5/main.c:149 mlx5_query_port_roce() warn: passing casted pointer '&props->qkey_viol_cntr' to 'mlx5_query_nic_vport_qkey_viol_cntr()' 32 vs 16. Fixes: `3f89a643eb` ("IB/mlx5: Extend query_device/port to support RoCE") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:25:45 -05:00
Christoph Hellwig	d53e11fdf0	IB/mad: use CQ abstraction Remove the local workqueue to process mad completions and use the CQ API instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:25:45 -05:00
Christoph Hellwig	ca281265c0	IB/mad: pass ib_mad_send_buf explicitly to the recv_handler Stop abusing wr_id and just pass the parameter explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:25:36 -05:00
Lucas Tanure	39f426553e	infiniband: Replace memset with eth_zero_addr Use eth_zero_addr to assign the zero address to the given address array instead of memset when second argument is address of zero. Signed-off-by: Lucas Tanure <tanure@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:24:54 -05:00
Leon Romanovsky	50ca6ed21e	IB/mlx5: Delete locally redefined variable Fix the following sparse warning: drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows an earlier one drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here Fixes: `d69e3bcf79` ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:24:54 -05:00
Moni Shoua	1049f13816	IB/mlx4: Take source mac from AH instead from the port In commit `dbf727de74` ("IB/core: Use GID table in AH creation and dmac resolution") we copy source mac to mlx4_ah from the attributes of gid at ib_ah_attr.grh.sgid_index. Now we can use it. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:24:53 -05:00
Matan Barak	4e40816734	IB/mlx4: Initialize hop_limit when creating address handle Hop limit value wasn't copied from attributes when ah was created. This may influence packets for unconnected services to get dropped in routers when endpoints are not in the same subnet. Fixes: `fa417f7b52` ("IB/mlx4: Add support for IBoE") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:24:53 -05:00
Leon Romanovsky	9f17768611	IB/mlx5: Expose correct maximum number of CQE capacity Maximum number of EQE capacity per CQ was mistakenly exposed as CQE. Fix that. Fixes: `938fe83c8d` ("net/mlx5_core: New device capabilities handling") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Cc: <stable@vger.kernel.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:24:53 -05:00
Hariprasad S	28de1f7437	iw_cxgb4: Take clip reference before starting IPv6 listen The h/w is designed in such a way that, if you do anything IPv6 related, a valid clip entry must be there. So take clip reference before creating IPv6 listening servers, and then if we fail to create server, release the clip entry. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:23:40 -05:00
Hariprasad S	4275a5b200	iw_cxgb4: Fixes GW-Basic labels to meaningful error names Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:17:40 -05:00
Hariprasad S	82b1df1b08	iw_cxgb4: Fixes static checker warning in c4iw_rdev_open() Commit `c5dfb000b9` ("iw_cxgb4: Pass qid range to user space driver") from Dec 11, 2015, leads to the following static checker warning: drivers/infiniband/hw/cxgb4/device.c:857 c4iw_rdev_open() warn: variable dereferenced before check 'rdev->status_page' Also we weren't deallocating ocqp pool in error path when failed to allocate status page. Fixing it too. Fixes: `c5dfb000b9` ("iw_cxgb4: Pass qid range to user space driver") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:17:40 -05:00
Dan Carpenter	a7d0e959fa	IB/cma: allocating too much memory in make_cma_ports() The issue here is that there is a cut and paste bug. When we allocate cma_dev_group->default_ports_group we use "sizeof(cma_dev_group->ports)" instead of "sizeof(cma_dev_group->default_ports_group)". We're bumping up against the 80 character limit so I introduced a new local pointer "ports_group" to get around that. Fixes: `045959db65` ('IB/cma: Add configfs for rdma_cm') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:17:40 -05:00
Dan Carpenter	bc1251e6d9	RDMA/nes: checking for NULL instead of IS_ERR nes_reg_phys_mr() returns ERR_PTRs on error. It doesn't return NULL. This bug has been there for a while, but we recently changed from calling a function pointer to calling nes_reg_phys_mr() directly so now Smatch is able to detect the bug. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:17:40 -05:00
Vinit Agnihotri	fbbeb8632b	IB/qib: Support creating qps with GFP_NOIO flag The current code is problematic when the QP creation and ipoib is used to support NFS and NFS desires to do IO for paging purposes. In that case, the GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory situations. This fix adds support to create queue pair with GFP_NOIO flag for connected mode only to cleanly fail the create queue pair in those situations. Cc: <stable@vger.kernel.org> # 3.16+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:17:40 -05:00
Ira Weiny	65487fdc0c	IB/sysfs: Fix sparse warning on attr_id Attributed ID was declared as an int while the value should really be big endian 16. Fixes: `35c4cbb178` ("IB/core: Create get_perf_mad function in sysfs.c") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 14:12:56 -05:00
Devesh Sharma	3b1ea43009	RDMA/ocrdma: Depend on async link events from CNA Recently Dough Ledford reported a deadlock happening between ocrdma-load sequence and NetworkManager service issuing "open" on be2net interface. The deadlock happens when any be2net hook (e.g. open/close) is called in parallel to insmod ocrdma.ko. A. be2net is sending administrative open/close event to ocrdma holding device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net. So sequence of locks is rtnl_lock---> device_list lock B. When new ocrdma roce device gets registered, infiniband stack now takes rtnl_lock in ib_register_device() in GID initialization routines. So sequence of locks in this path is device_list lock ---> rtnl_lock. This improper locking sequence causes deadlock. With this patch we stop using administrative open and close events injected by be2net driver. These events were used to dispatch PORT_ACTIVE and PORT_ERROR events to the IB-stack. This patch implements a logic to receive async-link-events generated from CNA whenever link-state-change is detected. Now on, these async-events will be used to dispatch PORT_ACTIVE and PORT_ERROR events to IB-stack. Depending on async-events from CNA removes the need to hold device-list-mutex and thus breaks the busy-wait scenario. Reported-by: Doug Ledford <dledford@redhat.com> CC: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 14:00:47 -05:00
Devesh Sharma	d310a344e1	RDMA/ocrdma: Dispatch only port event when port state changes Dispatch only port event to IB stack when port state changes. Don't explicitly modify qps to error. Let application listen to port events on async event queue or let QP fail with retry-exceeded completion error. Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 14:00:47 -05:00
Devesh Sharma	a2addf94a8	RDMA/ocrdma: Fix vlan-id assignment in qp parameters vlan-id is wrongly getting as 0 when PFC is enabled. Set vlan-id configured by user in QP parameters. In case vlan interface is not used, flash a warning to user to configure vlan and assign vlan-id as 0 in qp params. Fixes: `dbf727de74` ('IB/core: Use GID table in AH creation and dmac resolution') Cc: Matan Barak <matanb@mellanox.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 14:00:47 -05:00
Matan Barak	649367735e	IB/cma: Fix RDMA port validation for iWarp cma_validate_port wrongly assumed that Ethernet devices are RoCE devices and thus their ndev should be matched in the GID table. This broke the iWarp support. Fixing that matching the ndev only if we work on a RoCE port. Cc: <stable@vger.kernel.org> # 4.4.x- Fixes: `abae1b71dd` ('IB/cma: cma_validate_port should verify the port and netdevice') Reported-by: Hariprasad Shenai <hariprasad@chelsio.com> Tested-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 13:33:47 -05:00
Mike Marciniszyn	09dc9cd652	IB/qib: fix mcast detach when qp not attached The code produces the following trace: [1750924.419007] general protection fault: 0000 [#3] SMP [1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4 dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core [1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D 3.13.0-39-generic #66-Ubuntu [1750924.420364] Hardware name: Dell Computer Corporation PowerEdge 860/0XM089, BIOS A04 07/24/2007 [1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti: ffff88007af1c000 [1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib] [1750924.420364] RSP: 0018:ffff88007af1dd70 EFLAGS: 00010246 [1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX: 000000000000000f [1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI: 6764697200000000 [1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09: 0000000000000000 [1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12: ffff88007baa1d98 [1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15: 0000000000000000 [1750924.420364] FS: 00007ffff7fd8740(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 [1750924.420364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4: 00000000000007e0 [1750924.420364] Stack: [1750924.420364] ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429 000000007af1de20 [1750924.420364] ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70 ffffffffa00cb313 [1750924.420364] 00007fffffffde88 0000000000000000 0000000000000008 ffff88003ecab000 [1750924.420364] Call Trace: [1750924.420364] [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350 [ib_qib] [1750924.568035] [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0 [ib_uverbs] [1750924.568035] [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core] [1750924.568035] [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170 [ib_uverbs] [1750924.568035] [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs] [1750924.568035] [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20 [1750924.568035] [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0 [1750924.568035] [<ffffffff811bd214>] vfs_write+0xb4/0x1f0 [1750924.568035] [<ffffffff811bdc49>] SyS_write+0x49/0xa0 [1750924.568035] [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f [1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10 <f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f [1750924.568035] RIP [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib] [1750924.568035] RSP <ffff88007af1dd70> [1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ] The fix is to note the qib_mcast_qp that was found. If none is found, then return EINVAL indicating the error. Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 13:09:44 -05:00
Erez Shitrit	50be28de6f	IB/IPoIB: Fix kernel panic on multicast flow ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the parameter mcast->dev. That mcast is a temporary (used as an iterator) variable that may be uninitialized. There is no need to send the variable dev to the function, as each mcast has its dev as a member in the mcast struct. This causes the next panic: RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib] RSP: 0018: EFLAGS: 00010246 RAX: f0201 RBX: 24e00 RCX: 00000 .... .... Stack: Call Trace: ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib] ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib] process_one_work+0x164/0x470 worker_thread+0x11d/0x420 ... Fixes: `5a0e81f6f4` ('IB/IPoIB: factor out common multicast list removal code') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reported-by: Doron Tsur <doront@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 12:59:54 -05:00
Doron Tsur	0b6e26ce89	net/mlx5_core: Fix trimming down IRQ number With several ConnectX-4 cards installed on a server, one may receive irqn > 255 from the kernel API, which we mistakenly trim to 8bit. This causes EQ creation failure with the following stack trace: [<ffffffff812a11f4>] dump_stack+0x48/0x64 [<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0 [<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180 [<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core] [<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core] [<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core] [<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core] [<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core] [<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0 Fixing it by changing of the irqn type from u8 to unsigned int to support values > 255 Fixes: `61d0e73e0a` ('net/mlx5_core: Use the the real irqn in eq->irqn') Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-17 12:08:04 -05:00
Linus Torvalds	7d1fc01afc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: floppy: make local variable non-static exynos: fixes an incorrect header guard dt-bindings: fixes some incorrect header guards cpufreq-dt: correct dead link in documentation cpufreq: ARM big LITTLE: correct dead link in documentation treewide: Fix typos in printk Documentation: filesystem: Fix typo in fs/eventfd.c fs/super.c: use && instead of & for warn_on condition Documentation: fix sysfs-ptp lib: scatterlist: fix Kconfig description	2016-01-14 17:04:19 -08:00
Nicholas Bellinger	f246c94154	ib_srpt: Convert acl lookup to modern get_initiator_node_acl usage This patch does a simple conversion of ib_srpt code to use proper modern core_tpg_get_initiator_node_acl() lookup using se_node_acl->acl_kref, and drops the legacy internal list usage from srpt_lookup_acl(). This involves doing transport_init_session() earlier, and making sure transport_free_session() is called during a se_node_acl lookup failure to drop the last ->acl_kref. Also, it adds a minor backwards-compat hack to avoid the potential for user-space wrt node-acl WWPN formatting by simply stripping off '0x' prefix from ch->sess_name, and retrying once if core_tpg_get_initiator_node_acl() fails. Finally, go ahead and drop port_acl_list port_acl_lock since they are no longer used. Cc: Vu Pham <vu@mellanox.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2016-01-12 23:44:32 -08:00
Maor Gottlieb	038d2ef875	IB/mlx5: Add flow steering support Adding flow steering support by creating a flow-table per priority (if rules exist in the priority). mlx5_ib uses autogrouping and thus only creates the required destinations. Also includes adding of these flow steering utilities 1. Parsing verbs flow attributes hardware steering specs. 2. Check if flow is multicast - this is required in order to decide to which flow table will we add the steering rule. 3. Set outer headers in flow match criteria to zeros. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:53 -05:00
Nicholas Bellinger	373a4cd737	iser-target: Fix non negative ERR_PTR isert_device_get usage As reported by Dan, isert_create_device_ib_res() failure within isert_device_get() can potentially return a postive value, resulting in ERR_PTR() triggering a NULL pointer dereference. Caught by the static checker: drivers/infiniband/ulp/isert/ib_isert.c:423 isert_device_get() error: passing non negative 1 to ERR_PTR Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2016-01-07 13:57:51 -08:00
David S. Miller	c07f30ad68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-12-31 18:20:10 -05:00
Devesh Sharma	10a214dc99	RDMA/ocrdma: Depend on async link events from CNA Recently Dough Ledford reported a deadlock happening between ocrdma-load sequence and NetworkManager service issuing "open" on be2net interface. The deadlock happens when any be2net hook (e.g. open/close) is called in parallel to insmod ocrdma.ko. A. be2net is sending administrative open/close event to ocrdma holding device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net. So sequence of locks is rtnl_lock---> device_list lock B. When new ocrdma roce device gets registered, infiniband stack now takes rtnl_lock in ib_register_device() in GID initialization routines. So sequence of locks in this path is device_list lock ---> rtnl_lock. This improper locking sequence causes deadlock. With this patch we stop using administrative open and close events injected by be2net driver. These events were used to dispatch PORT_ACTIVE and PORT_ERROR events to the IB-stack. This patch implements a logic to receive async-link-events generated from CNA whenever link-state-change is detected. Now on, these async-events will be used to dispatch PORT_ACTIVE and PORT_ERROR events to IB-stack. Depending on async-events from CNA removes the need to hold device-list-mutex and thus breaks the busy-wait scenario. Reported-by: Doug Ledford <dledford@redhat.com> CC: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-28 11:45:54 -05:00
Devesh Sharma	36ac0db0db	RDMA/ocrdma: Dispatch only port event when port state changes Dispatch only port event to IB stack when port state changes. Don't explicitly modify qps to error. Let application listen to port events on async event queue or let QP fail with retry-exceeded completion error. Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-28 11:45:54 -05:00
Devesh Sharma	c6002d5602	RDMA/ocrdma: Fix vlan-id assignment in qp parameters vlan-id is wrongly getting as 0 when PFC is enabled. Set vlan-id configured by user in QP parameters. In case vlan interface is not used, flash a warning to user to configure vlan and assign vlan-id as 0 in qp params. Fixes: `dbf727de74` ('IB/core: Use GID table in AH creation and dmac resolution') Cc: Matan Barak <matanb@mellanox.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-28 11:45:54 -05:00
Jenny Derzhavetz	59caaed7a7	IB/iser: Support the remote invalidation exception Declare that we support remote invalidation in case we are: 1. using fastreg method 2. always registering memory Detect the invalidated rkey from the work completion info so we won't invalidate it locally. The spec mandates that we must not rely on the target remote invalidate our rkey so we must check it upon a receive (scsi response) completion. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-26 19:27:10 -05:00
Sagi Grimberg	e26d2d21ff	IB/iser: Change the increment rkey flow logic When we enable remote invalidate support we won't want to perform local invalidates at the same time we do today, but we still need to get new rkeys. So, decouple the rkey update from the local invalidate and tie it to memory reg instead. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:36 -05:00
Jenny Derzhavetz	422bd0acb0	IB/isert: Support the remote invalidation exception We'll use remote invalidate, according to negotiation result during connection establishment. If the initiator declared that it supports the remote invalidate exception and the local HCA supports IB_DEVICE_MEM_MGT_EXTENSIONS then the target will use IB_WR_SEND_WITH_INV with the correct rkey for the response. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:36 -05:00
Jenny Derzhavetz	13bce4821f	IB/isert: Declare correct flags when accepting a connection iser target does not support zero based virtual addresses and send with invalidate, so it should declare that it doesn't. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:35 -05:00
Sagi Grimberg	c64941533f	IB/isert: Remove unused file iser_proto.h We don't need iser_proto.h anymore, remove it and move (non-protocol) declarations to ib_isert.h Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:35 -05:00
Sagi Grimberg	d3cf81f9c8	IB/iser,isert: Create and use new shared header The iser RDMA_CM negotiation protocol is shared by the initiator and the target, so have a shared header for the defines and structure. Move relevant items from the initiator and target headers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:35 -05:00
Jenny Derzhavetz	1caa70d8a7	IB/iser: set intuitive values for mr_valid This parameter is described as "is mr valid indicator". In other words, it indicates whether memory registration is valid or not. So intuitive values would be: mr_valid=True, when memory registration is valid and mr_valid=False otherwise. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:34 -05:00
Jenny Derzhavetz	b5f04b00f7	IB/iser: Don't register memory for all immediate data writes When all the task data is sent as immediate data, we are allowed to use the local_dma_lkey as it is not sent to the wire. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:34 -05:00
Sagi Grimberg	bfe066e256	IB/iser: Reuse ib_sg_to_pages We have in iser iser_sg_to_page_vec which has exactly the same role as ib_sg_to_pages. Customize the page_vec to hold a fake MR so we can reuse ib_sg_to_pages. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:34 -05:00
Roi Dayan	08ff089b12	IB/iser: Fix module init not cleaning up on error flow Destroy workqueue on transport register error, also release kmem cache on workqueue allocation error. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:33 -05:00
Julia Lawall	46e741f410	IB/core: constify mmu_notifier_ops structures This mmu_notifier_ops structure is never modified, so declare it as const, like the other mmu_notifier_ops structures. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:33 -05:00
Julia Lawall	2392a4cdcb	IB/iser: constify iser_reg_ops structure The iser_reg_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:33 -05:00
Julia Lawall	c419874c44	RDMA/nes: constify nes_cm_ops structure The nes_cm_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:32 -05:00
Bodong Wang	88115fe7a0	IB/mlx5: report tx/rx checksum cap in query results This patch will report the tx/rx checksum cap for raw qp via the query device results. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:32 -05:00
Leon Romanovsky	ee37095044	IB/mlx4: Convert kmalloc to kmalloc_array for checkpatch Convert kmalloc to be kmalloc_array to fix warnings below: WARNING: Prefer kmalloc_array over kmalloc with multiply + qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64), WARNING: Prefer kmalloc_array over kmalloc with multiply + qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof(u64), WARNING: Prefer kmalloc_array over kmalloc with multiply + srq->wrid = kmalloc(srq->msrq.max * sizeof(u64), Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:32 -05:00
Leon Romanovsky	9afc60dcc4	IB/mlx4: Suppress non-fatal memory allocations Failure in kmalloc memory allocations will throw a warning about it. Such warnings are not needed anymore, since in commit `0ef2f05c7e` ("IB/mlx4: Use vmalloc for WR buffers when needed"), fallback mechanism from kmalloc() to __vmalloc() was added. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:31 -05:00
Eran Ben Elisha	da7525d2a9	IB/mlx5: Advertise atomic capabilities in query device In order to ensure IB spec atomic correctness in atomic operations, if HW is configured to host endianness, advertise IB_ATOMIC_HCA. if not, advertise IB_ATOMIC_NONE. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:31 -05:00
Hariprasad S	67f1aee6f4	iw_cxgb3: Fix incorrectly returning error on success The cxgb3_*_send() functions return NET_XMIT_ values, which are positive integers values. So don't treat positive return values as an error. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:30 -05:00
Hariprasad S	c5dfb000b9	iw_cxgb4: Pass qid range to user space driver Enhances the t4_dev_status_page to pass the qid start and size attributes from iw_cxgb4 to libcxgb4. Bump the ABI Version to 3 -> To allow libcxgb4 to detect old drivers and revert to the old way of computing the qid ranges. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:30 -05:00
Dean Luick	0d6ed314de	IB/mad: Ensure fairness in ib_mad_completion_handler It was found that when a process was rapidly sending MADs other processes could be hung in their unregister calls. This would happen when process A was injecting packets fast enough that the single threaded workqueue was never exiting ib_mad_completion_handler. Therefore when process B called flush_workqueue via the unregister call it would hang until process A stopped sending MADs. The fix is to periodically reschedule ib_mad_completion_handler after processing a large number of completions. The number of completions chosen was decided based on the defaults for the recv queue size. However, it was kept fixed such that increasing those queue sizes would not adversely affect fairness in the future. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-24 00:17:30 -05:00
Leon Romanovsky	051f263098	IB/mlx5: Add driver cross-channel support Add support of cross-channel functionality to mlx5 driver. This includes ability to ignore overrun for CQ which intended for cross-channel, export device capability and configure the QP to be sync master/slave queues. The cross-channel enabled QP supports combination of three possible properties: * WQE processing on the receive queue of this QP * WQE processing on the send queue of this QP * WQE are supported on the send queue Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 23:33:14 -05:00
Leon Romanovsky	8a06ce59a4	IB/core: Add cross-channel support The cross-channel feature allows to execute WQEs that involve synchronization of I/O operations’ on different QPs. This capability enables to program complex flows with a single function call, hereby significantly reducing overhead associated with I/O processing. Cross-channel operations support is indicated by HCA capability information. The queue pairs can be configured to work as a “sync master queue” or “sync slave queues”. The added flags are: 1. Device capability flag IB_DEVICE_CROSS_CHANNEL for the devices that can perform cross-channel operations. 2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable CQ overrun check. This check is useless in cross-channel scenario. 3. QP property flags to indicate if queues are slave or master: * IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests will not be executed immediately and requires enabling. * IB_QP_CREATE_MANAGED_RECV indicates that posted receive work requests will not be executed immediately and requires enabling. * IB_QP_CREATE_CROSS_CHANNEL declares the QP to work in cross-channel mode. If IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV are not provided, this QP will be sync master queue, else it will be sync slave. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 23:33:14 -05:00
Matan Barak	d69e3bcf79	IB/mlx5: Mmap the HCA's core clock register to user-space In order to read the HCA's current cycles register, we need to map it to user-space. Add support to map this register via mmap command. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 23:25:59 -05:00
Matan Barak	b368d7cb8c	IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext Pass hca_core_clock_offset to user-space is mandatory in order to let the user-space read the free-running clock register from the right offset in the memory mapped page. Passing this value is done by changing the vendor's command and response of init_ucontext to be in extensible form. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 23:25:59 -05:00
Matan Barak	7c60bcbb68	IB/mlx5: Add support for hca_core_clock and timestamp_mask Reporting the hca_core_clock (in kHZ) and the timestamp_mask in query_device extended verb. timestamp_mask is used by users in order to know what is the valid range of the raw timestamps, while hca_core_clock reports the clock frequency that is used for timestamps. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 23:25:59 -05:00
Matan Barak	972ecb8213	IB/mlx5: Add create_cq extended command In order to create a CQ that supports timestamp, mlx5 needs to support the extended create CQ command with the timestamp flag. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 23:23:02 -05:00
Christoph Lameter	145d9c5410	IB/core: Display extended counter set if available Check if the extended counters are available and if so create the proper extended and additional counters. Signed-off-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 15:58:30 -05:00
Christoph Lameter	b2788ce575	IB/core: Specify attribute_id in port_table_attribute Add the attr_id on port_table_attribute since we will have to add a different port_table_attribute for the extended attribute soon. Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 15:58:30 -05:00
Christoph Lameter	35c4cbb178	IB/core: Create get_perf_mad function in sysfs.c Create a new function to retrieve performance management data from the existing code in get_pma_counter(). Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 15:58:30 -05:00
Christoph Hellwig	ab67ed8de0	IB: remove the write-only usecnt field from struct ib_mr Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:29:06 -05:00
Christoph Hellwig	f64054d850	nes: simplify nes_reg_phys_mr calling conventions Just pass and address/size pair instead of an ib_phys_buf array. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:29:05 -05:00
Christoph Hellwig	35cb3fc026	cxgb3: simplify iwch_get_dma_wr Fold simplified versions of build_phys_page_list and iwch_register_phys_mem into iwch_get_dma_wr now that no other callers are left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:29:04 -05:00
Christoph Hellwig	feb7c1e38b	IB: remove in-kernel support for memory windows Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:29:04 -05:00
Christoph Hellwig	b7d3e0a94f	IB: remove support for phys MRs We have stopped using phys MRs in the kernel a while ago, so let's remove all the cruft used to implement them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:29:04 -05:00
Christoph Hellwig	a4d825a01e	IB: remove ib_query_mr This functionality has no users and was only supported by the staged out EHCA driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:29:03 -05:00
Christoph Lameter	432c55fff4	IB/IPoIB: Move multicast specific code out of ipoib_main.c Code cleanup to move multicast specific code that checks for a sendonly join to ipoib_multicast.c. This allows the removal of the export of __ipoib_mcast_find(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:28:52 -05:00
Christoph Lameter	5a0e81f6f4	IB/IPoIB: factor out common multicast list removal code Code cleanup to remove multicast specific code from ipoib_main.c The removal of a list of multicast groups occurs in three places. Create a new function ipoib_mcast_remove_list(). Use this new function in ipoib_main.c too. That in turn allows the dropping of two functions that were exported from ipoib_multicast.c for expiration of mc groups. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 14:28:00 -05:00
Achiad Shochat	e53505a802	IB/mlx5: Support RoCE Advertise RoCE support for IB/core layer and set the hardware to work in RoCE mode. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:37 -05:00
Achiad Shochat	2811ba51b0	IB/mlx5: Add RoCE fields to Address Vector Set the address handle and QP address path fields according to the link layer type (IB/Eth). Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:37 -05:00
Achiad Shochat	3cca26069a	IB/mlx5: Support IB device's callbacks for adding/deleting GIDs These callbacks write into the mlx5 RoCE address table. Upon del_gid we write a zero'd GID. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:37 -05:00
Achiad Shochat	cb34be6da2	IB/mlx5: Set network_hdr_type upon RoCE responder completion When handling a responder completion, if the link layer is Ethernet, set the work completion network_hdr_type field according to CQE's info and the IB_WC_WITH_NETWORK_HDR_TYPE flag. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:37 -05:00
Achiad Shochat	3f89a643eb	IB/mlx5: Extend query_device/port to support RoCE Using the vport access functions to retrieve the Ethernet specific information and return this information in ib_query_device and ib_query_port. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:37 -05:00
Achiad Shochat	fc24fc5e95	IB/mlx5: Support IB device's callback for getting its netdev For Eth ports only: Maintain a net device pointer in mlx5_ib_device and update it upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the net-device and IB device have the same PCI parent device. Implement the get_netdev callback to return this net device. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:36 -05:00
Achiad Shochat	ebd61f68e1	IB/mlx5: Support IB device's callback for getting the link layer Make the existing mlx5_ib_port_link_layer() signature match the ib device callback signature (add port_num parameter). Refactor it to use a sub function so that the link layer could be queried also before the ibdev is created. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 12:07:36 -05:00
Julia Lawall	59f65f495d	IB/usnic: delete unneeded IS_ERR test kzalloc doesn't return ERR_PTR, so there is no need to test for it. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,e; @@ * x = kzalloc(...) ... when != x = e * IS_ERR_OR_NULL(x) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:58 -05:00
Nelson Escobar	96390f62aa	IB/usnic: Handle 0 counts in resource allocation Signed-off-by: Dave Goodell <dgoodell@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> Reviewed-by: Xuyang Wang <xuywang@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:58 -05:00
Nelson Escobar	dc92d14684	IB/usnic: Fix resource leak in error case Signed-off-by: Dave Goodell <dgoodell@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> Reviewed-by: Xuyang Wang <xuywang@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:58 -05:00
Nelson Escobar	89e5323c64	IB/usnic: Support more QP state transitions They were already implemented at a lower layer, but the upper level routine placed arbitrary restrictions on which transitions were permitted. Simplify the state machine logic to live wholly in usnic_ib_qp_grp_modify. Signed-off-by: Dave Goodell <dgoodell@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> Reviewed-by: Xuyang Wang <xuywang@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:58 -05:00
Nelson Escobar	2547a3663c	IB/usnic: Fix message typo Signed-off-by: Dave Goodell <dgoodell@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> Reviewed-by: Xuyang Wang <xuywang@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:58 -05:00
Nelson Escobar	3ea7286139	IB/usnic: Fix incorrect cast in usnic_ib_fw_string_to_u64 Signed-off-by: Christian Benvenuti <benve@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:58 -05:00
Nelson Escobar	1e67a64e9c	IB/usnic: Improve a failure message Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:57 -05:00
Nelson Escobar	638e970747	IB/usnic: Remove unused prototype query_protocol() was added in commit `6b90a6d66b` ("IB/Verbs: Implement new callback query_protocol()") and then removed in commit `f9b22e355d` ("IB/core: Convert core to use bitfield for caps"). This left behind an unused prototype. Signed-off-by: Nelson Escobar <neescoba@cisco.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:50:57 -05:00
Moni Shoua	bee3c3c918	IB/cma: Join and leave multicast groups with IGMP Since RoCEv2 is a protocol over IP header it is required to send IGMP join and leave requests to the network when joining and leaving multicast groups. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:39:53 -05:00
Moni Shoua	25f40220e5	IB/core: Initialize UD header structure with IP and UDP headers ib_ud_header_init() is used to format InfiniBand headers in a buffer up to (but not with) BTH. For RoCE UDP ENCAP it is required that this function would be able to build also IP and UDP headers. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:39:53 -05:00
Matan Barak	045959db65	IB/cma: Add configfs for rdma_cm Users would like to control the behaviour of rdma_cm. For example, old applications which don't set the required RoCE gid type could be executed on RoCE V2 network types. In order to support this configuration, we implement a configfs for rdma_cm. In order to use the configfs, one needs to mount it and mkdir <IB device name> inside rdma_cm directory. The patch adds support for a single configuration file, default_roce_mode. The mode can either be "IB/RoCE v1" or "RoCE v2". Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:39:52 -05:00
Matan Barak	218a773f76	IB/rdma_cm: Add wrapper for cma reference count Currently, cma users can't increase or decrease the cma reference count. This is necassary when setting cma attributes (like the default GID type) in order to avoid use-after-free errors. Adding cma_ref_dev and cma_deref_dev APIs. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:39:52 -05:00
Matan Barak	200298326b	IB/core: Validate route when we init ah In order to make sure API users don't try to use SGIDs which don't conform to the routing table, validate the route before searching the RoCE GID table. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:12 -05:00
Matan Barak	6020d7e500	IB/core: Move rdma_is_upper_dev_rcu to header file In order to validate the route, we need an easy way to check if a net-device belongs to our RDMA device. Move this helper function to a header file in order to make this check easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:12 -05:00
Somnath Kotur	c865f24628	IB/core: Add rdma_network_type to wc Providers should tell IB core the wc's network type. This is used in order to search for the proper GID in the GID table. When using HCAs that can't provide this info, IB core tries to deep examine the packet and extract the GID type by itself. We choose sgid_index and type from all the matching entries in RDMA-CM based on hint from the IP stack and we set hop_limit for the IP packet based on above hint from IP stack. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Somnath Kotur <Somnath.Kotur@Avagotech.Com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:11 -05:00
Matan Barak	7766a99fdc	IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Adding RoCE v2 GID type and port type. Vendors which support this type will get their GID table populated with RoCE v2 GIDs automatically. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:11 -05:00
Matan Barak	470be516a2	IB/core: Add gid attributes to sysfs This patch set adds attributes of net device and gid type to each GID in the GID table. Users that use verbs directly need to specify the GID index. Since the same GID could have different types or associated net devices, users should have the ability to query the associated GID attributes. Adding these attributes to sysfs. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:11 -05:00
Matan Barak	cb57bb849e	IB/cm: Use the source GID index type Previosuly, cm and cma modules supported only IB and RoCE v1 GID type. In order to support multiple GID types, the gid_type is passed to cm_init_av_by_path and stored in the path record. The rdma cm client would use a default GID type that will be saved in rdma_id_private. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:10 -05:00
Matan Barak	b39ffa1df5	IB/core: Add gid_type to gid attribute In order to support multiple GID types, we need to store the gid_type with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT GID table entries shall have a "GID type" attribute that denotes the L3 Address type". The currently supported GID is IB_GID_TYPE_IB which is also RoCE v1 GID type. This implies that gid_type should be added to roce_gid_table meta-data. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:35:10 -05:00
Matan Barak	cee3c4d0c5	IB/core: don't search the GID table twice Previously, we've searched the GID table twice: first when we searched the table for a GID matching the proposed new one, and second when we didn't find a match, we searched again for an empty GID slot in the table. Instead, search the table once noting the first empty slot as we search for our target GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:32:06 -05:00
Matan Barak	9c584f0495	IB/core: Change per-entry lock in RoCE GID table to one lock Previously, IB GID cached used a lock per entry. This could result in spending a lot of CPU cycles for locking and unlocking just in order to find a GID. Changing this in favor of one lock per a GID table. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:19:54 -05:00
Matan Barak	f3906bd360	IB/core: Refactor GID cache's ib_dispatch_event Refactor ib_dispatch_event into a new function in order to avoid duplicating code in the next patch. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-23 10:19:54 -05:00
Wengang Wang	df4176677f	IB/mlx4: Replace kfree with kvfree in mlx4_ib_destroy_srq Commit `0ef2f05c7e` uses vmalloc for WR buffers when needed and uses kvfree to free the buffers. It missed changing kfree to kvfree in mlx4_ib_destroy_srq(). Reported-by: Matthew Finaly <matt@Mellanox.com> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-22 23:23:34 -05:00
Matan Barak	fac51590c1	IB/cma: cma_match_net_dev needs to take into account port_num Previously, cma_match_net_dev called cma_protocol_roce which tried to verify that the IB device uses RoCE protocol. However, if rdma_id wasn't bound to a port, then the check would occur against the first port of the device without regard to whether that port was even of the same type as the type of port the incoming packet was received on. Fix this by passing the port of the request and only checking against the same port of the device. Reported-by: Or Gerlitz <gerlitz.or@gmail.com> Fixes: `b8cab5dab1` ('IB/cma: Accept connection without a valid netdev on RoCE') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-22 23:22:50 -05:00
Doug Ledford	882f3b3b91	Merge branches '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5 Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/iser/iser_verbs.c	2015-12-22 17:03:15 -05:00
Or Gerlitz	182a2da0c7	IB/core: Remove ib_query_device The copy of the attributes present on the device is now used by all consumers except for uverbs in case of serving user-space query, where dev->query_device is called. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-22 17:01:40 -05:00
Or Gerlitz	4a061b287b	IB/ulps: Avoid calling ib_query_device Instead, use the cached copy of the attributes present on the device. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-22 14:39:00 -05:00
Or Gerlitz	86bee4c9c1	IB/core: Avoid calling ib_query_device Use the cached copy of the attributes present on the device, except for the case of a query originating from user-space, where we have to invoke the driver query_device entry, so they can fill in their udata. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-22 14:39:00 -05:00
Ira Weiny	3e153a93a1	IB/core: Save the device attributes on the device structure This way both the IB core and upper level drivers can access these cached device attributes rather than querying or caching them on their own. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-22 14:39:00 -05:00
David S. Miller	b3e0d3d7ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/geneve.c Here we had an overlapping change, where in 'net' the extraneous stats bump was being removed whilst in 'net-next' the final argument to udp_tunnel6_xmit_skb() was being changed. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-17 22:08:28 -05:00
Doug Ledford	c6333f9f9f	Merge branch 'rdma-cq.2' of git://git.infradead.org/users/hch/rdma into 4.5/rdma-cq Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/srp/ib_srp.c - Conflicts with changes in ib_srp.c introduced during 4.4-rc updates	2015-12-15 14:10:44 -05:00
Christoph Hellwig	cfeb91b375	IB/iser: Convert to CQ abstraction Use the new CQ abstraction to simplify completions in the iSER initiator. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-12-11 14:10:52 -08:00
Sagi Grimberg	7edc5a999d	IB/iser: Use helper for container_of Nicer this way. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-12-11 14:10:51 -08:00
Sagi Grimberg	0f512b34c6	IB/iser: Use a dedicated descriptor for login We'll need it later with the new CQ abstraction. also switch login bufs to void pointers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-12-11 14:10:50 -08:00
Christoph Hellwig	1dc7b1f10d	IB/srp: use the new CQ API This also moves recv completion handling from hardirq context into softirq context. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-12-11 14:10:49 -08:00
Christoph Hellwig	59fae4deaa	IB/srpt: chain RDMA READ/WRITE requests Remove struct rdma_iu and instead allocate the struct ib_rdma_wr array early and fill out directly. This allows us to chain the WRs, and thus archives both less lock contention on the HCA workqueue as well as much simpler error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>	2015-12-11 14:10:48 -08:00
Christoph Hellwig	14d3a3b249	IB: add a proper completion queue abstraction This adds an abstraction that allows ULPs to simply pass a completion object and completion callback with each submitted WR and let the RDMA core handle the nitty gritty details of how to handle completion interrupts and poll the CQ. In detail there is a new ib_cqe structure which just contains the completion callback, and which can be used to get at the containing object using container_of. It is pointed to by the WR and WC as an alternative to the wr_id field, similar to how many ULPs already use the field to store a pointer using casts. A driver using the new completion callbacks allocates it's CQs using the new ib_create_cq API, which in addition to the number of CQEs and the completion vectors also takes a mode on how we poll for CQEs. Three modes are available: direct for drivers that never take CQ interrupts and just poll for them, softirq to poll from softirq context using the to be renamed blk-iopoll infrastructure which takes care of rearming and budgeting, or a workqueue for consumer who want to be called from user context. Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote the current version of the workqueue code because my two previous attempts sucked too much and converted the iSER initiator to the new API. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-12-11 14:10:43 -08:00
Leon Romanovsky	ab5cdc3163	IB/mlx5: Postpone remove_keys under knowledge of coming preemption The remove_keys() logic is performed as garbage collection task. Such task is intended to be run when no other active processes are running. The need_resched() will return TRUE if there are user tasks to be activated in near future. In such case, we don't execute remove_keys() and postpone the garbage collection work to try to run in next cycle, in order to free CPU resources to other tasks. The possible pseudo-code to trigger such scenario: 1. Allocate a lot of MR to fill the cache above the limit. 2. Wait a small amount of time "to calm" the system. 3. Start CPU extensive operations on multi-node cluster. 4. Expect performance degradation during MR cache shrink operation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-08 16:55:31 -05:00
Wengang Wang	0ef2f05c7e	IB/mlx4: Use vmalloc for WR buffers when needed There are several hits that WR buffer allocation(kmalloc) failed. It failed at order 3 and/or 4 contigous pages allocation. At the same time there are actually 100MB+ free memory but well fragmented. So try vmalloc when kmalloc failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-08 16:48:10 -05:00
Sagi Grimberg	f1a47d37fb	iser-target: Remove explicit mlx4 work-around The driver now exposes sufficient limits so we can avoid having mlx4 specific work-around. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-08 13:01:11 -05:00
Sagi Grimberg	a5e14ba334	mlx4: Expose correct max_sge_rd limit mlx4 devices (ConnectX-2, ConnectX-3) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-08 12:42:44 -05:00
Hal Rosenstock	533708867d	IB/mad: Require CM send method for everything except ClassPortInfo Receipt of CM MAD with other than the Send method for an attribute other than the ClassPortInfo attribute is invalid. CM attributes other than ClassPortInfo only use the send method. The SRP initiator does not maintain a timeout policy for CM connect requests relies on the CM layer to do that. The result was that the SRP initiator hung as the connect request never completed. A new SRP target has been observed to respond to Send CM REQ with GetResp of CM REQ with bad status. This is non conformant with IBA spec but exposes a vulnerability in the current MAD/CM code which will respond to the incoming GetResp of CM REQ as if it was a valid incoming Send of CM REQ rather than tossing this on the floor. It also causes the MAD layer not to retransmit the original REQ even though it has not received a REP. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-08 12:19:11 -05:00
Bart Van Assche	d3632493c7	IB/cma: Add a missing rcu_read_unlock() Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if fib_lookup() fails. Detected by sparse. Compile-tested only. Fixes: "IB/cma: Validate routing of incoming requests" (commit `f887f2ac87`). Cc: Haggai Eran <haggaie@mellanox.com> Cc: stable <stable@vger.kernel.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-08 12:14:43 -05:00
Masanari Iida	e3d132d123	treewide: Fix typos in printk This patch fix multiple spelling typos found in various part of kernel. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-12-08 14:59:19 +01:00
Bart Van Assche	8f5ba10ed4	IB core: Fix ib_sg_to_pages() On 12/03/2015 01:18 AM, Christoph Hellwig wrote: > The patch looks good to me, but while we touch this area, how about > throwing in a few cosmetic fixes as well? How about the patch below ? In that version of the ib_sg_to_pages() fix these concerns have been addressed and additionally to more bugs have been fixed. ------------ [PATCH] IB core: Fix ib_sg_to_pages() Fix the code for detecting gaps. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. In the code for coalescing contiguous elements, ensure that mr->length is correct and that last_page_addr is up-to-date. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Fixes: commit `4c67e2bfc8` ("IB/core: Introduce new fast registration API") Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 17:20:12 -05:00
Bart Van Assche	57b0be9c0f	IB/srp: Fix srp_map_sg_fr() After dma_map_sg() has been called the return value of that function must be used as the number of elements in the scatterlist instead of scsi_sg_count(). Fixes: commit `f7f7aab1a5` ("IB/srp: Convert to new registration API") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: stable <stable@vger.kernel.org> # v4.4+ Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 17:20:11 -05:00
Bart Van Assche	a745f4f410	IB/srp: Fix indirect data buffer rkey endianness Detected by sparse. Fixes: commit `330179f2fa` ("IB/srp: Register the indirect data buffer descriptor") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: stable <stable@vger.kernel.org> # v4.3+ Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 17:20:11 -05:00
Christoph Hellwig	fc925518aa	IB/srp: Initialize dma_length in srp_map_idb Without this sg_dma_len will return 0 on architectures tha have the dma_length field. Fixes: commit `f7f7aab1a5` ("IB/srp: Convert to new registration API") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 17:20:11 -05:00
Sagi Grimberg	09c0c0bea5	IB/srp: Fix possible send queue overflow When using work request based memory registration (fast_reg) we must reserve SQ entries for registration and invalidation in addition to send operations. Each IO consumes 3 SQ entries (registration, send, invalidation) so we need to allocate 3x larger send-queue instead of 2x. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 17:20:11 -05:00
Bart Van Assche	4d59ad2995	IB/srp: Fix a memory leak If srp_connect_ch() returns a positive value then that is considered by its caller as a connection failure but this does not result in a scsi_host_put() call and additionally causes the srp_create_target() function to return a positive value while it should return a negative value. Avoid all this confusion and additionally fix a memory leak by ensuring that srp_connect_ch() always returns a value that is <= 0. This patch avoids that a rejected login triggers the following memory leak: unreferenced object 0xffff88021b24a220 (size 8): comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s) hex dump (first 8 bytes): 68 6f 73 74 35 38 00 a5 host58.. backtrace: [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0 [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160 [<ffffffff81260d2b>] kvasprintf+0x5b/0x90 [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0 [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0 [<ffffffff81337e3c>] dev_set_name+0x3c/0x40 [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0 [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp] [<ffffffff8133778b>] dev_attr_store+0x1b/0x20 [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60 [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180 [<ffffffff81176eef>] __vfs_write+0x2f/0xf0 [<ffffffff811771e4>] vfs_write+0xa4/0x100 [<ffffffff81177c64>] SyS_write+0x54/0xc0 [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 17:20:11 -05:00
Kaike Wan	3ebd2fd0d0	IB/sa: Put netlink request into the request list before sending It was found by Saurabh Sengar that the netlink code tried to allocate memory with GFP_KERNEL while holding a spinlock. While it is possible to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better to get rid of the spinlock while sending the packet. However, in order to protect against a race condition that a quick response may be received before the request is put on the request list, we need to put the request on the list first. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reported-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 16:43:01 -05:00
Arnd Bergmann	2c63d1072a	IB/iser: use sector_div instead of do_div do_div is the wrong way to divide a sector_t, as it is less efficient when sector_t is 32-bit wide. With the upcoming do_div optimizations, the kernel starts warning about this: drivers/infiniband/ulp/iser/iser_verbs.c:1296:4: note: in expansion of macro 'do_div' include/asm-generic/div64.h:224:22: warning: passing argument 1 of '__div64_32' from incompatible pointer type This changes the code to use sector_div instead, which always produces optimal code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 16:42:33 -05:00
Mike Marciniszyn	d144da8c6f	IB/core: use RCU for uverbs id lookup The current implementation gets a spin_lock, and at any scale with qib and hfi1 post send, the lock contention grows exponentially with the number of QPs. idr_find() is RCU compatibile, so read doesn't need the lock. Change to use rcu_read_lock() and rcu_read_unlock() in __idr_get_uobj(). kfree_rcu() is used to insure a grace period between the idr removal and actual free. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 16:39:26 -05:00
Easwar Hariharan	57ab251213	IB/qib: Minor fixes to qib per SFF 8636 Minor errors found via code inspection during future development. SFF 8636 defines bit position 2 to hold the status indication of QSFP memory paging. The mask used to test for the value was incorrect and is fixed in this patch. Additionally, the dump function had a mismatch between the field being printed out and the field used to source the data which was fixed. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reported-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 16:36:00 -05:00
Mike Marciniszyn	1d784b890c	IB/core: Fix user mode post wr corruption Commit `e622f2f4ad` ("IB: split struct ib_send_wr") introduced a regression for HCAs whose user mode post sends go through ib_uverbs_post_send(). The code didn't account for the fact that the first sge is offset by an operation dependent length. The allocation did, but the pointer to the destination sge list is computed without that knowledge. The sge list copy_from_user() then corrupts fields in the work request Store the operation dependent length in a local variable and compute the sge list copy_from_user() destination using that length. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 16:22:14 -05:00
Ira Weiny	785f742223	IB/qib: Fix qib_mr structure struct qib_mr requires the mr member be the last because struct qib_mregion contains a dynamic array at the end. The additions of members should have been placed before this structure as the comment noted. Failure to do so was causing random memory corruption. Reproducing this bug was easy to do by running the client and server of ib_write_bw -s 8 -n 5 on the same node. This BUG() was tripped in a slab debug kernel: kernel BUG at mm/slab.c:2572! Fixes: `38071a461f` ("IB/qib: Support the new memory registration API") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-12-07 16:22:14 -05:00
Or Gerlitz	f1b4e12a9a	IB/mlx4: Use the VF base-port when demuxing mad from wire Under HA mode, it's possible that the VF registered its GID (and expects to get mads through the PV scheme) on a port which is different from the one this mad arrived on, due to HA fail over. Therefore, if the gid is not matched on the port that the packet arrived on, check for a match on the other port if HA mode is active -- and if a match is found on the other port, continue processing the mad using that other port. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-06 22:40:45 -05:00
Linus Torvalds	d83763f4a6	SCSI misc on 20151113 Sorry for the delay in this patch which was mostly caused by getting the merger of the mpt2/mpt3sas driver, which was seen as an essential item of maintenance work to do before the drivers diverge too much. Unfortunately, this caused a compile failure (detected by linux-next), which then had to be fixed up and incubated. In addition to the mpt2/3sas rework, there are updates from pm80xx, lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc and ufs plus an assortment of changes including some year 2038 issues, a fix for a remove before detach issue in some drivers and a couple of other minor issues. Signed-off-by: James Bottomley <JBottomley@Odin.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJWRnpiAAoJEDeqqVYsXL0MJd0IAIkNP1q/6ksMw/lam2UlbdxN zFsFOIhGM3xLnBFehbCx7JW3cmybA7WC5jhMjaa1OeJmYHLpwbTzvVwrLs2l/0y1 /S+G0wwxROZfIKj/1EO3oKbSPCu9N3lStxOnmNXt5PSUEzAXrTqSNTtnMf2ZGh7j bQxTWEJe+66GckgGw4ozTXJHWXqM/Zs/FsYjn2h/WzFhFv8utr7zRSzHMVjcqpLG TK/Lt03hiTGqiignKrV9W6JzdGTWf2LGIsj/njgR0dxv59cNH8PrHIjZyknSBxgn lblinsCjxDHWnP4BSfTi9MQG1lEiZiWO3Y6TKkKJTgxZ9M0Eitspc+cLOiJ1mpg= =HvQf -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull final round of SCSI updates from James Bottomley: "Sorry for the delay in this patch which was mostly caused by getting the merger of the mpt2/mpt3sas driver, which was seen as an essential item of maintenance work to do before the drivers diverge too much. Unfortunately, this caused a compile failure (detected by linux-next), which then had to be fixed up and incubated. In addition to the mpt2/3sas rework, there are updates from pm80xx, lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc and ufs plus an assortment of changes including some year 2038 issues, a fix for a remove before detach issue in some drivers and a couple of other minor issues" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits) mpt3sas: fix inline markers on non inline function declarations sd: Clear PS bit before Mode Select. ibmvscsi: set max_lun to 32 ibmvscsi: display default value for max_id, max_lun and max_channel. mptfusion: don't allow negative bytes in kbuf_alloc_2_sgl() scsi: pmcraid: replace struct timeval with ktime_get_real_seconds() mvumi: 64bit value for seconds_since1970 be2iscsi: Fix bogus WARN_ON length check scsi_scan: don't dump trace when scsi_prep_async_scan() is called twice mpt3sas: Bump mpt3sas driver version to 09.102.00.00 mpt3sas: Single driver module which supports both SAS 2.0 & SAS 3.0 HBAs mpt2sas, mpt3sas: Update the driver versions mpt3sas: setpci reset kernel oops fix mpt3sas: Added OEM Gen2 PnP ID branding names mpt3sas: Refcount fw_events and fix unsafe list usage mpt3sas: Refcount sas_device objects and fix unsafe list usage mpt3sas: sysfs attribute to report Backup Rail Monitor Status mpt3sas: Ported WarpDrive product SSS6200 support mpt3sas: fix for driver fails EEH, recovery from injected pci bus error mpt3sas: Manage MSI-X vectors according to HBA device type ...	2015-11-13 20:35:54 -08:00
Linus Torvalds	9aa3d651a9	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit `7bd1d4093c` ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit `517982229f` ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...	2015-11-13 20:04:17 -08:00
Christoph Hellwig	64d513ac31	scsi: use host wide tags by default This patch changes the !blk-mq path to the same defaults as the blk-mq I/O path by always enabling block tagging, and always using host wide tags. We've had blk-mq available for a few releases so bugs with this mode should have been ironed out, and this ensures we get better coverage of over tagging setup over different configs. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-11-09 17:11:57 -08:00
Linus Torvalds	ad804a0b2a	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...	2015-11-07 14:32:45 -08:00
Linus Torvalds	ab9f2faf8f	Initial 4.4 merge window submission - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8 4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh 4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9 pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm 58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH hllyhNNolI6FJ64j07Xm =bBIY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is my initial round of 4.4 merge window patches. There are a few other things I wish to get in for 4.4 that aren't in this pull, as this represents what has gone through merge/build/run testing and not what is the last few items for which testing is not yet complete. - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits) IB/core, cma: Make __attribute_const__ declarations sparse-friendly IB/core: Remove old fast registration API IB/ipath: Remove fast registration from the code IB/hfi1: Remove fast registration from the code RDMA/nes: Remove old FRWR API IB/qib: Remove old FRWR API iw_cxgb4: Remove old FRWR API RDMA/cxgb3: Remove old FRWR API RDMA/ocrdma: Remove old FRWR API IB/mlx4: Remove old FRWR API support IB/mlx5: Remove old FRWR API support IB/srp: Dont allocate a page vector when using fast_reg IB/srp: Remove srp_finish_mapping IB/srp: Convert to new registration API IB/srp: Split srp_map_sg RDS/IW: Convert to new memory registration API svcrdma: Port to new memory registration API xprtrdma: Port to new memory registration API iser-target: Port to new memory registration API IB/iser: Port to new fast registration API ...	2015-11-07 13:33:07 -08:00
Mel Gorman	71baba4b92	mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM __GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Mel Gorman	d0164adc89	mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Linus Torvalds	9cf5c095b6	asm-generic cleanups The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5 nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA 4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3 uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01 KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu vTUkB/I66Hk= =dQKG -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: temporarily add back asm-generic/io-64-nonatomic.h asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations gpio-mxc: stop including <asm-generic/bug> n_tracesink: stop including <asm-generic/bug> n_tracerouter: stop including <asm-generic/bug> mlx5: stop including <asm-generic/kmap_types.h> hifn_795x: stop including <asm-generic/kmap_types.h> drbd: stop including <asm-generic/kmap_types.h> move count_zeroes.h out of asm-generic move io-64-nonatomic.h out of asm-generic	2015-11-06 14:22:15 -08:00
David S. Miller	b75ec3af27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-11-01 00:15:30 -04:00
Bart Van Assche	db7489e076	IB/core, cma: Make __attribute_const__ declarations sparse-friendly Move the __attribute_const__ declarations such that sparse understands that these apply to the function itself and not to the return type. This avoids that sparse reports error messages like the following: drivers/infiniband/core/verbs.c:73:12: error: symbol 'ib_event_msg' redeclared with different type (originally declared at include/rdma/ib_verbs.h:470) - different modifiers Fixes: `2b1b5b6012` ("IB/core, cma: Nice log-friendly string helpers") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-30 17:57:49 -04:00
Sagi Grimberg	39bfc271bd	IB/core: Remove old fast registration API No callers and no providers left, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-29 11:43:47 -04:00
Sagi Grimberg	13d3e895fa	RDMA/nes: Remove old FRWR API No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:32:29 -04:00
Sagi Grimberg	b8533eccc8	IB/qib: Remove old FRWR API No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:32:29 -04:00
Sagi Grimberg	d3cfd002e6	iw_cxgb4: Remove old FRWR API No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:32:29 -04:00
Sagi Grimberg	94e585cb74	RDMA/cxgb3: Remove old FRWR API No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	191cfed565	RDMA/ocrdma: Remove old FRWR API No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	e761c67fbf	IB/mlx4: Remove old FRWR API support No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	dd01e66a6c	IB/mlx5: Remove old FRWR API support No ULP uses it anymore, go ahead and remove it. Keep only the local invalidate part of the handlers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	9a21be531c	IB/srp: Dont allocate a page vector when using fast_reg The new fast registration API does not reuqire a page vector so we can't avoid allocating it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	51c2b8e2ac	IB/srp: Remove srp_finish_mapping No callers left, remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	f7f7aab1a5	IB/srp: Convert to new registration API Instead of constructing a page list, call ib_map_mr_sg and post a new ib_reg_wr. srp_map_finish_fr now returns the number of sg elements registered. Remove srp_finish_mapping since no one is calling it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:19 -04:00
Sagi Grimberg	26630e8a09	IB/srp: Split srp_map_sg This is a preparation patch for the new registration API conversion. It splits srp_map_sg per registration strategy (srp_map_sg[fmr\|fr\|dma]. On its own it adds some code duplication, but it makes the API switch easier to comprehend. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	16c2d702f2	iser-target: Port to new memory registration API Remove fastreg page list allocation as the page vector is now private to the provider. Instead of constructing the page list and fast_req work request, call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	3940588500	IB/iser: Port to new fast registration API Remove fastreg page list allocation as the page vector is now private to the provider. Instead of constructing the page list and fast_req work request, call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	0ba24dd39a	RDMA/nes: Support the new memory registration API Support the new memory registration API by allocating a private page list array in nes_mr and populate it when nes_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating IB_WR_FAST_REG_MR handling and take the needed information from different places: - page_size, iova, length (ib_mr) - page array (nes_mr) - key, access flags (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	38071a461f	IB/qib: Support the new memory registration API Support the new memory registration API by allocating a private page list array in qib_mr and populate it when qib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating qib_fastreg_mr just take the needed information from different places: - page_size, iova, length (ib_mr) - page array (qib_mr) - key, access flags (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	8376b86de7	iw_cxgb4: Support the new memory registration API Support the new memory registration API by allocating a private page list array in c4iw_mr and populate it when c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating build_fastreg just take the needed information from different places: - page_size, iova, length (ib_mr) - page array (c4iw_mr) - key, access flags (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	14fb4171ab	RDMA/cxgb3: Support the new memory registration API Support the new memory registration API by allocating a private page list array in iwch_mr and populate it when iwch_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating build_fastreg just take the needed information from different places: - page_size, iova, length (ib_mr) - page array (iwch_mr) - key, access flags (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	2eaa1c5647	RDMA/ocrdma: Support the new memory registration API Support the new memory registration API by allocating a private page list array in ocrdma_mr and populate it when ocrdma_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating IB_WR_FAST_REG_MR, but take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (ocrdma_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:18 -04:00
Sagi Grimberg	1b2cd0fc67	IB/mlx4: Support the new memory registration API Support the new memory registration API by allocating a private page list array in mlx4_ib_mr and populate it when mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx4_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:17 -04:00
Sagi Grimberg	8a187ee52b	IB/mlx5: Support the new memory registration API Support the new memory registration API by allocating a private page list array in mlx5_ib_mr and populate it when mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx5_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:17 -04:00
Sagi Grimberg	a706000916	IB/mlx5: Remove dead fmr code Just function declarations - no need for those laying arround. If for some reason someone will want FMR support in mlx5, it should be easy enough to restore a few structs. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:17 -04:00
Sagi Grimberg	4c67e2bfc8	IB/core: Introduce new fast registration API The new fast registration verb ib_map_mr_sg receives a scatterlist and converts it to a page list under the verbs API thus hiding the specific HW mapping details away from the consumer. The provider drivers are provided with a generic helper ib_sg_to_pages that converts a scatterlist into a vector of page addresses. The drivers can still perform any HW specific page address setting by passing a set_page function pointer which will be invoked for each page address. This allows drivers to avoid keeping a shadow page vectors and convert them to HW specific translations by doing extra copies. This API will allow ULPs to remove the duplicated code of constructing a page vector from a given sg list. The send work request ib_reg_wr also shrinks as it will contain only mr, key and access flags in addition. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 22:27:17 -04:00
Doug Ledford	63e8790d39	Merge branch 'wr-cleanup' into k.o/for-4.4	2015-10-28 22:23:34 -04:00
Doug Ledford	eb14ab3ba1	Merge branch 'wr-cleanup' of git://git.infradead.org/users/hch/rdma into wr-cleanup Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/isert/ib_isert.c - Commit `4366b19ca5` (iser-target: Change the recv buffers posting logic) changed the logic in isert_put_datain() and had to be hand merged	2015-10-28 22:21:09 -04:00
Guy Shapiro	95893dde99	IB/ucma: Take the network namespace from the process Add support for network namespaces from user space. This is done by passing the network namespace of the process instead of init_net. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 12:32:48 -04:00
Guy Shapiro	fa20105e09	IB/cma: Add support for network namespaces Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 12:32:48 -04:00
Haggai Eran	4be74b42a6	IB/cma: Separate port allocation to network namespaces Keep a struct for each network namespace containing the IDRs for the RDMA CM port spaces. The struct is created dynamically using the generic_net mechanism. This patch is internal infrastructure work for the following patches. In this patch, init_net is statically used as the network namespace for the new port-space API. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 12:32:48 -04:00
Guy Shapiro	565edd1d55	IB/addr: Pass network namespace as a parameter Add network namespace support to the ib_addr module. For that, all the address resolution and matching should be done using the appropriate namespace instead of init_net. This is achieved by: 1. Adding an explicit network namespace argument to exported function that require a namespace. 2. Saving the namespace in the rdma_addr_client structure. 3. Using it when calling networking functions. In order to preserve the behavior of calling modules, &init_net is passed as the parameter in calls from other modules. This is modified as namespace support is added on more levels. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 12:32:47 -04:00
Sagi Grimberg	630c3183ce	IB/iser: Enable SG clustering iser is perfectly capable supporting SG clustering as it translates the SG list to a page vector. Enabling SG clustering can dramatically reduce the number of SG elements, which doesn't make much of a difference at this point, but with arbitrary SG list support, reducing the number of SG elements can benefit greatly as as it would reduce the length of the HW descriptors array. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 12:26:06 -04:00
Sagi Grimberg	dd0107a089	IB/iser: set block queue_virt_boundary The block layer can reliably guarantee that SG lists won't contain gaps (page unaligned) if a driver set the queue virt_boundary. With this setting the block layer will: - refuse merges if bios are not aligned to the virtual boundary - split bios/requests that are not aligned to the virtual boundary - or, bounce buffer SG_IOs that are not aligned to the virtual boundary Since iser is working in 4K page size, set the virt_boundary to 4K pages. With this setting, we can now safely remove the bounce buffering logic in iser. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-28 12:26:06 -04:00
Linus Torvalds	410694e214	Changes for 4.3-rc6 6 serious fixes: 1) Hold the mutex around the find and corresponding update of our gid 2) The ifa list is rcu protected, copy its contents under rcu to avoid using a freed structure 3) On error, netdev might be null, so check it before trying to release it 4) On init, if workqueue alloc fails, fail init 5) The new demux patches exposed a bug in mlx5 and ipath drivers, we need to use the payload P_Key to determine the P_Key the packet arrived on because the hardware doesn't tell us the truth 6) Due to a couple convoluted error flows, it is possible for the CM to trigger a use_after_free and a double_free of rb nodes. Add two checks to prevent that. This code has worked for 10+ years. It is likely that some of the recent changes have caused this issue to surface. The current patch will protect us from nasty events for now while we track down why this is just now showing up. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWJ+zYAAoJELgmozMOVy/dpvIP/iDBxRy0tw8jRdJQNF7EUZ5S a40aXun/J7Ywl6znvWG/WepDriMDe/9n9qIll8ObZtn7VuNlCZhp/7fx0Bn+fVnR c4j1cIdhWeja7Wg1xI6K75kXzae8hIwn8aombaFwf4NhUH6qkqGMt72Wp0Y9KeIv d903VyDwEJgETLAgUPaS1STurEuYcQocESvuer66Qp704Q+EhSbYKf9UgiRT6bOm 36ilI5vxtpPeDpI/YT4ubyD2R7r6XPTGRGs3R6/3W6vRaGLNJ6jq4+Fhthx137G2 NoddGmBdxemt2u0hWwLKcMGb5JNFbOoHuOrJmH9AZO6OyIH8XzILeEiXosdRjX/d U7OC29fE1Ge5wBNwwm+rJx79HgOVyu1vc5AlcE8vbtImIugJT8OzNIbZoAMcm8CB RUieyn78PGrkZ1KjCbVpiJ7u6S07LBLq/GqPHTnZ7aen3klP3pm8rAfskM7quEvf UD3kFP3kFQKZ8PG0Q27XtkKixoBU0vie9YgF2ZLBbEfJpJDKT9HF3lr1nP2AjN2a 23FxUMzFCncE2ewA7tXwbHkCY7x5zn/Og0xdbRKdw/q/EvVu9VLZz/eTASLT+we0 v3YZQdkdSnU0rwULZY+26h1PeHba/aoeoN6x/tmlKLQvaCfhkAsoXDl+MGfY4oLk 1f5iWBgn4kseBXg+N/c9 =fpXl -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull infiniband fixes from Doug Ledford: "It's late in the game, I know, but these fixes seemed important enough to warrant a late pull request. They all involve oopses or use after frees or corruptions. Six serious fixes: - Hold the mutex around the find and corresponding update of our gid - The ifa list is rcu protected, copy its contents under rcu to avoid using a freed structure - On error, netdev might be null, so check it before trying to release it - On init, if workqueue alloc fails, fail init - The new demux patches exposed a bug in mlx5 and ipath drivers, we need to use the payload P_Key to determine the P_Key the packet arrived on because the hardware doesn't tell us the truth - Due to a couple convoluted error flows, it is possible for the CM to trigger a use_after_free and a double_free of rb nodes. Add two checks to prevent that. This code has worked for 10+ years. It is likely that some of the recent changes have caused this issue to surface. The current patch will protect us from nasty events for now while we track down why this is just now showing up" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/cm: Fix rb-tree duplicate free and use-after-free IB/cma: Use inner P_Key to determine netdev IB/ucma: check workqueue allocation before usage IB/cma: Potential NULL dereference in cma_id_from_event IB/core: Fix use after free of ifa IB/core: Fix memory corruption in ib_cache_gid_set_default_gid	2015-10-24 07:28:05 +09:00
Bart Van Assche	6c760b3dd5	iser-target: Remove an unused variable Detected this by compiling with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-22 18:37:47 -04:00
Bart Van Assche	78fc3fc4cc	IB/iser: Remove an unused variable Detected this by compiling with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-22 18:37:14 -04:00
Matan Barak	10e07f13c0	IB/core: Remove smac and vlan id from path record The GID cache accompanies every GID with attributes. The GID attributes link the GID with its netdevice, which could be resolved to smac and vlan id easily. Since we've added the netdevice (ifindex and net) to the path record, storing the L2 attributes is duplicated data and hence these attributes are removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:18 -04:00
Matan Barak	aa744cc01f	IB/core: Remove smac and vlan id from qp_attr and ah_attr Smac and vlan id could be resolved from the GID attribute, and thus these attributes aren't needed anymore. Removing them. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:18 -04:00
Matan Barak	5c266b2304	IB/cm: Remove the usage of smac and vid of qp_attr and cm_av The cm and cma don't need to explicitly handle vlan and smac, as they are resolved from the GID index now. Removing this portion of code. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:18 -04:00
Matan Barak	dbf727de74	IB/core: Use GID table in AH creation and dmac resolution Previously, vlan id and source MAC were used from QP attributes. Since the net device is now stored in the GID attributes, they could be used instead of getting this information from the QP attributes. IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed because there is no known libibverbs that uses them. This commit also modifies the vendors (mlx4, ocrdma) drivers in order to use the new approach. ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Matan Barak	99b27e3b5d	IB/cache: Add ib_find_gid_by_filter cache API GID cache API users might want to search for GIDs with specific attributes rather than just specifying GID, net device and port. This is used in a later patch, where we find the sgid index by L2 Ethernet attributes. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Matan Barak	abae1b71dd	IB/cma: cma_validate_port should verify the port and netdevice Previously, cma_validate_port searched for GIDs in IB cache and then tried to verify the found port. This could fail when there are identical GIDs on both ports. In addition, netdevice should be taken into account when searching the GID table. Fixing cma_validate_port to search only the relevant port's cache and netdevice. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Matan Barak	c2c6ff1345	IB/cm: cm_init_av_by_path should find a GID by its netdevice Previously, the CM has searched the cache for any sgid_index whose GID matches the path's GID. Since the path record stores the net device, the CM should now search only for GIDs which originated from this net device. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Matan Barak	ba36e37fd3	IB/core: Add netdev to path record In order to find the sgid_index, one could just query the IB cache with the correct GID and netdevice. Therefore, instead of storing the L2 attributes directly in the path, we only store the ifindex and net and use them later to get the sgid_index. The vlan_id and smac L2 attributes are removed in a later patch. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Matan Barak	d300ec528b	IB/core: Expose and rename ib_find_cached_gid_by_port cache API Sometime consumers might want to search for a GID in a specific port. For example, when a WC arrives and we want to search the GID that matches that port - it's better to search only the relevant port. Exposing and renaming ib_cache_gid_find_by_port in order to match the naming convention of the module. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Matan Barak	55ee3ab2e4	IB/core: Add netdev and gid attributes paramteres to cache Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:48:17 -04:00
Eran Ben Elisha	fbfb6625ea	IB/mlx4: Add support for blocking multicast loopback QP creation user flag MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported downstream. In addition, this flag was supported only for IB_QPT_UD, now, with the new implementation it is supported for all QP types. Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from user space using the extension create qp command. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:16:47 -04:00
Eran Ben Elisha	7b59f0f951	IB/mlx4: Add counter based implementation for QP multicast loopback block Current implementation for MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is not supported when link layer is Ethernet. This patch will add counter based implementation for multicast loopback prevention. HW can drop multicast loopback packets if sender QP counter index is equal to receiver QP counter index. If qp flag MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is set and link layer is Ethernet, create a new counter and attach it to the QP so it will continue receiving multicast loopback traffic but it's own. The decision if to create a new counter is being made at the qp modification to RTR after the QP's port is set. When QP is destroyed or moved back to reset state, delete the counter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:16:47 -04:00
Eran Ben Elisha	3ba8e31d5a	IB/mlx4: Add IB counters table This is an infrastructure step for allocating and attaching more than one counter to QPs on the same port. Allocate a counters table and manage the insertion and removals of the counters in load and unload of mlx4 IB. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:16:47 -04:00
Eran Ben Elisha	ddf9529be1	IB/core: Allow setting create flags in QP init attribute Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in ib_uverbs_create_qp_ex. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:16:46 -04:00
Eran Ben Elisha	6d8a74972b	IB/core: Extend ib_uverbs_create_qp ib_uverbs_ex_create_qp follows the extension verbs mechanism. New features (for example, QP creation flags field which is added in a downstream patch) could used via user-space libraries without breaking the ABI. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:16:46 -04:00
Hariprasad S	963cab5082	iw_cxgb4: Adds support for T6 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 23:16:38 -04:00
Selvin Xavier	c6a7b0d7a5	RDMA/ocrdma: Bump up ocrdma version number to 11.0.0.0 Updating the version number to 11.0.0.0 Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:28:19 -04:00
Devesh Sharma	af74d1956f	RDMA/ocrdma: Prevent CQ-Doorbell floods Changing CQ-Doorbell(DB) logic to prevent DB floods, it is supposed to be pressed only if any hw CQE is polled. If cq-arm was requested previously then don't bother about number of hw CQEs polled and arm the CQ. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:28:19 -04:00
Naga Irrinki	aeb922df2c	RDMA/ocrdma: Check resource ids received in Async CQE Some versions of the FW sends wrong QP or CQ IDs in the Async CQE. Adding a check to see whether qp or cq structures associated with the CQE is valid. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:28:19 -04:00
Selvin Xavier	fb16d8c49e	RDMA/ocrdma: Avoid a possible crash in ocrdma_rem_port_stats debugfs_remove should be called before freeing the driver stats resources to avoid any crash during ocrdma_remove. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:28:19 -04:00
Selvin Xavier	5a85f5e9d4	RDMA/ocrdma: Cleanup unused device list and rcu variables ocrdma_dev_list is not used by the driver. So removing the references of this variable. dev->rcu was introduced for the ipv6 notifier for GID management. This is no longer required as the GID management is outside the HW driver. Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:28:18 -04:00
Hariprasad S	3dd9a5dc24	iw_cxgb4: reverse the ord/ird in the ESTABLISHED upcall The ESTABLISHED event should have the peer's ord/ird so swap the values in the event before the upcall. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:16:10 -04:00
Hariprasad S	f57b780c00	iw_cxgb4: fix misuse of ep->ord for minimum ird calculation When calculating the minimum ird in c4iw_accept_cr(), we need to always have a value of at least 1 if the RTR message is a 0B read. The code was incorrectly using ep->ord for this logic which was incorrectly adjusting the ird and causing incorrect ord/ird negotiation when using MPAv2 to negotiate these values. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:16:10 -04:00
Hariprasad S	158c776dba	iw_cxgb4: pass the ord/ird in connect reply events This allows client ULPs to get the negotiated ord/ird which is useful to avoid stalling the SQ due to exceeding the ORD. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:16:10 -04:00
Hariprasad S	99718e59fa	iw_cxgb4: detect fatal errors while creating listening filters In c4iw_create_listen(), if we're using listen filters, then bail out of the busy loop if the device becomes fatally dead Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 17:16:10 -04:00
Arnd Bergmann	5d1e623591	IB/core: avoid 32-bit warning The INIT_UDATA() macro requires a pointer or unsigned long argument for both input and output buffer, and all callers had a cast from when the code was merged until a recent restructuring, so now we get core/uverbs_cmd.c: In function 'ib_uverbs_create_cq': core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] This makes the code behave as before by adding back the cast to unsigned long. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `565197dd8f` ("IB/core: Extend ib_uverbs_create_cq") Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 16:56:44 -04:00
Arnd Bergmann	b61e564af8	RDMA/cxgb4: re-fix 32-bit build warning Casting a pointer to __be64 produces a warning on 32-bit architectures: drivers/infiniband/hw/cxgb4/mem.c:147:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] req->wr.wr_lo = (__force __be64)&wr_wait; This was fixed at least twice for this driver in different places, and accidentally reverted once more. This puts the correct version back in place. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `6198dd8d7a` ("iw_cxgb4: 32b platform fixes") Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 16:56:28 -04:00
Geliang Tang	68a5e60436	IB/iser: fix a comment typo Just fix a typo in the code comment. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 16:41:54 -04:00
Insu Yun	fe274c5aed	usnic: correctly handle kzalloc return value Since kzalloc returns memory address, not error code, it should be checked whether it is null or not. Signed-off-by: Insu Yun <wuninsu@gmail.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 16:41:19 -04:00
Insu Yun	2c79dad895	usnic: correctly check failed allocation Since ib_alloc_device returns allocated memory address, not error, it should be checked as IS_NULL, not IS_ERR_OR_NULL. Signed-off-by: Insu Yun <wuninsu@gmail.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 16:41:19 -04:00
Doug Ledford	fc81a06965	Merge branch 'k.o/for-4.3-v1' into k.o/for-4.4 Pick up the late fixes from the 4.3 cycle so we have them in our next branch.	2015-10-21 16:40:21 -04:00
Doron Tsur	0ca81a2840	IB/cm: Fix rb-tree duplicate free and use-after-free ib_send_cm_sidr_rep could sometimes erase the node from the sidr (depending on errors in the process). Since ib_send_cm_sidr_rep is called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv could be either erased from the rb_tree twice or not erased at all. Fixing that by making sure it's erased only once before freeing cm_id_priv. Fixes: `a977049dac` ('[PATCH] IB: Add the kernel CM implementation') Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-21 15:43:12 -04:00
Haggai Eran	ab3964ad2a	IB/cma: Use inner P_Key to determine netdev When discussing the patches to demux ids in rdma_cm instead of ib_cm, it was decided that it is best to use the P_Key value in the packet headers. However, the mlx5 and ipath drivers are currently unable to send correct P_Key values in GMP headers. They always send using a single P_Key that is set during the GSI QP initialization. Change the rdma_cm code to look at the P_Key value that is part of the packet payload as a workaround. Once the drivers are fixed this patch can be reverted. Fixes: `4c21b5bcef` ("IB/cma: Add net_dev and private data checks to RDMA CM") Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-20 14:16:51 -04:00
Sasha Levin	0174b381ca	IB/ucma: check workqueue allocation before usage Allocating a workqueue might fail, which wasn't checked so far and would lead to NULL ptr derefs when an attempt to use it was made. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-20 13:35:51 -04:00
Haggai Eran	b3b51f9f6f	IB/cma: Potential NULL dereference in cma_id_from_event If the lookup of a listening ID failed for an AF_IB request, the code would try to call dev_put() on a NULL net_dev. Fixes: `be688195bd` ("IB/cma: Fix net_dev reference leak with failed requests") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-20 13:13:42 -04:00
Matan Barak	3909642034	IB/core: Fix use after free of ifa When using ifup/ifdown while executing enum_netdev_ipv4_ips, ifa could become invalid and cause use after free error. Fixing it by protecting with RCU lock. Fixes: `03db3a2d81` ('IB/core: Add RoCE GID table management') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-20 13:10:46 -04:00
David S. Miller	26440c835f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-20 06:08:27 -07:00
Ivan Vecera	47ea032533	drivers/net: get rid of unnecessary initializations in .get_drvinfo() Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len, eedump_len & regdump_len fields in their .get_drvinfo() ethtool op. It's not necessary as these fields is filled in ethtool_get_drvinfo(). v2: removed unused variable v3: removed another unused variable Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:24:10 -07:00
Linus Torvalds	58bd6e0602	Changes for 4.3-rc5 - Work around connection namespace lookup bug related to RoCE - Change usnic license to Dual GPL/BSD (was intended to be that way all along, but wasn't clear, permission from contributors was chased down) - Fix an issue between NFSoRDMA and mlx5 that could cause an oops - Fix leak of sendonly multicast groups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWHoT/AAoJELgmozMOVy/deK4QALETCToLcR5RRDR+QleFUvby FnP91Pu9zGOoiuP25FT5Ny0YAmTHd1KiDQBQHRe/NrYDCH2M/q8jFJSWZLwGrG6q 8GYc1ieozGQMZvId3ZJnqUJUTEyJu9QtpiFFZJYJHriP6OShP1GiHJ/XTN9dvJ/u xcmViAYYIjjScjaY1MuYpxKITFwfZE0HtdvK7zzq+F9cpfmC//Zc0Po4V4o4Y9V3 14WgbWZyhehmECKwN95hIY1pLySadgcCxoeUDHclQ3efKLar4tEC3SOM2QZsnNRc qlCHEZYeB5TRo0dF/2CYUMLfUHkMjnUpW2BiVDbQfmPio7lGUjh2SBAQjI5i1dEQ Wg69JH1TV7BYfRiwe7n49P/BJ2vIhCR2UjQrHjilZ/h6DPSfKy29hVSvOzb5xLeJ mwl/KSKxlfT2Z1SZy0yMlJfCm8tjPwf6WhOVwkFRAhYHD3Yf31EMVzD7gTtW2MXO n5S80k5ccJlXniPWjaqerhjOZHmwHViBmHNlN4zlDCRZeT9IuKDj5mi31f7HC4gx WqJtSjRxydpbGPKROHI4vrmfARPAKNrKhj8BiqxO5Cja+TiS2QeXXr+fbRwETrLS TjXWNfS3Boy564AJ8Gfug2wfBcHwY+31Uv2a6nrMmKi+wwVexF/ENOb/x9WHZrVo VqQVI2lUBH2LsmzadD9c =usb1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "We have four batched up patches for the current rc kernel. Two of them are small fixes that are obvious. One of them is larger than I would like for a late stage rc pull, but we found an issue in the namespace lookup code related to RoCE and this works around the issue for now (we allow a lookup with a namespace to succeed on RoCE since RoCE namespaces aren't implemented yet). This will go away in 4.4 when we put in support for namespaces in RoCE devices. The last one is large in terms of lines, but is all legal and no functional changes. Cisco needed to update their files to be more specific about their license. They had intended the files to be dual licensed as GPL/BSD all along, and specified that in their module license tag, but their file headers were not up to par. They contacted all of the contributors to get agreement and then submitted a patch to update the license headers in the files. Summary: - Work around connection namespace lookup bug related to RoCE - Change usnic license to Dual GPL/BSD (was intended to be that way all along, but wasn't clear, permission from contributors was chased down) - Fix an issue between NFSoRDMA and mlx5 that could cause an oops - Fix leak of sendonly multicast groups" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: For sendonly join free the multicast group on leave IB/cma: Accept connection without a valid netdev on RoCE xprtrdma: Don't require LOCAL_DMA_LKEY support for fastreg usnic: add missing clauses to BSD license	2015-10-15 13:44:35 -07:00
Doron Tsur	17b38fb890	IB/core: Fix memory corruption in ib_cache_gid_set_default_gid When ib_cache_gid_set_default_gid is called from several threads, updating the table could make find_gid fail, therefore a negative index will be retruned and an invalid table entry will be used. Locking find_gid as well fixes this problem. Fixes: `03db3a2d81` ('IB/core: Add RoCE GID table management') Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-15 12:35:54 -04:00
Christoph Hellwig	adec640e03	mlx5: stop including <asm-generic/kmap_types.h> <linux/highmem.h> is the placace the get the kmap type flags, asm-generic files are generic implementations only to be used by architecture code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2015-10-15 00:21:10 +02:00
Christoph Hellwig	2eafd72939	target: use per-attribute show and store methods This also allows to remove the target-specific old configfs macros, and gets rid of the target_core_fabric_configfs.h header which only had one function declaration left that could be moved to a better place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-10-13 22:17:49 -07:00
Christoph Lameter	0b5c9279e5	IB/ipoib: For sendonly join free the multicast group on leave When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: `bd99b2e05c` (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-13 16:43:59 -04:00
Christoph Hellwig	25556ae6b9	IB: remove xrc_remote_srq_num from struct ib_send_wr The field is only initialized in mlx, but never used. If we want to add proper XRC support it should be done with a new struct ib_xrc_wr. This shrinks the various WR structures by another 4 bytes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Haggai Eran <haggaie@mellanox.com>	2015-10-08 11:09:11 +01:00
Christoph Hellwig	e622f2f4ad	IB: split struct ib_send_wr This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>	2015-10-08 11:09:10 +01:00
Haggai Eran	b8cab5dab1	IB/cma: Accept connection without a valid netdev on RoCE The netdev checks recently added to RDMA CM expect a valid netdev to be found for both InfiniBand and RoCE, but the code that find a netdev is only implemented for InfiniBand. Currently RoCE doesn't provide an API to find the netdev matching a given set of parameters, so this patch just disables the netdev enforcement for each incoming connections when the link layer is RoCE. Fixes: `4c21b5bcef` ("IB/cma: Add net_dev and private data checks to RDMA CM") Reported-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-06 14:25:16 -04:00
Jeff Squyres	3805eade3b	usnic: add missing clauses to BSD license The usnic_verbs kernel module was clearly marked with the following in its code: MODULE_LICENSE("Dual BSD/GPL"); However, we accidentally left a few clauses of the BSD text out of the license header in all the source files. This commit fixes that: all the files are properly dual BSD/GPL-licensed. Contributors that might have been confused by this have been contacted to get their permission and are Cc:ed here. Cc: Benoit Taine <benoit.taine@lip6.fr> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Masanari Iida <standby24x7@gmail.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Roland Dreier <roland@purestorage.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-06 13:43:25 -04:00
Linus Torvalds	46c8217c4a	Changes for 4.3-rc4 - Fixes for mlx5 related issues - Fixes for ipoib multicast handling -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWCfALAAoJELgmozMOVy/dc+MQAKoD6echYpTkWE0otMuHQcYf zMaVVots+JdRKpA6OqHYQHgKGA80z21BpnjGYwcwB5zB1zPrJwz4vxwGlOBHt01T xLBReFgSKyJlgOWLXKfPx4bXUdivOBKm203wY0dh+/dC/VROGYoiXYTmSDsfsuKa 8OXT1kWgzRVLtqwqj5GSkgWvtFZ28CjKh6d9egjqcj9tpbh2UupQDZzMyOtZ52X6 Nz/Vo3u4T7qjzlhHOlCwHCDw+97x0yvmvLY1mWweGPfKOnxtXjkzQmTQEpyzU5Mo EwcqJucrBnmjbLAIBMrbR1mzTUQeD4dHz1jx+EzWE0lVnRL3twe1UaY40176sNlm aCBA4bIOQ242r3IJ++ss15ol1k5hu7PYKRn9Q8d2sSbQGcSnCHe/YOutQQ+FTEFG yE9xiLL+pgT8koauROnxg66E3HDM78NGTpjP3EuG4r2Qwa1iFANPfDB6kikuv8bO rG3qUJcloEPvfatZY+h5QC4UCoB0/W1DAhlfzE3tPBYPmhSEgQDfEOzXTKDakeF0 VB903bYrOL3CVOun4I7fLrDc1leVeiAUKqO2orZs3qIpRWvAKyV/VjolAusMv2+F /4xPyh95AEMTFfmZogOCofQFk3eOnkWpLdrVTYCKy3i6NVBoy2wHldrl+LuCAN/m r/DNRBmazShashbeU6wg =8+cX -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: - Fixes for mlx5 related issues - Fixes for ipoib multicast handling * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: increase the max mcast backlog queue IB/ipoib: Make sendonly multicast joins create the mcast group IB/ipoib: Expire sendonly multicast joins IB/mlx5: Remove pa_lkey usages IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY IB/iser: Add module parameter for always register memory xprtrdma: Replace global lkey with lkey local to PD	2015-10-01 16:38:52 -04:00
Bodong Wang	070b399723	IB/mlx4: Report checksum offload cap for RAW QP when query device Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-28 22:10:15 -04:00
Linus Torvalds	c91d707295	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "This includes a iser-target series from Jenny + Sagi @ Mellanox that addresses the few remaining active I/O shutdown bugs, along with a patch to support zero-copy for immediate data payloads that gives a nice performance improvement for small block WRITEs. Also included are some recent >= v4.2 regression bug-fixes. The most notable is a RCU conversion regression for SPC-3 PR registrations, and recent removal of obsolete RFC-3720 markers that introduced a login regression bug with MSFT iSCSI initiators. Thanks to everyone who has been testing + reporting bugs for v4.x" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iscsi-target: Avoid OFMarker + IFMarker negotiation target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit target: Fix target_sense_desc_format NULL pointer dereference target: Propigate backend read-only to core_tpg_add_lun target: Fix PR registration + APTPL RCU conversion regression iser-target: Skip data copy if all the command data comes as immediate iser-target: Change the recv buffers posting logic iser-target: Fix pending connections handling in target stack shutdown sequnce iser-target: Remove np_ prefix from isert_np members iser-target: Remove unused variables iser-target: Put the reference on commands waiting for unsol data iser-target: remove command with state ISTATE_REMOVE	2015-09-26 21:02:42 -04:00
Doug Ledford	2866196f29	IB/ipoib: increase the max mcast backlog queue When performing sendonly joins, we queue the packets that trigger the join until the join completes. This may take on the order of hundreds of milliseconds. It is easy to have many more than three packets come in during that time. Expand the maximum queue depth in order to try and prevent dropped packets during the time it takes to join the multicast group. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-25 22:30:24 -04:00
Doug Ledford	c3852ab0e6	IB/ipoib: Make sendonly multicast joins create the mcast group Since IPoIB should, as much as possible, emulate how multicast sends work on Ethernet for regular TCP/IP apps, there should be no requirement to subscribe to a multicast group before your sends are properly sent. However, due to the difference in how multicast is handled on InfiniBand, we must join the appropriate multicast group before we can send to it. Previously we tried not to trigger the auto-create feature of the subnet manager when doing this because we didn't have tracking of these sendonly groups and the auto-creation might never get undone. The previous patch added timing to these sendonly joins and allows us to leave them after a reasonable idle expiration time. So supply all of the information needed to auto-create group. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-25 14:46:58 -04:00
Christoph Lameter	bd99b2e05c	IB/ipoib: Expire sendonly multicast joins On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-25 14:43:19 -04:00
Sagi Grimberg	81fb5e26a9	IB/mlx5: Remove pa_lkey usages Since mlx5 driver cannot rely on registration using the reserved lkey (global_dma_lkey) it used to allocate a private physical address lkey for each allocated pd. Commit `96249d70dd` ("IB/core: Guarantee that a local_dma_lkey is available") just does it in the core layer so we can go ahead and use that. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-25 10:46:51 -04:00
Sagi Grimberg	c6790aa9f4	IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY Commit `96249d70dd` ("IB/core: Guarantee that a local_dma_lkey is available") allows ULPs that make use of the local dma key to keep working as before by allocating a DMA MR with local permissions and converted these consumers to use the MR associated with the PD rather then device->local_dma_lkey. ConnectIB has some known issues with memory registration using the local_dma_lkey (SEND, RDMA, RECV seems to work ok). Thus don't expose support for it (remove device->local_dma_lkey setting), and take advantage of the above commit such that no regression is introduced to working systems. The local_dma_lkey support will be restored in CX4 depending on FW capability query. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-25 10:46:51 -04:00
Sagi Grimberg	3cffd93017	IB/iser: Add module parameter for always register memory This module parameter forces memory registration even for a continuous memory region. It is true by default as sending an all-physical rkey with remote permissions might be insecure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-25 10:46:51 -04:00
Linus Torvalds	72714841b7	Changes for 4.3-rc1 - Move ehca driver to staging/rdma and schedule for deletion -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV9xxNAAoJELgmozMOVy/dI+QP/R0afI1zc7pJe1D2RmrZWnAH HDWSNYJ8eVk25ptnaOes9kV6yuOV1nGiyCIGqUXk8Q7rVYbIIz0nK81S+gPLMpTU /la28bmUz7szvEtY+gFHZCuQvCiwjM+N7t39aehIMCc/vYnIdrOXuYMhB8kd6flL 2mpnUFmfqMZLGh4RY1PQB9cNwVW9yYOVnG+1Z/M26wWl3NIDe4jiNoqV/jFvEf2v FyzjcYfjexVA47XOsfGy512Vlb+8DTtFeCz/TRl9e8tYCSHOmTcXv7jLywRoA1zn NNwJnXhliM9/EDqtbJMymYDAm9vzcfvXPCx1OOisVWxJxg9IF3p2/uPLeIKUSXZl 15mfOzoTuNXlunHDQYbzC4+P2zs+uU2k6ivpd7HZgyiBkUXzftYh/6gIJcJOreU6 1ndt2fK5hcG/Was+v6kkYaw0x6ZpawyLqHMs32V1xVcbmM5P1OFFOQFx1KDYhHH8 WmOlcd/XmIjNY5o3ySQHSKjO6Ar0IfNyzpU3jUzg2AFbZv8m2BgYEhQOnrggxBtM deFbDlxMnIrsIJtsHBmr5Y/nsgIYpkxLK4xUod76yZSRuyjLCH/2C0otubS1w1vH 4unU9zJXboBi7naPE5/fdIPMhwZu1X47xUyrwGksz3pHyiRpnhq55t+AWpsHCrg+ +XYexztBX/cPReacpWJR =4XzR -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma driver move from Doug Ledford: "This is a move only, no functional changes. I tried to get it in prior to the rc1 release, but we were waiting on IBM to get back to us that they were OK with the deprecation and eventual removal of this driver. That OK didn't materialize until last week, so integration and testing time pushed us beyond the rc1 release. Summary: - Move ehca driver to staging/rdma and schedule for deletion" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ehca: Deprecate driver, move to staging, schedule deletion	2015-09-16 09:16:20 -07:00
Jenny Derzhavetz	9fd60088ff	iser-target: Skip data copy if all the command data comes as immediate Given that supporting zcopy immediate data for all IOs requires iser driver to use its own buffer allocations, we settle with avoiding data copy for IOs with data length of up to 8K (which is more latency sensitive anyway). This trims IO write latency by up to 3us and increase IOPs by up to 40% by saving CPU time doing sg_copy_from_buffer (8K IO size is the obvious winner here). Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:31 -07:00
Jenny Derzhavetz	4366b19ca5	iser-target: Change the recv buffers posting logic iser target batches post recv operations to avoid the overhead of acquiring the recv queue lock and posting a HW doorbell for each command. We change it to be per command in order to support zcopy immediate data for IOs that fits in the 8K transfer boundary (in the next patch). (Fix minor patch fuzz due to ib_mr removal - nab) Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:29 -07:00
Jenny Derzhavetz	bd3792205a	iser-target: Fix pending connections handling in target stack shutdown sequnce Instead of handing a connection to the iscsi stack for processing right after accepting (rdma_accept) we only hand the connection to the iscsi core after we reached to a connected state (ESTABLISHED CM event). This will prevent two error scenrios: 1. race between rdma connection teardown and iscsi login sequence reported by Nic in: (`ce9a9fc20a` "iser-target: Fix REJECT CM event use-after-free OOPs") 2. target stack shutdown sequence race with constant login attempts by multiple initiators. We address this by maintaining two queues at the isert_np level: - accepted: connections that were accepted but have not reached connected state (might get rejected, unreachable or error). - pending: connections in connected state, but have yet to handed to the iscsi core for login processing. iser connections are promoted to the pending queue only from the accepted queue. This way the iscsi core now will only handle functional iser connections and once we shutdown the target stack, we look for any stales that got left behind so we can safely release them. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:27 -07:00
Jenny Derzhavetz	ed8cb0a437	iser-target: Remove np_ prefix from isert_np members These are always referenced from np-> so no need for the prefix. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:25 -07:00
Jenny Derzhavetz	f27dfa1f0e	iser-target: Remove unused variables Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:23 -07:00
Jenny Derzhavetz	3e03c4b01d	iser-target: Put the reference on commands waiting for unsol data The iscsi target core teardown sequence calls wait_conn for all active commands to finish gracefully by: - move the queue-pair to error state - drain all the completions - wait for the core to finish handling all session commands However, when tearing down a session while there are sequenced commands that are still waiting for unsolicited data outs, we can block forever as these are missing an extra reference put. We basically need the equivalent of iscsit_free_queue_reqs_for_conn() which is called after wait_conn has returned. Address this by an explicit walk on conn_cmd_list and put the extra reference. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:21 -07:00
Jenny Derzhavetz	a4c15cd957	iser-target: remove command with state ISTATE_REMOVE As documented in iscsit_sequence_cmd: /* * Existing callers for iscsit_sequence_cmd() will silently * ignore commands with CMDSN_LOWER_THAN_EXP, so force this * return for CMDSN_MAXCMDSN_OVERRUN as well.. */ We need to silently finish a command when it's in ISTATE_REMOVE. This fixes an teardown hang we were seeing where a mis-behaved initiator (triggered by allocation error injections) sent us a cmdsn which was lower than expected. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-09-15 15:47:19 -07:00
Linus Torvalds	05c78081d2	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Here are the outstanding target-pending updates for v4.3-rc1. Mostly bug-fixes and minor changes this round. The fallout from the big v4.2-rc1 RCU conversion have (thus far) been minimal. The highlights this round include: - Move sense handling routines into scsi_common code (Sagi) - Return ABORTED_COMMAND sense key for PI errors (Sagi) - Add tpg_enabled_sendtargets attribute for disabled iscsi-target discovery (David) - Shrink target struct se_cmd by rearranging fields (Roland) - Drop iSCSI use of mutex around max_cmd_sn increment (Roland) - Replace iSCSI __kernel_sockaddr_storage with sockaddr_storage (Andy + Chris) - Honor fabric max_data_sg_nents I/O transfer limit (Arun + Himanshu + nab) - Fix EXTENDED_COPY >= v4.1 regression OOPsen (Alex + nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (37 commits) target: use stringify.h instead of own definition target/user: Fix UFLAG_UNKNOWN_OP handling target: Remove no-op conditional target/user: Remove unused variable target: Fix max_cmd_sn increment w/o cmdsn mutex regressions target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess target/qla2xxx: Honor max_data_sg_nents I/O transfer limit target/iscsi: Replace __kernel_sockaddr_storage with sockaddr_storage target/iscsi: Replace conn->login_ip with login_sockaddr target/iscsi: Keep local_ip as the actual sockaddr target/iscsi: Fix np_ip bracket issue by removing np_ip target: Drop iSCSI use of mutex around max_cmd_sn increment qla2xxx: Update tcm_qla2xxx module description to 24xx+ iscsi-target: Add tpg_enabled_sendtargets for disabled discovery drivers: target: Drop unlikely before IS_ERR(_OR_NULL) target: check DPO/FUA usage for COMPARE AND WRITE target: Shrink struct se_cmd by rearranging fields target: Remove cmd->se_ordered_id (unused except debug log lines) target: add support for START_STOP_UNIT SCSI opcode target: improve unsupported opcode message ...	2015-09-11 19:00:42 -07:00
Doug Ledford	447e9a4d27	IB/ehca: Deprecate driver, move to staging, schedule deletion The ehca driver is only supported on IBM machines with a custom EBus. As they have opted to build their newer machines using more industry standard technology and haven't really been pushing EBus capable machines for a while, this driver can now safely be moved to the staging area and scheduled for eventual removal. This plan was brought to IBM's attention and received their sign-off. Cc: alexs@linux.vnet.ibm.com Cc: hnguyen@de.ibm.com Cc: raisch@de.ibm.com Cc: stefan.roscher@de.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-11 18:13:35 -04:00
Kirill A. Shutemov	7cbea8dc01	mm: mark most vm_operations_struct const With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Linus Torvalds	26d2177e97	Changes for 4.3 - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc. mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt 30ZYFuW8evTL+YQcaV65 =EHLg -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...	2015-09-09 08:33:31 -07:00
Linus Torvalds	4e4adb2f46	NFS client updates for Linux 4.3 Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT\|O_EXCL\|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues. - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV7chgAAoJEGcL54qWCgDyqJQP/3kto9VXnXcatC382jF9Pfj5 F55XeSnviOXH7CyiKA4nSBhnxg/sLuWOTpbkVI/4Y+VyWhLby9h+mtcKURHOlBnj d5BFoPwaBVDnUiKlHFQDkRjIyxjj2Sb6/uEb2V/u3v+3znR5AZZ4lzFx4cD85oaz mcru7yGiSxaQCIH6lHExcCEKXaDP5YdvS9YFsyQfv2976JSaQHM9ZG04E0v6MzTo E5wwC4CLMKmhuX9kmQMj85jzs1ASAKZ3N7b4cApTIo6F8DCDH0vKQphq/nEQC497 ECjEs5/fpxtNJUpSBu0gT7G4LCiW3PzE7pHa+8bhbaAn9OzxIR5+qWujKsfGYQhO Oomp3K9zO6omshAc5w4MkknPpbImjoZjGAj/q/6DbtrDpnD7DzOTirwYY2yX0CA8 qcL81uJUb8+j4jJj4RTO+lTUBItrM1XTqTSd/3eSMr5DDRVZj+ERZxh17TaxRBZL YrbrLHxCHrcbdSbPlovyvY+BwjJUUFJRcOxGQXLmNYR9u92fF59rb53bzVyzcRRO wBozzrNRCFL+fPgfNPLEapIb6VtExdM3rl2HYsJGckHj4DPQdnoB3ytIT9iEFZEN +/rE14XEZte7kuH3OP4el2UsP/hVsm7A49mtwrkdbd7rxMWD6XfQUp8DAggWUEtI 1H6T7RC1Y6wsu0X1fnVz =knJA -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT\|O_EXCL\|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data" * tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits) NFSv4: Respect the server imposed limit on how many changes we may cache NFSv4: Express delegation limit in units of pages Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" NFS: Optimise away the close-to-open getattr if there is no cached data NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() nfs: Remove unneeded checking of the return value from scnprintf nfs: Fix truncated client owner id without proto type NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid() NFSv4.1/flexfiles: Fix freeing of mirrors NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file NFSv4.1/pnfs: Handle LAYOUTGET return values correctly NFSv4.1/pnfs: Don't ask for a read layout for an empty file. NFSv4.1: Fix a protocol issue with CLOSE stateids NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors SUNRPC: Prevent SYN+SYNACK+RST storms SUNRPC: xs_reset_transport must mark the connection as disconnected NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload ...	2015-09-07 14:02:24 -07:00
Jason Gunthorpe	d1178cbcdc	IB/ipoib: Suppress warning for send only join failures We expect send only joins to fail, it just means there are no listeners for the group. The correct thing to do is silently drop the packet at source. Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet to 224.0.0.22, and then a warning level kmessage like this: ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22 If there is no IP router listening to IGMP. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 17:11:05 -04:00
Doug Ledford	c3acdc06a9	IB/ipoib: Clean up send-only multicast joins Even though we don't expect the group to be created by the SM we sill need to provide all the parameters to force the SM to validate they are correct. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 17:05:58 -04:00
Sagi Grimberg	7fbc67df2c	IB/srp: Fix possible protection fault srp_destroy_qp is designed to indicate we are safe to continue with freeing the channel resources by modifying the qp error state, posting a dummy wr on the queue-pair and waiting for it to flush. This also holds for the channel registration pool as we are unmapping the memory region when handling a scsi response. Destroying the channel registration pool before we make sure we processed all the inflight IO might introduce a use-after-free of the registration pool. This use-after-free is demonstrated in the stack trace below where srp is trying to unmap a used FMR after the fmr_pool was already destroyed. general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff8151121b>] [<ffffffff8151121b>] _raw_spin_lock_irqsave+0x1b/0x50 Call Trace: [<ffffffffa055d88a>] ib_fmr_pool_unmap+0x1a/0xb0 [ib_core] [<ffffffffa06c00ed>] srp_unmap_data.isra.28+0x17d/0x250 [ib_srp] [<ffffffffa06c01eb>] srp_free_req+0x2b/0x60 [ib_srp] [<ffffffffa06c0c94>] srp_recv_completion+0x174/0x580 [ib_srp] [<ffffffffa04580fe>] mlx4_eq_int+0x4de/0xe50 [mlx4_core] [<ffffffffa0458b00>] mlx4_msi_x_interrupt+0x10/0x20 [mlx4_core] [<ffffffff810abc45>] handle_irq_event_percpu+0x35/0x1b0 [<ffffffff810abdf2>] handle_irq_event+0x32/0x50 [<ffffffff810ae5cf>] handle_edge_irq+0x6f/0x120 [<ffffffff8100455a>] handle_irq+0x1a/0x30 [<ffffffff8151b475>] do_IRQ+0x45/0xb0 [<ffffffff8151162d>] common_interrupt+0x6d/0x6d [<ffffffff813e4d2f>] cpuidle_enter_state+0x4f/0xc0 [<ffffffff813e4e6c>] cpuidle_idle_call+0xcc/0x210 [<ffffffff8100b9ea>] arch_cpu_idle+0xa/0x30 [<ffffffff810ab1e1>] cpu_startup_entry+0xe1/0x270 [<ffffffff81030b3a>] start_secondary+0x21a/0x2c0 Reported-by: Eliott Kespi <eliottk@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 15:59:48 -04:00
Ira Weiny	0629cb06cd	IB/core: Move SM class defines from ib_mad.h to ib_smi.h When the hfi1 driver was added these definitions were moved from the qib driver to ib_mad.h to be used by both qib and hfi1. They should have been moved to ib_smi.h instead. Fixes: `d4ab347005` ("IB/core: Add core header changes needed for OPA") Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 15:50:32 -04:00
Sagi Grimberg	b636401f0e	mlx5: Fix incorrect wc pkey_index assignment for GSI messages Since patch series "Demux IB CM requests in the rdma_cm module" the P_Key index is taken from the work completion rather than the message itself. The HCA provides us with the message P_Key. In order to provide the P_Key index, we need to look it up. Given that this is relevant only for GSI messages (session establishments) which is less performance critical, micro-optimize against the GSI (is_qp1) branch. Fixes: `4c21b5bcef` ("IB/cma: Add net_dev and private data checks to RDMA CM") Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 14:50:06 -04:00
Haggai Eran	11d748045c	IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in its error flow even though there is never a case where the error flow occurs with a valid MR pointer to destroy. Remove the clean_mr() call and the incorrect comment above it. Fixes: `b4cfe447d4` ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 14:42:54 -04:00
Christoph Hellwig	b632ffa7ce	IB/uverbs: reject invalid or unknown opcodes We have many WR opcodes that are only supported in kernel space and/or require optional information to be copied into the WR structure. Reject all those not explicitly handled so that we can't pass invalid information to drivers. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 14:25:24 -04:00
Nicholas Krause	54b9a96f10	IB/cxgb4: Fix if statement in pick_local_ip6adddrs This fixes an if statement checking the return value of the function get_lladdr for success in the function pick_local_ip6addrs to instead of directly checking the return value of this call check the opposite as get_lladdr returns zero for success which would incorrectly make this if statement block not execute with the current if statement check. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-03 14:03:59 -04:00
Kaike Wan	ba13b5f8f8	IB/sa: Fix rdma netlink message flags The flags to ibnl_put_msg should be NLM_F_REQUEST instead of GFP_KERNEL. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-02 13:58:54 -04:00
Yishai Hadas	e1c30298cc	IB/ucma: HW Device hot-removal support Currently, IB/cma remove_one flow blocks until all user descriptor managed by IB/ucma are released. This prevents hot-removal of IB devices. This patch allows IB/cma to remove devices regardless of user space activity. Upon getting the RDMA_CM_EVENT_DEVICE_REMOVAL event we close all the underlying HW resources for the given ucontext. The ucontext itself is still alive till its explicit destroying by its creator. Running applications at that time will have some zombie device, further operations may fail. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:41 -04:00
Yishai Hadas	ae184ddeca	IB/mlx4_ib: Disassociate support Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnecting. This is done by managing the VMAs that were mapped to the HW bars such as door bell and blueflame. When need to detach remap them to an arbitrary kernel page returned by the zap API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:40 -04:00
Yishai Hadas	036b106357	IB/uverbs: Enable device removal when there are active user space applications Enables the uverbs_remove_one to succeed despite the fact that there are running IB applications working with the given ib device. This functionality enables a HW device to be unbind/reset despite the fact that there are running user space applications using it. It exposes a new IB kernel API named 'disassociate_ucontext' which lets a driver detaching its HW resources from a given user context without crashing/terminating the application. In case a driver implemented the above API and registered with ib_uverb there will be no dependency between its device to its uverbs_device. Upon calling remove_one of ib_uverbs the call should return after disassociating the open HW resources without waiting to clients disconnecting. In case driver didn't implement this API there will be no change to current behaviour and uverbs_remove_one will return only when last client has disconnected and reference count on uverbs device became 0. In case the lower driver device was removed any application will continue working over some zombie HCA, further calls will ended with an immediate error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:40 -04:00
Yishai Hadas	057aec0d23	IB/uverbs: Explicitly pass ib_dev to uverbs commands Done in preparation for deploying RCU for the device removal flow. Allows isolating the RCU handling to the uverb_main layer and keeping the uverbs_cmd code as is. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:40 -04:00
Yishai Hadas	35d4a0b63d	IB/uverbs: Fix race between ib_uverbs_open and remove_one Fixes: `2a72f21226` ("IB/uverbs: Remove dev_table") Before this commit there was a device look-up table that was protected by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When it was dropped and container_of was used instead, it enabled the race with remove_one as dev might be freed just after: dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but before the kref_get. In addition, this buggy patch added some dead code as container_of(x,y,z) can never be NULL and so dev can never be NULL. As a result the comment above ib_uverbs_open saying "the open method will either immediately run -ENXIO" is wrong as it can never happen. The solution follows Jason Gunthorpe suggestion from below URL: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html cdev will hold a kref on the parent (the containing structure, ib_uverbs_device) and only when that kref is released it is guaranteed that open will never be called again. In addition, fixes the active count scheme to use an atomic not a kref to prevent WARN_ON as pointed by above comment from Jason. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:40 -04:00
Yishai Hadas	03c40442a0	IB/uverbs: Fix reference counting usage of event files Fix the reference counting usage to be handled in the event file creation/destruction function, instead of being done by the caller. This is done for both async/non-async event files. Based on Jason Gunthorpe report at https://www.mail-archive.com/ linux-rdma@vger.kernel.org/msg24680.html: "The existing code for this is broken, in ib_uverbs_get_context all the error paths between ib_uverbs_alloc_event_file and the kref_get(file->ref) are wrong - this will result in fput() which will call ib_uverbs_event_close, which will try to do kref_put and ib_unregister_event_handler - which are no longer paired." Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:39 -04:00
Jason Gunthorpe	7dd78647a2	IB/core: Make ib_dealloc_pd return void The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:39 -04:00
Bart Van Assche	03f6fb93fd	IB/srp: Create an insecure all physical rkey only if needed The SRP initiator only needs this if the insecure register_always=N performance optimization is enabled, or if FRWR/FMR is not supported in the driver. Do not create an all physical MR unless it is needed to support either of those modes. Default register_always to true so the out of the box configuration does not create an insecure all physical MR. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [bvanassche: reworked and rebased this patch] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:39 -04:00
Bart Van Assche	330179f2fa	IB/srp: Register the indirect data buffer descriptor Instead of always using the global rkey for the indirect data buffer descriptor, register that descriptor with the HCA if the kernel module parameter register_always has been set to Y. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:38 -04:00
Bart Van Assche	002f15674c	IB/srp: Introduce srp_device.use_fmr Introduce the variable srp_device.use_fmr. Leave out the dev->has_fr / dev->has_fmr and ch->fr_pool / ch->fmr_pool checks since these are redundant. This patch does not change any functionality but makes the source code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:38 -04:00
Bart Van Assche	3ae95da883	IB/srp: Remove use_mr argument from srp_map_sg_entry() Move the srp_map_desc() call from inside srp_map_sg_entry() to srp_map_sg() such that the use_mr argument can be removed from srp_map_sg_entry(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:38 -04:00
Bart Van Assche	0e0d3a4800	IB/srp: Remove the memory registration backtracking code Mapping a discontiguous sg-list requires multiple memory regions and hence can exhaust the memory region pool. The SRP initiator already handles this by temporarily reducing the queue depth. This means that it is safe to remove the memory registration backtracking code. This patch has been tested with direct I/O sizes up to 256 MB. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:37 -04:00
Bart Van Assche	f731ed6293	IB/srp: Add memory descriptor array pointer range checking Although most paths through which a request is submitted check block layer parameters like the max_segments limit, these are not checked when an SG_IO or direct I/O request is submitted. Hence add a range check for the memory descriptor array pointer. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:37 -04:00
Bart Van Assche	7e85c91970	IB/srp: Use multiple registrations for large memory regions Instead of using the global rkey for large memory regions, use multiple registrations. See also the while (dma_len) loop further down in srp_map_sg_entry(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:37 -04:00
Bart Van Assche	186fbc6689	IB/srp: Re-enable FMR for non-page aligned buffers During a discussion in 2011 nobody recalled why FMR was not used for non-page aligned buffers (see also http://thread.gmane.org/gmane.linux.drivers.rdma/7149). Re-enable FMR for such buffers. For the reason why the srp_map_fmr() function needs to be modified, see also patch "IB/srp: rework mapping engine to use multiple FMR entries" (commit ID 8f26c9ff9cd0; January 2011). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:36 -04:00
Jason Gunthorpe	5a783956c2	ib_srpt: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:35 -04:00
Jason Gunthorpe	e6bf5f48d2	IB/srp: Use pd->local_dma_lkey Replace all leys with pd->local_dma_lkey. This driver does not support iWarp, so this is safe. The insecure use of ib_get_dma_mr is thus isolated to an rkey, and will have to be fixed separately. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:35 -04:00
Jason Gunthorpe	34efc7dfbd	iser-target: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:35 -04:00
Jason Gunthorpe	256b7ad273	IB/iser: Use pd->local_dma_lkey Replace all leys with pd->local_dma_lkey. This driver does not support iWarp, so this is safe. The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this looks trivially fixed by forcing the use of registration in a future patch. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:34 -04:00
Jason Gunthorpe	b37c788f59	IB/mlx5: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:34 -04:00
Jason Gunthorpe	7dd9757628	IB/mlx4: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:34 -04:00
Jason Gunthorpe	77b1f99660	IB/ipoib: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:34 -04:00
Jason Gunthorpe	4be90bc60d	IB/mad: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:33 -04:00
Jason Gunthorpe	96249d70dd	IB/core: Guarantee that a local_dma_lkey is available Every single ULP requires a local_dma_lkey to do anything with a QP, so let us ensure one exists for every PD created. If the driver can supply a global local_dma_lkey then use that, otherwise ask the driver to create a local use all physical memory MR associated with the new PD. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Acked-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:33 -04:00
Sagi Grimberg	7332bed085	IB/iser: Chain all iser transaction send work requests Chaning of send work requests benefits performance by reducing the send queue lock contention (acquired in ib_post_send) and saves us HW doorbells which is posted only once. Currently, in normal IO flows iser does not chain the CDB send work request with the registration work request. Also in PI flows, signature work requests are not chained as well. Lets chain those and post only once. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:33 -04:00
Sagi Grimberg	1b16c9894b	IB/iser: Add debug prints to the various memory registration methods Easier to debug when we have the registration details. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:32 -04:00
Sagi Grimberg	df749cdc45	IB/iser: Support up to 8MB data transfer in a single command iser support up to 512KB data transfer in a single scsi command. This means that larger IOs will split to different request. While iser can easily saturate FDR/EDR wires, some arrays are fine tuned for 1MB (or larger) IO sizes, hence add an option to support larger transfers (up to 8MB) if the device allows it. Given that a few target implementations don't support data transfers of more than 512KB by default and the fact that larger IO sizes require more resources, we introduce a module parameter to determine the maximum number of 512B sectors in a single scsi command. Users that are interested in larger transfers can change this value given that the target supports larger transfers. At the moment, iser works in 4K pages granularity, In a later stage we will get it to work with system page size instead. IO operations that consists of N pages will need a page vector of size N+1 in case the first SG element contains an offset. Given that some devices allocates memory regions in powers of 2, this means that allocating a region with N+1 pages, will result in region resources allocation of the next power of 2. Since we don't want that to happen, in case we are in the limit of IO size supported and the first SG element has an offset, we align the SG list using a bounce buffer (which is OK given that this is not likely to happen a lot). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:32 -04:00
Sagi Grimberg	f8db651da2	IB/iser: Pass registration pool a size parameter Hard coded for now. This will allow to allocate different sized MRs depending on the IO size needed (and device capabilities). This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:32 -04:00
Sagi Grimberg	32467c420b	IB/iser: Unify fast memory registration flows iser_reg_rdma_mem_[fastreg\|fmr] share a lot of code, and logically do the same thing other than the buffer registration method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr). The DIF logic is not implemented in the FMR flow as there is no existing device that supports FMRs and Signature feature. This patch unifies the flow in a single routine iser_reg_rdma_mem and just split to fmr/frwr for the buffer registration itself. Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will call the relevant device specific unreg routine). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:31 -04:00
Sagi Grimberg	81722909c8	IB/iser: Make reg_desc_get a per device routine As for fmrs we will hold a single registration descriptor as no need for multiple like in the frwr mode (descriptor for each task). This change helps unifying the duplicate registration code paths. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:31 -04:00
Sagi Grimberg	7d0483c927	IB/iser: Rename iser_reg_page_vec to iser_fast_reg_fmr Also, change a name of a local variable. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:31 -04:00
Adir Lev	2b3bf95810	IB/iser: Maintain connection fmr_pool under a single registration descriptor This will allow us to unify the memory registration code path between the various methods which vary by the device capabilities. This change will make it easier and less intrusive to remove fmr_pools from the code when we'd want to. The reason we use a single descriptor is to avoid taking a redundant spinlock when working with FMRs. We also change the signature of iser_reg_page_vec to make it match iser_fast_reg_mr (and the future indirect registration method). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:30 -04:00
Sagi Grimberg	385ad87d4b	IB/iser: Introduce iser registration pool struct Instead of having it a part of the connection structure, have it be under a dedicated (embedded) structure in the connection. A logical separation of the registration pool and the connection structure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:30 -04:00
Sagi Grimberg	eb6ea8c36c	IB/iser: Move fastreg descriptor allocation to iser_create_fastreg_desc Don't have the caller allocate the structure and worry about freeing it in case the routine failed. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:30 -04:00
Sagi Grimberg	48afbff673	IB/iser: Introduce iser_reg_ops Move all the per-device function pointers to an easy extensible iser_reg_ops structure that contains all the iser registration operations. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:29 -04:00
Sagi Grimberg	8c18ed03a9	IB/iser: Remove dead code in fmr_pool alloc/free In the past the we always tried to allocate an fmr_pool and if it failed on ENOSYS (not supported) then we continued with dma mr. This is not the case anymore and if we tried to allocate an fmr_pool then it is supported and we expect to succeed. Also, the check if fmr_pool is allocated when free is called is redundant as well as we are guaranteed it exists. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:29 -04:00
Sagi Grimberg	5190cc2664	IB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc Avoid struct names without iser_ prefix. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:29 -04:00
Sagi Grimberg	d711d81d64	IB/iser: Introduce struct iser_reg_resources Have fast_reg_descriptor hold struct iser_reg_resources (mr, frpl, valid flag). This will be useful when the actual buffer registration routines will be passed with the needed registration resources (i.e. iser_reg_resources) without being aware of their nature (i.e. data or protection). In order to achieve this, we remove reg_indicators flags container and place specific flags (mr_valid) within iser_reg_resources struct. We also place the sig_mr_valid and sig_protcted flags in iser_pi_context. This patch also modifies iser_fast_reg_mr to receive the reg_resources instead of the fast_reg_descriptor and a data/protection indicator. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:28 -04:00
Sagi Grimberg	ea18f5d777	IB/iser: Remove an unneeded print for unaligned memory We can do it in iser_aligned_data_len instead and it will save us an argument that is passed to fall_to_counce_buf just for the print. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:28 -04:00
Sagi Grimberg	b9abd8d21d	IB/iser: Remove a redundant always-false condition We always call iser_initialize_task_headers() and set the header tx_sg.lkey to the device mr lkey, so no point in checking it in iser_create_send_desc(). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:28 -04:00
Sagi Grimberg	8d5944d803	IB/iser: Fix possible bogus DMA unmapping If iser_initialize_task_headers() routine failed before dma mapping, we should not attempt to unmap in cleanup_task(). Fixes: `7414dde0a6` (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:28 -04:00
Sagi Grimberg	02816a8b88	IB/iser: Get rid of un-maintained counters We don't update those anywhere in the code and they seem pretty useless (no one seem to care about those). qp_tx_queue_full: We never should get this fmr_map_not_avail: We can never get to this eh_abort_cnt: We don't monitor aborts Go ahead and remove them. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:27 -04:00
Sagi Grimberg	d16739055b	IB/iser: Fix missing return status check in iser_send_data_out Since commit "IB/iser: Fix race between iser connection teardown..." iser_initialize_task_headers() might fail, so we need to check that. Fixes: `7414dde0a6` (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:27 -04:00
Sagi Grimberg	1156cc80f8	IB/iser: Remove '.' from log message Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:27 -04:00
Sagi Grimberg	74ce897b7c	IB/iser: Change minor assignments and logging prints Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:26 -04:00
Jenny Falkovich	db0a6cbd21	IB/iser: Change some module parameters to be RO While we're at it, use permission defines instead of octal values and rearrange a little bit. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:26 -04:00
Kaike Wan	2ca546b92a	IB/sa: Route SA pathrecord query through netlink This patch routes a SA pathrecord query to netlink first and processes the response appropriately. If a failure is returned, the request will be sent through IB. The decision whether to route the request to netlink first is determined by the presence of a listener for the local service netlink multicast group. If the user-space local service netlink multicast group listener is not present, the request will be sent through IB, just like what is currently being done. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:26 -04:00
Kaike Wan	5d2657708e	IB/sa: Allocate SA query with kzalloc Replace kmalloc with kzalloc so that all uninitialized fields in SA query will be zero-ed out to avoid unintentional consequence. This prepares the SA query structure to accept new fields in the future. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:25 -04:00
Kaike Wan	bc10ed7d3d	IB/core: Add rdma netlink helper functions This patch adds a function to check if listeners for a netlink multicast group are present. It also adds a function to receive netlink response messages. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:25 -04:00
Bart Van Assche	bc44bd1d86	IB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails scsi_host_alloc() not only allocates memory for a SCSI host but also creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue. Stop these threads if login fails by calling scsi_host_put(). Reported-by: Konstantin Krotov <kkv@clodo.ru> Fixes: `fb49c8bbaa` ("Remove an extraneous scsi_host_put() from an error path") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: <stable@vger.kernel.org> #v3.19 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:24 -04:00
Bart Van Assche	713ef24e41	IB/srp: Bump driver version and release date Since version 1.0 e.g. scsi-mq has been added. Since this is a significant change, bump the driver version and release date. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:24 -04:00
Bart Van Assche	c257ea6f9f	IB/srp: Handle partial connection success correctly Avoid that the following kernel warning is reported if the SRP target system accepts fewer channels per connection than what was requested by the initiator system: WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]() Call Trace: [<ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20 [<ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp] [<ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp] [<ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp] [<ffffffff8138dac0>] dev_attr_store+0x20/0x30 [<ffffffff812079ef>] sysfs_write_file+0xef/0x170 [<ffffffff81191fc4>] vfs_write+0xb4/0x130 [<ffffffff8119276f>] sys_write+0x5f/0xa0 [<ffffffff815a0a59>] system_call_fastpath+0x16/0x1b Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:24 -04:00
Bart Van Assche	e6300cbd9b	IB/srp: Constify a function argument This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:23 -04:00
Ariel Nahum	799cdaf8a9	IB/mlx4: Fix incorrect cq flushing in error state When handling a device internal error, the driver is responsible to drain the completion queue with flush errors. In case a completion queue was assigned to multiple send queues, the driver iterates over the send queues and generates flush errors of inflight wqes. The driver must correctly pass the wc array with an offset as a result of the previous send queue iteration. Not doing so will overwrite previously set completions and return a wrong number of polled completions which includes ones which were not correctly set. Fixes: `35f05dabf9` (IB/mlx4: Reset flow support for IB kernel ULPs) Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:23 -04:00
Noa Osherovich	5e99b139f1	IB/mlx4: Use correct SL on AH query under RoCE The mlx4 IB driver implementation for ib_query_ah used a wrong offset (28 instead of 29) when link type is Ethernet. Fixed to use the correct one. Fixes: `fa417f7b52` ('IB/mlx4: Add support for IBoE') Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:23 -04:00
Jack Morgenstein	2b135db3e8	IB/mlx4: Forbid using sysfs to change RoCE pkeys The pkey mapping for RoCE must remain the default mapping: VFs: virtual index 0 = mapped to real index 0 (0xFFFF) All others indices: mapped to a real pkey index containing an invalid pkey. PF: virtual index i = real index i. Don't allow users to change these mappings using files found in sysfs. Fixes: `c1e7e46612` ('IB/mlx4: Add iov directory in sysfs under the ib device') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:22 -04:00
Jack Morgenstein	2cb8e7f86e	IB/mlx4: Demote mcg message from warning to debug The mcg "too many pending requests" warning message fills the log when OpenSM is downed. Demote the message from warning level to debug level. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:22 -04:00
Jack Morgenstein	90c1d8b635	IB/mlx4: Fix potential deadlock when sending mad to wire send_mad_to_wire takes the same spinlock that is taken in the interrupt context. Therefore, it needs irqsave/restore. Fixes: `b9c5d6a643` ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:22 -04:00
Doug Ledford	b8071ad893	IB/core: Remove needless bracketization Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:21 -04:00
Somnath Kotur	cc36929e73	RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core 1.Change query_gid hook to return value from IB/Core GID management APIs. 2.Get rid of all the netdev notifier chain subscription code as well as maintenance of SGID Table in memory. 3.Implement get_netdev hook in driver. Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:21 -04:00
Moni Shoua	5070cd2239	IB/mlx4: Replace mechanism for RoCE GID management Manage RoCE gid table with logic in IB/core, which is common to all vendors, and remove the mechanism from the mlx4 IB driver. Since management of the GID cache may lead to index mismatch with the hardware GID table, a translation between indexes is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:21 -04:00
Moni Shoua	e26be1bfef	IB/mlx4: Implement ib_device callbacks get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:20 -04:00
Matan Barak	238fdf48f2	IB/core: Add RoCE table bonding support Handling bonding and other devices require us to all all GIDs of the net-devices which are upper-devices of the RoCE port related net-device. Active-backup configurations imposes even more challenges as the default GID should only be set on the active devices (this is necessary as otherwise the same MAC could be used for several slaves and thus several slaves will have identical GIDs). Managing these configurations are done by listening to: (a) NETDEV_CHANGEUPPER event (1) if a related net-device is linked, delete all inactive slaves default GIDs and add the upper device GIDs. (2) if a related net-device is unlinked, delete all upper GIDs and add the default GIDs. (b) NETDEV_BONDING_FAILOVER: (1) delete the bond GIDs from inactive slaves (2) delete the inactive slave's default GIDs (3) Add the bond GIDs to the active slave. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:20 -04:00
Dan Carpenter	98d25afa97	IB/core: missing curly braces in ib_find_gid() Smatch says that, based on the indenting, we should probably add curly braces here. Fixes: `03db3a2d81` ('IB/core: Add RoCE GID table management') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:09 -04:00
Matan Barak	03db3a2d81	IB/core: Add RoCE GID table management RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:50 -04:00
Jason Gunthorpe	55aeed0654	IB/core: Make ib_alloc_device init the kobject This gets rid of the weird in-between state where struct ib_device was allocated but the kobject didn't work. Consequently ib_device_release is now guaranteed to be called in all situations and we needn't duplicate its kfrees on error paths. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:50 -04:00
Sagi Grimberg	d9f272c523	IB/core: Drop ib_alloc_fast_reg_mr Fully replaced by a more generic and suitable ib_alloc_mr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:49 -04:00
Sagi Grimberg	1302f8452b	qib: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:48 -04:00
Sagi Grimberg	e02e4d554d	nes: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:48 -04:00
Sagi Grimberg	f683d3bdbb	cxgb3: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:47 -04:00
Sagi Grimberg	a21640347a	iw_cxgb4: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:47 -04:00
Sagi Grimberg	cacb7d59be	ocrdma: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:47 -04:00
Sagi Grimberg	679e34d1d0	mlx4: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:46 -04:00
Sagi Grimberg	b3778ba8de	mlx5: Drop mlx5_ib_alloc_fast_reg_mr Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:46 -04:00
Sagi Grimberg	563b67c5f9	IB/srp: Convert to ib_alloc_mr Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:45 -04:00
Sagi Grimberg	a89be2cc51	iser-target: Convert to ib_alloc_mr Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:45 -04:00
Sagi Grimberg	34780f012c	IB/iser: Convert to ib_alloc_mr Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:44 -04:00
Sagi Grimberg	9bee178b4f	IB: Modify ib_create_mr API Use ib_alloc_mr with specific parameters. Change the existing callers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:44 -04:00

... 5 6 7 8 9 ...

5313 Commits