No ULP uses it anymore, go ahead and remove it.
Keep only the local invalidate part of the handlers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Support the new memory registration API by allocating a
private page list array in mlx5_ib_mr and populate it when
mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx5_ib_mr)
- key (ib_reg_wr)
The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Just function declarations - no need for those
laying arround. If for some reason someone will want
FMR support in mlx5, it should be easy enough to restore
a few structs.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr. This dramaticly
shrinks the size of a WR for most common operations:
sizeof(struct ib_send_wr) (old): 96
sizeof(struct ib_send_wr): 48
sizeof(struct ib_rdma_wr): 64
sizeof(struct ib_atomic_wr): 96
sizeof(struct ib_ud_wr): 88
sizeof(struct ib_fast_reg_wr): 88
sizeof(struct ib_bind_mw_wr): 96
sizeof(struct ib_sig_handover_wr): 80
And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:
sizeof(struct ib_fastreg_wr): 64
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Since mlx5 driver cannot rely on registration using the
reserved lkey (global_dma_lkey) it used to allocate a private
physical address lkey for each allocated pd.
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") just does it in the core layer so we can go ahead
and use that.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since patch series "Demux IB CM requests in the rdma_cm module" the
P_Key index is taken from the work completion rather than the message
itself.
The HCA provides us with the message P_Key. In order to provide the
P_Key index, we need to look it up. Given that this is relevant only
for GSI messages (session establishments) which is less performance critical,
micro-optimize against the GSI (is_qp1) branch.
Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use ib_alloc_mr with specific parameters.
Change the existing callers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.
And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).
And, fixup the only callers (iser/isert) accordingly.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
In order to support alternate sized MADs (and variable sized MADs on OPA
devices) add in/out MAD size parameters to the process_mad core call.
In addition, add an out_mad_pkey_index to communicate the pkey index the driver
wishes the MAD stack to use when sending OPA MAD responses.
The out MAD size and the out MAD PKey index are required by the MAD
stack to generate responses on OPA devices.
Furthermore, the in and out MAD parameters are made generic by specifying them
as ib_mad_hdr rather than ib_mad.
Drivers are modified as needed and are protected by BUG_ON flags if the MAD
sizes passed to them is incorrect.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.
This commit does not change any functionality.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ethernet functionality is only available when working in ISSI > 0 mode.
Previously, the IB driver wasn't ready to work on that mode, and hence
building both the IB driver and the Ethernet functionality in the core
driver were disallowed by Kconfigs.
Now, once we have all the pre-steps in place, we can remove this limitation.
The last steps in the IB driver for getting that setup to work are:
create dummy SRQ for the driver's use (until now we could use XRC_SRQ
as SRQ and XRC_SRQ, after moving to ISSI > 0, we separate XRC SRQs from
basic SRQs) and adapt the create QP function to be compatible with ISSI > 0.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ISSI > 0 mode, most of the MAD_IFC command features are deprecated, and can't
be used. Therefore, when in that mode, we replace all of them with other commands
that provide the required functionality.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The process_mad device function declares some parameters as "in". Make those
parameters const and adjust the call tree under process_mad in the various
drivers accordingly.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
cap).
- Modify IB driver to use new macros for checking caps.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preparation for ethernet driver.
These functions will be used in drivers other than mlx5_ib.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Implement the relevant invalidation functions (zap MTTs as needed)
* Implement interlocking (and rollback in the page fault handlers) for
cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC
send/receive/RDMA read/RDMA write, and UD send.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.
The registered fault handler is empty in this patch, and only a later
patch implements it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.
Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.
In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.
* Instead of accepting a UMR bit telling the function to enable all
access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
correct list, and enable/disable the access flags per-page according
to whether the page is present.
* A new bit is set to enable writing of access flags when using the
firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.
In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The patch adds infrastructure to query ODP capabilities in the mlx5
driver. The code will read the capabilities from the device, and
enable only those capabilities that both the driver and the device
supports. At this point ODP is not supported, so no capability is
copied from the device, but the patch exposes the global ODP device
capability bit.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add a helper function mlx5_ib_read_user_wqe to read information from
user-space owned work queues. The function will be used in a later
patch by the page-fault handling code in mlx5_ib.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
- Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
The current UMR interface doesn't allow partial updates to a memory
region's page tables. This patch changes the interface to allow that.
It also changes the way the UMR operation validates the memory
region's state. When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR
operation to fail if the MKEY is in the free state. When it is
unchecked the operation will check that it isn't in the free state.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Since UMR code now uses its own context struct on the stack, the pas
and dma pointers for the UMR operation that remained in the mlx5_ib_mr
struct are not necessary. This patch removes them.
Fixes: a74d24168d ("IB/mlx5: Refactor UMR to have its own context struct")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
There were many places where parameters which should be u8/u16 were
integer type.
Additionally, in 2 places, a check for a non-null pointer was added
before dereferencing the pointer (this is actually a bug fix).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for a new mlx5 device which is VPI (i.e., ports can be
either IB or ETH), move the pci device functionality from mlx5_ib
to mlx5_core.
This involves the following changes:
1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev
is now an independent structure maintained by mlx5_core.
mlx5_ib_dev now has a pointer to that struct.
This requires changing a lot of places where the core_dev
struct was accessed via mlx5_ib_dev (now, this needs to
be a pointer dereference).
2. All PCI initializations are now done in mlx5_core. Thus,
it is now mlx5_core which does pci_register_device (and not
mlx5_ib, as was previously).
3. mlx5_ib now registers itself with mlx5_core as an "interface"
driver. This is very similar to the mechanism employed for
the mlx4 (ConnectX) driver. Once the HCA is initialized
(by mlx5_core), it invokes the interface drivers to do
their initializations.
4. There is a new event handler which the core registers:
mlx5_core_event(). This event handler invokes the
event handlers registered by the interfaces.
Based on a patch by Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having the UMR context part of each memory region, allocate
a struct on the stack. This allows queuing multiple UMRs that access
the same memory region.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This commit takes care of the generated signature error CQE generated
by the HW (if happened). The underlying mlx5 driver will handle
signature error completions and will mark the relevant memory region
as dirty.
Once the consumer gets the completion for the transaction, it must
check for signature errors on signature memory region using a new
lightweight verb ib_check_mr_status().
In case the user doesn't check for signature error (i.e. doesn't call
ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the
memory region cannot be used for another signature operation
(REG_SIG_MR work request will fail).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If user requested signature enable we initialize relevant mlx5_ib_qp
members. We mark the qp as sig_enable and we increase the effective
SQ size, but still limit the user max_send_wr to original size
computed. We also allow the create_qp routine to accept sig_enable
create flag.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Support create_mr and destroy_mr verbs. Creating ib_mr may be done
for either ib_mr that will register regular page lists like
alloc_fast_reg_mr routine, or indirect ib_mrs that can register other
(pre-registered) ib_mrs in an indirect manner.
In addition user may request signature enable, that will mean that the
created ib_mr may be attached with signature attributes (BSF, PSVs).
Currently we only allow direct/indirect registration modes.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Implement resize CQ which is a mandatory verb in mlx5.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The variable start in struct mlx5_ib_mr is never used. Remove it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use asynchronous commands to execute up to eight concurrent create MR
commands. This is to fill memory caches faster so we keep consuming
from there. Also, increase timeout for shrinking caches to five
minutes.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver is comprised of two kernel modules: mlx5_ib and mlx5_core.
This partitioning resembles what we have for mlx4, except that mlx5_ib
is the pci device driver and not mlx5_core.
mlx5_core is essentially a library that provides general functionality
that is intended to be used by other Mellanox devices that will be
introduced in the future. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
[ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>.
- Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>