mlx5-XDP-100Mpps

This series from Tariq, mainly adds the support of mlx5 Multi Packet WQE
 (TX descriptor) - ConnectX-5 and above - for XDP TX, which allows us to
 overcome the 70Mpps PCIe bottleneck of conventional TX queues (single TX
 descriptor per packet), and achieve the 100Mpps milestone with the MPWQE
 approach.
 
 In the first five patches, Tariq did minor improvements to mlx5 tx path,
 for better debug-ability and code structuring.
 
 Next two patches lay down the foundation for MPWQE implementation to store
 the in-flight XDP TX information for multiple packets of one descriptor
 (WQE).
 
 Next: Support Enhanced Multi-Packet TX WQE for XDP
 
 In this patch we add support for the HW feature, which is supported
 starting from ConnectX-5.
 
 Performance:
 Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
 CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
 
 XDP_TX:
 We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
 milestone.
 * Single-port HCA:
 	Before:   70 Mpps
 	After:   100 Mpps (+42.8%)
 
 * Dual-port HCA:
 	Before: 51.7 Mpps
 	After:  57.3 Mpps (+10.8%)
 
 * In both cases we tested traffic on one port and for now On Dual-port
   HCAs we see only a small gain, we are working to overcome this
   bottleneck, but for the moment only with experimental firmware on dual
   port HCAs we can reach the wanted numbers as seen on Single-port HCAs.
 
 XDP_REDIRECT:
 Redirect from (A) ConnectX-5 to (B) ConnectX-5.
 Due to a setup limitation, (A) and (B) are on different NUMA nodes,
 so absolute performance numbers are not optimal.
 - Note:
   Below is the transmit rate of (B), not the redirect rate of (A)
   which is in some cases higher.
 
 * (B) is single-port:
 	Before:   77 Mpps
 	After:    90 Mpps (+16.8%)
 
 * (B) is dual-port:
 	Before:  61 Mpps
 	After:   72 Mpps (+18%)
 
 Last patch adds a knob in mlx5 ethtool private flag to turn on/off
 XDP TX MPWQE.
 
 -Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcHI4cAAoJEEg/ir3gV/o+ZeAIAKiWUDN2RhG3GRo68kaMt/jm
 PCNWTcz/ufJQoRPhDjTFfTRoeO3cV/I09g0KMNqBRCNYzqx+AYbUBP7QFxJirO10
 PJbJKN30S1tGLtuXDEhHSo70OzE4Ycuk24D+OI574jsDuRT7WTYHJrf8J//bIPDZ
 FrSoX8cjPIQFyZEsyqAoOzKvsi8OnpWFbOQIdRj6cQKFyuvq5iCtn+jnXSxcYEwF
 v9T4Owp1VszZCapYg/s2wBcdFYiPoN8Ief8LljMpRWF+4umjMpKzJ77OK/aCBZxq
 P1Ru/O5hLXZN5Q9cDf8wCV66qIfVqvGGePqKM/glk07EfUGlcHSW3ukpRveLLuI=
 =lMSl
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-XDP-100Mpps' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-XDP-100Mpps

This series from Tariq, mainly adds the support of mlx5 Multi Packet WQE
(TX descriptor) - ConnectX-5 and above - for XDP TX, which allows us to
overcome the 70Mpps PCIe bottleneck of conventional TX queues (single TX
descriptor per packet), and achieve the 100Mpps milestone with the MPWQE
approach.

In the first five patches, Tariq did minor improvements to mlx5 tx path,
for better debug-ability and code structuring.

Next two patches lay down the foundation for MPWQE implementation to store
the in-flight XDP TX information for multiple packets of one descriptor
(WQE).

Next: Support Enhanced Multi-Packet TX WQE for XDP

In this patch we add support for the HW feature, which is supported
starting from ConnectX-5.

Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_TX:
We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
milestone.
* Single-port HCA:
	Before:   70 Mpps
	After:   100 Mpps (+42.8%)

* Dual-port HCA:
	Before: 51.7 Mpps
	After:  57.3 Mpps (+10.8%)

* In both cases we tested traffic on one port and for now On Dual-port
  HCAs we see only a small gain, we are working to overcome this
  bottleneck, but for the moment only with experimental firmware on dual
  port HCAs we can reach the wanted numbers as seen on Single-port HCAs.

XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5.
Due to a setup limitation, (A) and (B) are on different NUMA nodes,
so absolute performance numbers are not optimal.
- Note:
  Below is the transmit rate of (B), not the redirect rate of (A)
  which is in some cases higher.

* (B) is single-port:
	Before:   77 Mpps
	After:    90 Mpps (+16.8%)

* (B) is dual-port:
	Before:  61 Mpps
	After:   72 Mpps (+18%)

Last patch adds a knob in mlx5 ethtool private flag to turn on/off
XDP TX MPWQE.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:
David S. Miller 2018-12-21 08:00:27 -08:00
commit 3715917408
9 changed files with 356 additions and 106 deletions

View File

@ -216,6 +216,7 @@ enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_COMPRESS,
MLX5E_PFLAG_RX_STRIDING_RQ,
MLX5E_PFLAG_RX_NO_CSUM_COMPLETE,
MLX5E_PFLAG_XDP_TX_MPWQE,
MLX5E_NUM_PFLAGS, /* Keep last */
};
@ -344,7 +345,6 @@ enum {
MLX5E_SQ_STATE_IPSEC,
MLX5E_SQ_STATE_AM,
MLX5E_SQ_STATE_TLS,
MLX5E_SQ_STATE_REDIRECT,
};
struct mlx5e_sq_wqe_info {
@ -405,24 +405,51 @@ struct mlx5e_xdp_info {
struct mlx5e_dma_info di;
};
struct mlx5e_xdp_info_fifo {
struct mlx5e_xdp_info *xi;
u32 *cc;
u32 *pc;
u32 mask;
};
struct mlx5e_xdp_wqe_info {
u8 num_wqebbs;
u8 num_ds;
};
struct mlx5e_xdp_mpwqe {
/* Current MPWQE session */
struct mlx5e_tx_wqe *wqe;
u8 ds_count;
u8 max_ds_count;
};
struct mlx5e_xdpsq;
typedef bool (*mlx5e_fp_xmit_xdp_frame)(struct mlx5e_xdpsq*,
struct mlx5e_xdp_info*);
struct mlx5e_xdpsq {
/* data path */
/* dirtied @completion */
u32 xdpi_fifo_cc;
u16 cc;
bool redirect_flush;
/* dirtied @xmit */
u16 pc ____cacheline_aligned_in_smp;
bool doorbell;
u32 xdpi_fifo_pc ____cacheline_aligned_in_smp;
u16 pc;
struct mlx5_wqe_ctrl_seg *doorbell_cseg;
struct mlx5e_xdp_mpwqe mpwqe;
struct mlx5e_cq cq;
/* read only */
struct mlx5_wq_cyc wq;
struct mlx5e_xdpsq_stats *stats;
mlx5e_fp_xmit_xdp_frame xmit_xdp_frame;
struct {
struct mlx5e_xdp_info *xdpi;
struct mlx5e_xdp_wqe_info *wqe_info;
struct mlx5e_xdp_info_fifo xdpi_fifo;
} db;
void __iomem *uar_map;
u32 sqn;

View File

@ -47,7 +47,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_dma_info *di,
xdpi.xdpf->len, PCI_DMA_TODEVICE);
xdpi.di = *di;
return mlx5e_xmit_xdp_frame(sq, &xdpi);
return sq->xmit_xdp_frame(sq, &xdpi);
}
/* returns true if packet was consumed by xdp */
@ -102,7 +102,98 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
}
}
bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *xdpi)
static void mlx5e_xdp_mpwqe_session_start(struct mlx5e_xdpsq *sq)
{
struct mlx5e_xdp_mpwqe *session = &sq->mpwqe;
struct mlx5_wq_cyc *wq = &sq->wq;
u8 wqebbs;
u16 pi;
mlx5e_xdpsq_fetch_wqe(sq, &session->wqe);
prefetchw(session->wqe->data);
session->ds_count = MLX5E_XDP_TX_EMPTY_DS_COUNT;
pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
/* The mult of MLX5_SEND_WQE_MAX_WQEBBS * MLX5_SEND_WQEBB_NUM_DS
* (16 * 4 == 64) does not fit in the 6-bit DS field of Ctrl Segment.
* We use a bound lower that MLX5_SEND_WQE_MAX_WQEBBS to let a
* full-session WQE be cache-aligned.
*/
#if L1_CACHE_BYTES < 128
#define MLX5E_XDP_MPW_MAX_WQEBBS (MLX5_SEND_WQE_MAX_WQEBBS - 1)
#else
#define MLX5E_XDP_MPW_MAX_WQEBBS (MLX5_SEND_WQE_MAX_WQEBBS - 2)
#endif
wqebbs = min_t(u16, mlx5_wq_cyc_get_contig_wqebbs(wq, pi),
MLX5E_XDP_MPW_MAX_WQEBBS);
session->max_ds_count = MLX5_SEND_WQEBB_NUM_DS * wqebbs;
}
static void mlx5e_xdp_mpwqe_complete(struct mlx5e_xdpsq *sq)
{
struct mlx5_wq_cyc *wq = &sq->wq;
struct mlx5e_xdp_mpwqe *session = &sq->mpwqe;
struct mlx5_wqe_ctrl_seg *cseg = &session->wqe->ctrl;
u16 ds_count = session->ds_count;
u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
struct mlx5e_xdp_wqe_info *wi = &sq->db.wqe_info[pi];
cseg->opmod_idx_opcode =
cpu_to_be32((sq->pc << 8) | MLX5_OPCODE_ENHANCED_MPSW);
cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | ds_count);
wi->num_wqebbs = DIV_ROUND_UP(ds_count, MLX5_SEND_WQEBB_NUM_DS);
wi->num_ds = ds_count - MLX5E_XDP_TX_EMPTY_DS_COUNT;
sq->pc += wi->num_wqebbs;
sq->doorbell_cseg = cseg;
session->wqe = NULL; /* Close session */
}
static bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq,
struct mlx5e_xdp_info *xdpi)
{
struct mlx5e_xdp_mpwqe *session = &sq->mpwqe;
struct mlx5e_xdpsq_stats *stats = sq->stats;
dma_addr_t dma_addr = xdpi->dma_addr;
struct xdp_frame *xdpf = xdpi->xdpf;
unsigned int dma_len = xdpf->len;
if (unlikely(sq->hw_mtu < dma_len)) {
stats->err++;
return false;
}
if (unlikely(!session->wqe)) {
if (unlikely(!mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc,
MLX5_SEND_WQE_MAX_WQEBBS))) {
/* SQ is full, ring doorbell */
mlx5e_xmit_xdp_doorbell(sq);
stats->full++;
return false;
}
mlx5e_xdp_mpwqe_session_start(sq);
}
mlx5e_xdp_mpwqe_add_dseg(sq, dma_addr, dma_len);
if (unlikely(session->ds_count == session->max_ds_count))
mlx5e_xdp_mpwqe_complete(sq);
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, xdpi);
stats->xmit++;
return true;
}
static bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *xdpi)
{
struct mlx5_wq_cyc *wq = &sq->wq;
u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
@ -126,11 +217,8 @@ bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *xdpi)
}
if (unlikely(!mlx5e_wqc_has_room_for(wq, sq->cc, sq->pc, 1))) {
if (sq->doorbell) {
/* SQ is full, ring doorbell */
mlx5e_xmit_xdp_doorbell(sq);
sq->doorbell = false;
}
stats->full++;
return false;
}
@ -152,23 +240,20 @@ bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *xdpi)
cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | MLX5_OPCODE_SEND);
/* move page to reference to sq responsibility,
* and mark so it's not put back in page-cache.
*/
sq->db.xdpi[pi] = *xdpi;
sq->pc++;
sq->doorbell = true;
sq->doorbell_cseg = cseg;
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, xdpi);
stats->xmit++;
return true;
}
bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq, struct mlx5e_rq *rq)
{
struct mlx5e_xdp_info_fifo *xdpi_fifo;
struct mlx5e_xdpsq *sq;
struct mlx5_cqe64 *cqe;
struct mlx5e_rq *rq;
bool is_redirect;
u16 sqcc;
int i;
@ -182,8 +267,8 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
if (!cqe)
return false;
is_redirect = test_bit(MLX5E_SQ_STATE_REDIRECT, &sq->state);
rq = container_of(sq, struct mlx5e_rq, xdpsq);
is_redirect = !rq;
xdpi_fifo = &sq->db.xdpi_fifo;
/* sq->cc must be updated only after mlx5_cqwq_update_db_record(),
* otherwise a cq overrun may occur
@ -199,20 +284,33 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
wqe_counter = be16_to_cpu(cqe->wqe_counter);
if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ))
netdev_WARN_ONCE(sq->channel->netdev,
"Bad OP in XDPSQ CQE: 0x%x\n",
get_cqe_opcode(cqe));
do {
u16 ci = mlx5_wq_cyc_ctr2ix(&sq->wq, sqcc);
struct mlx5e_xdp_info *xdpi = &sq->db.xdpi[ci];
struct mlx5e_xdp_wqe_info *wi;
u16 ci, j;
last_wqe = (sqcc == wqe_counter);
sqcc++;
ci = mlx5_wq_cyc_ctr2ix(&sq->wq, sqcc);
wi = &sq->db.wqe_info[ci];
sqcc += wi->num_wqebbs;
for (j = 0; j < wi->num_ds; j++) {
struct mlx5e_xdp_info xdpi =
mlx5e_xdpi_fifo_pop(xdpi_fifo);
if (is_redirect) {
xdp_return_frame(xdpi->xdpf);
dma_unmap_single(sq->pdev, xdpi->dma_addr,
xdpi->xdpf->len, DMA_TO_DEVICE);
xdp_return_frame(xdpi.xdpf);
dma_unmap_single(sq->pdev, xdpi.dma_addr,
xdpi.xdpf->len, DMA_TO_DEVICE);
} else {
/* Recycle RX page */
mlx5e_page_release(rq, &xdpi->di, true);
mlx5e_page_release(rq, &xdpi.di, true);
}
}
} while (!last_wqe);
} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
@ -228,27 +326,32 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
return (i == MLX5E_TX_CQ_POLL_BUDGET);
}
void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq)
{
struct mlx5e_rq *rq;
bool is_redirect;
is_redirect = test_bit(MLX5E_SQ_STATE_REDIRECT, &sq->state);
rq = is_redirect ? NULL : container_of(sq, struct mlx5e_rq, xdpsq);
struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
bool is_redirect = !rq;
while (sq->cc != sq->pc) {
u16 ci = mlx5_wq_cyc_ctr2ix(&sq->wq, sq->cc);
struct mlx5e_xdp_info *xdpi = &sq->db.xdpi[ci];
struct mlx5e_xdp_wqe_info *wi;
u16 ci, i;
sq->cc++;
ci = mlx5_wq_cyc_ctr2ix(&sq->wq, sq->cc);
wi = &sq->db.wqe_info[ci];
sq->cc += wi->num_wqebbs;
for (i = 0; i < wi->num_ds; i++) {
struct mlx5e_xdp_info xdpi =
mlx5e_xdpi_fifo_pop(xdpi_fifo);
if (is_redirect) {
xdp_return_frame(xdpi->xdpf);
dma_unmap_single(sq->pdev, xdpi->dma_addr,
xdpi->xdpf->len, DMA_TO_DEVICE);
xdp_return_frame(xdpi.xdpf);
dma_unmap_single(sq->pdev, xdpi.dma_addr,
xdpi.xdpf->len, DMA_TO_DEVICE);
} else {
/* Recycle RX page */
mlx5e_page_release(rq, &xdpi->di, false);
mlx5e_page_release(rq, &xdpi.di, false);
}
}
}
}
@ -292,7 +395,7 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
xdpi.xdpf = xdpf;
if (unlikely(!mlx5e_xmit_xdp_frame(sq, &xdpi))) {
if (unlikely(!sq->xmit_xdp_frame(sq, &xdpi))) {
dma_unmap_single(sq->pdev, xdpi.dma_addr,
xdpf->len, DMA_TO_DEVICE);
xdp_return_frame_rx_napi(xdpf);
@ -300,8 +403,33 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
}
}
if (flags & XDP_XMIT_FLUSH)
if (flags & XDP_XMIT_FLUSH) {
if (sq->mpwqe.wqe)
mlx5e_xdp_mpwqe_complete(sq);
mlx5e_xmit_xdp_doorbell(sq);
}
return n - drops;
}
void mlx5e_xdp_rx_poll_complete(struct mlx5e_rq *rq)
{
struct mlx5e_xdpsq *xdpsq = &rq->xdpsq;
if (xdpsq->mpwqe.wqe)
mlx5e_xdp_mpwqe_complete(xdpsq);
mlx5e_xmit_xdp_doorbell(xdpsq);
if (xdpsq->redirect_flush) {
xdp_do_flush_map();
xdpsq->redirect_flush = false;
}
}
void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw)
{
sq->xmit_xdp_frame = is_mpw ?
mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame;
}

View File

@ -37,27 +37,62 @@
#define MLX5E_XDP_MAX_MTU ((int)(PAGE_SIZE - \
MLX5_SKB_FRAG_SZ(XDP_PACKET_HEADROOM)))
#define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
#define MLX5E_XDP_TX_DS_COUNT \
((sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS) + 1 /* SG DS */)
#define MLX5E_XDP_TX_EMPTY_DS_COUNT \
(sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS)
#define MLX5E_XDP_TX_DS_COUNT (MLX5E_XDP_TX_EMPTY_DS_COUNT + 1 /* SG DS */)
bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
void *va, u16 *rx_headroom, u32 *len);
bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq);
void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq);
bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *xdpi);
bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq, struct mlx5e_rq *rq);
void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq);
void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw);
void mlx5e_xdp_rx_poll_complete(struct mlx5e_rq *rq);
int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
u32 flags);
static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_xdpsq *sq)
{
if (sq->doorbell_cseg) {
mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, sq->doorbell_cseg);
sq->doorbell_cseg = NULL;
}
}
static inline void
mlx5e_xdp_mpwqe_add_dseg(struct mlx5e_xdpsq *sq, dma_addr_t dma_addr, u16 dma_len)
{
struct mlx5e_xdp_mpwqe *session = &sq->mpwqe;
struct mlx5_wqe_data_seg *dseg =
(struct mlx5_wqe_data_seg *)session->wqe + session->ds_count++;
dseg->addr = cpu_to_be64(dma_addr);
dseg->byte_count = cpu_to_be32(dma_len);
dseg->lkey = sq->mkey_be;
}
static inline void mlx5e_xdpsq_fetch_wqe(struct mlx5e_xdpsq *sq,
struct mlx5e_tx_wqe **wqe)
{
struct mlx5_wq_cyc *wq = &sq->wq;
struct mlx5e_tx_wqe *wqe;
u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc - 1); /* last pi */
u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
wqe = mlx5_wq_cyc_get_wqe(wq, pi);
*wqe = mlx5_wq_cyc_get_wqe(wq, pi);
memset(*wqe, 0, sizeof(**wqe));
}
mlx5e_notify_hw(wq, sq->pc, sq->uar_map, &wqe->ctrl);
static inline void
mlx5e_xdpi_fifo_push(struct mlx5e_xdp_info_fifo *fifo,
struct mlx5e_xdp_info *xi)
{
u32 i = (*fifo->pc)++ & fifo->mask;
fifo->xi[i] = *xi;
}
static inline struct mlx5e_xdp_info
mlx5e_xdpi_fifo_pop(struct mlx5e_xdp_info_fifo *fifo)
{
return fifo->xi[(*fifo->cc)++ & fifo->mask];
}
#endif

View File

@ -1672,12 +1672,40 @@ static int set_pflag_rx_no_csum_complete(struct net_device *netdev, bool enable)
return 0;
}
static int set_pflag_xdp_tx_mpwqe(struct net_device *netdev, bool enable)
{
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_channels new_channels = {};
int err;
if (enable && !MLX5_CAP_ETH(mdev, enhanced_multi_pkt_send_wqe))
return -EOPNOTSUPP;
new_channels.params = priv->channels.params;
MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_XDP_TX_MPWQE, enable);
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
priv->channels.params = new_channels.params;
return 0;
}
err = mlx5e_open_channels(priv, &new_channels);
if (err)
return err;
mlx5e_switch_priv_channels(priv, &new_channels, NULL);
return 0;
}
static const struct pflag_desc mlx5e_priv_flags[MLX5E_NUM_PFLAGS] = {
{ "rx_cqe_moder", set_pflag_rx_cqe_based_moder },
{ "tx_cqe_moder", set_pflag_tx_cqe_based_moder },
{ "rx_cqe_compress", set_pflag_rx_cqe_compress },
{ "rx_striding_rq", set_pflag_rx_striding_rq },
{ "rx_no_csum_complete", set_pflag_rx_no_csum_complete },
{ "xdp_tx_mpwqe", set_pflag_xdp_tx_mpwqe },
};
static int mlx5e_handle_pflag(struct net_device *netdev,

View File

@ -61,6 +61,7 @@ struct mlx5e_rq_param {
struct mlx5e_sq_param {
u32 sqc[MLX5_ST_SZ_DW(sqc)];
struct mlx5_wq_param wq;
bool is_mpw;
};
struct mlx5e_cq_param {
@ -992,18 +993,42 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq)
static void mlx5e_free_xdpsq_db(struct mlx5e_xdpsq *sq)
{
kvfree(sq->db.xdpi);
kvfree(sq->db.xdpi_fifo.xi);
kvfree(sq->db.wqe_info);
}
static int mlx5e_alloc_xdpsq_fifo(struct mlx5e_xdpsq *sq, int numa)
{
struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
int dsegs_per_wq = wq_sz * MLX5_SEND_WQEBB_NUM_DS;
xdpi_fifo->xi = kvzalloc_node(sizeof(*xdpi_fifo->xi) * dsegs_per_wq,
GFP_KERNEL, numa);
if (!xdpi_fifo->xi)
return -ENOMEM;
xdpi_fifo->pc = &sq->xdpi_fifo_pc;
xdpi_fifo->cc = &sq->xdpi_fifo_cc;
xdpi_fifo->mask = dsegs_per_wq - 1;
return 0;
}
static int mlx5e_alloc_xdpsq_db(struct mlx5e_xdpsq *sq, int numa)
{
int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
int err;
sq->db.xdpi = kvzalloc_node(array_size(wq_sz, sizeof(*sq->db.xdpi)),
sq->db.wqe_info = kvzalloc_node(sizeof(*sq->db.wqe_info) * wq_sz,
GFP_KERNEL, numa);
if (!sq->db.xdpi) {
mlx5e_free_xdpsq_db(sq);
if (!sq->db.wqe_info)
return -ENOMEM;
err = mlx5e_alloc_xdpsq_fifo(sq, numa);
if (err) {
mlx5e_free_xdpsq_db(sq);
return err;
}
return 0;
@ -1562,11 +1587,8 @@ static int mlx5e_open_xdpsq(struct mlx5e_channel *c,
struct mlx5e_xdpsq *sq,
bool is_redirect)
{
unsigned int ds_cnt = MLX5E_XDP_TX_DS_COUNT;
struct mlx5e_create_sq_param csp = {};
unsigned int inline_hdr_sz = 0;
int err;
int i;
err = mlx5e_alloc_xdpsq(c, params, param, sq, is_redirect);
if (err)
@ -1577,13 +1599,18 @@ static int mlx5e_open_xdpsq(struct mlx5e_channel *c,
csp.cqn = sq->cq.mcq.cqn;
csp.wq_ctrl = &sq->wq_ctrl;
csp.min_inline_mode = sq->min_inline_mode;
if (is_redirect)
set_bit(MLX5E_SQ_STATE_REDIRECT, &sq->state);
set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
err = mlx5e_create_sq_rdy(c->mdev, param, &csp, &sq->sqn);
if (err)
goto err_free_xdpsq;
mlx5e_set_xmit_fp(sq, param->is_mpw);
if (!param->is_mpw) {
unsigned int ds_cnt = MLX5E_XDP_TX_DS_COUNT;
unsigned int inline_hdr_sz = 0;
int i;
if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
inline_hdr_sz = MLX5E_XDP_MIN_INLINE;
ds_cnt++;
@ -1591,6 +1618,7 @@ static int mlx5e_open_xdpsq(struct mlx5e_channel *c,
/* Pre initialize fixed WQE fields */
for (i = 0; i < mlx5_wq_cyc_get_size(&sq->wq); i++) {
struct mlx5e_xdp_wqe_info *wi = &sq->db.wqe_info[i];
struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(&sq->wq, i);
struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
struct mlx5_wqe_eth_seg *eseg = &wqe->eth;
@ -1601,6 +1629,10 @@ static int mlx5e_open_xdpsq(struct mlx5e_channel *c,
dseg = (struct mlx5_wqe_data_seg *)cseg + (ds_cnt - 1);
dseg->lkey = sq->mkey_be;
wi->num_wqebbs = 1;
wi->num_ds = 1;
}
}
return 0;
@ -1612,7 +1644,7 @@ static int mlx5e_open_xdpsq(struct mlx5e_channel *c,
return err;
}
static void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq)
static void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq)
{
struct mlx5e_channel *c = sq->channel;
@ -1620,7 +1652,7 @@ static void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq)
napi_synchronize(&c->napi);
mlx5e_destroy_sq(c->mdev, sq->sqn);
mlx5e_free_xdpsq_descs(sq);
mlx5e_free_xdpsq_descs(sq, rq);
mlx5e_free_xdpsq(sq);
}
@ -2008,7 +2040,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
err_close_xdp_sq:
if (c->xdp)
mlx5e_close_xdpsq(&c->rq.xdpsq);
mlx5e_close_xdpsq(&c->rq.xdpsq, &c->rq);
err_close_sqs:
mlx5e_close_sqs(c);
@ -2061,10 +2093,10 @@ static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
static void mlx5e_close_channel(struct mlx5e_channel *c)
{
mlx5e_close_xdpsq(&c->xdpsq);
mlx5e_close_xdpsq(&c->xdpsq, NULL);
mlx5e_close_rq(&c->rq);
if (c->xdp)
mlx5e_close_xdpsq(&c->rq.xdpsq);
mlx5e_close_xdpsq(&c->rq.xdpsq, &c->rq);
mlx5e_close_sqs(c);
mlx5e_close_icosq(&c->icosq);
napi_disable(&c->napi);
@ -2309,6 +2341,7 @@ static void mlx5e_build_xdpsq_param(struct mlx5e_priv *priv,
mlx5e_build_sq_param_common(priv, param);
MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
param->is_mpw = MLX5E_GET_PFLAG(params, MLX5E_PFLAG_XDP_TX_MPWQE);
}
static void mlx5e_build_channel_param(struct mlx5e_priv *priv,
@ -4562,6 +4595,10 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE :
MLX5E_PARAMS_DEFAULT_LOG_SQ_SIZE;
/* XDP SQ */
MLX5E_SET_PFLAG(params, MLX5E_PFLAG_XDP_TX_MPWQE,
MLX5_CAP_ETH(mdev, enhanced_multi_pkt_send_wqe));
/* set CQE compression */
params->rx_cqe_compress_def = false;
if (MLX5_CAP_GEN(mdev, cqe_compression) &&

View File

@ -1190,7 +1190,6 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
{
struct mlx5e_rq *rq = container_of(cq, struct mlx5e_rq, cq);
struct mlx5e_xdpsq *xdpsq = &rq->xdpsq;
struct mlx5_cqe64 *cqe;
int work_done = 0;
@ -1221,15 +1220,8 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
} while ((++work_done < budget) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
out:
if (xdpsq->doorbell) {
mlx5e_xmit_xdp_doorbell(xdpsq);
xdpsq->doorbell = false;
}
if (xdpsq->redirect_flush) {
xdp_do_flush_map();
xdpsq->redirect_flush = false;
}
if (rq->xdp_prog)
mlx5e_xdp_rx_poll_complete(rq);
mlx5_cqwq_update_db_record(&cq->wq);

View File

@ -459,9 +459,10 @@ static void mlx5e_dump_error_cqe(struct mlx5e_txqsq *sq,
u32 ci = mlx5_cqwq_get_ci(&sq->cq.wq);
netdev_err(sq->channel->netdev,
"Error cqe on cqn 0x%x, ci 0x%x, sqn 0x%x, syndrome 0x%x, vendor syndrome 0x%x\n",
sq->cq.mcq.cqn, ci, sq->sqn, err_cqe->syndrome,
err_cqe->vendor_err_synd);
"Error cqe on cqn 0x%x, ci 0x%x, sqn 0x%x, opcode 0x%x, syndrome 0x%x, vendor syndrome 0x%x\n",
sq->cq.mcq.cqn, ci, sq->sqn,
get_cqe_opcode((struct mlx5_cqe64 *)err_cqe),
err_cqe->syndrome, err_cqe->vendor_err_synd);
mlx5_dump_err_cqe(sq->cq.mdev, err_cqe);
}

View File

@ -76,6 +76,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
struct mlx5e_channel *c = container_of(napi, struct mlx5e_channel,
napi);
struct mlx5e_ch_stats *ch_stats = c->stats;
struct mlx5e_rq *rq = &c->rq;
bool busy = false;
int work_done = 0;
int i;
@ -85,17 +86,17 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
for (i = 0; i < c->num_tc; i++)
busy |= mlx5e_poll_tx_cq(&c->sq[i].cq, budget);
busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq, NULL);
if (c->xdp)
busy |= mlx5e_poll_xdpsq_cq(&c->rq.xdpsq.cq);
busy |= mlx5e_poll_xdpsq_cq(&rq->xdpsq.cq, rq);
if (likely(budget)) { /* budget=0 means: don't poll rx rings */
work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
work_done = mlx5e_poll_rx_cq(&rq->cq, budget);
busy |= work_done == budget;
}
busy |= c->rq.post_wqes(&c->rq);
busy |= c->rq.post_wqes(rq);
if (busy) {
if (likely(mlx5e_channel_no_affinity_change(c)))
@ -115,9 +116,9 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
}
mlx5e_handle_rx_dim(&c->rq);
mlx5e_handle_rx_dim(rq);
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&rq->cq);
mlx5e_cq_arm(&c->icosq.cq);
mlx5e_cq_arm(&c->xdpsq.cq);

View File

@ -421,6 +421,7 @@ enum {
MLX5_OPCODE_ATOMIC_MASKED_FA = 0x15,
MLX5_OPCODE_BIND_MW = 0x18,
MLX5_OPCODE_CONFIG_CMD = 0x1f,
MLX5_OPCODE_ENHANCED_MPSW = 0x29,
MLX5_RECV_OPCODE_RDMA_WRITE_IMM = 0x00,
MLX5_RECV_OPCODE_SEND = 0x01,