This is one step in allowing phylib to make use of link_mode bitmaps,
instead of u32 for supported and advertised features. Convert the phy
drivers to use bitmaps to indicates the features they support.
Build bitmap equivalents of the u32 values at runtime, and have the
drivers point to the appropriate bitmap. These bitmaps are shared, and
we don't want a driver to modify them. So mark them __ro_after_init.
Within phylib, the features bitmap is currently turned back into a
u32. This will be removed once the whole of phylib, and the drivers
are converted to use bitmaps.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvneta controller can handle speeds up to 2500Mbps on the SGMII
interface. This relies on serdes configuration, the lane must be
configured at 3.125Gbps and we can't use in-band autoneg at that speed.
The main issue when supporting that speed on this particular controller
is that the link partner can send ethernet frames with a shortened
preamble, which if not explicitly enabled in the controller will cause
unexpected behaviours.
This was tested on Armada 385, with the comphy configuration done in
bootloader.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Version bump conflict in batman-adv, take what's in net-next.
iavf conflict, adjustment of netdev_ops in net-next conflicting
with poll controller method removal in net.
Signed-off-by: David S. Miller <davem@davemloft.net>
With CONFIG_DMA_API_DEBUG enabled we get DMA unmapping warning in
various places of the mvneta driver, for example when putting down an
interface while traffic is passing through.
The issue is when using s/w buffer management, the Rx buffers are mapped
using dma_map_page but unmapped with dma_unmap_single. This patch fixes
this by using the right unmapping function.
Fixes: 562e2f467e ("net: mvneta: Improve the buffer allocation method for SWBM")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit allows each TXQ to be picked in a round-robin fashion by
the PPv2 transmit scheduling mechanism. This is opposed to the default
behaviour that prioritizes the highest numbered queues.
Suggested-by: Yan Markman <ymarkman@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the PPv2 controller has multiple TX queues, we can spread traffic
by assining TX queues to CPUs, allowing to use XPS to balance egress
traffic between CPUs.
Suggested-by : Yan Markman <ymarkman@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With CONFIG_DMA_API_DEBUG enabled we now get a warning when using the
mvneta driver:
mvneta d0030000.ethernet: DMA-API: device driver frees DMA memory with
wrong function [device address=0x000000001165b000] [size=4096 bytes]
[mapped as page] [unmapped as single]
This is because when using the s/w buffer management, the Rx descriptor
buffer is mapped with dma_map_page but unmapped with dma_unmap_single.
This patch fixes this by using the right unmapping function.
Fixes: 562e2f467e ("net: mvneta: Improve the buffer allocation method for SWBM")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the explicit call to netif_carrier_off() in
mvneta_open() as this is already handled in phylink_start().
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the explicit call to netif_carrier_off() in PPv2's
open() path, as this is now handled in phylink_start().
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the mvpp2_percpu_read/write/... functions aren't really per-cpu but
per s/w thread, rename them to include 'thread' instead of 'percpu'.
This is a cosmetic patch.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Marvell PPv2 network controller has 9 internal threads. The driver
works fine when there are less CPUs available than threads. This isn't
true if more CPUs are available. As this is a valid use case, handle
this particular case.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch maps all uses of the CPU to threads. All this_cpu calls are
replaced, and all smp_processor_id() calls are wrapped into the
indirection.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reworks the Marvell PPv2 driver to stop using directly the
CPU number to access per-thread registers.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the Marvell PPv2 driver the mvpp2_read_relaxed function is only used
in a single file. Make it static and remove its prototype from the
header.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Marvell PPv2 driver has per-cpu functions. As they only are used in
the main file, make them static and remove their prototype from the
header.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updates the PPv2 driver so that all CPU variables are unsigned, as it
makes no sense to have a negative CPU number. This patch is cosmetic.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Marvell PPv2.2 engine only has 8 Rx queues per CPU, while PPv2.1 has
16 of them. This patch updates the code so that the Rx queues mask width
is selected given the version of the network controller used.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the probing function so that the queue mode isn't
updated while probing, as the driver would silently end up using a
configuration not wanted by the user. The patch adds an extra check to
validate the chosen queue mode instead, and the driver will fail to
probe if the configuration is invalid.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch renames the IRQs in the Marvell PPv2 driver as their current
names match the way they are used in software. But this will change in
the future, and those IRQs have nothing to do with Rx/Tx interrupts
(this can be configured). The new binding also describe more interrupts
as some where left out.
The old binding support is kept for backward compatibility.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch sets the number of s/w threads to 9, its maximum value,
instead of 8. This is not a fix as only 4 of the s/w threads were used
so far, but more could be used in the future.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When no Tx IRQ is available, the txq_done() routine (called from
tx_done()) shouldn't be called from the polling function, as in such
case it is already called in the Tx path thanks to an hrtimer. This
mostly occurred when using PPv2.1, as the engine then do not have Tx
IRQs.
Fixes: edc660fa09 ("net: mvpp2: replace TX coalescing interrupts with hrtimer")
Reported-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two new tls tests added in parallel in both net and net-next.
Used Stephen Rothwell's linux-next resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
Net drivers using phylink shouldn't mess with the link carrier
themselves and should let phylink manage it. The mvpp2 driver wasn't
following this best practice as the mac_config() function made calls to
change the link carrier state. This led to wrongly reported carrier link
state which then triggered other issues. This patch fixes this
behaviour.
But the PPv2 driver relied on this misbehaviour in two cases: for fixed
links and when not using phylink (ACPI mode). The later was fixed by
adding an explicit call to link_up(), which when the ACPI mode will use
phylink should be removed.
The fixed link case was relying on the mac_config() function to set the
link up, as we found an issue in phylink_start() which assumes the
carrier is off. If not, the link_up() function is never called. To fix
this, a call to netif_carrier_off() is added just before phylink_start()
so that we do not introduce a regression in the driver.
Fixes: 4bb0432628 ("net: mvpp2: phylink support")
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the loop of mvneta_tx_done_gbe(), we call the smp_processor_id()
each time, move the call out of the loop to optimize the code a bit.
Before the patch, the loop looks like(under arm64):
ldr x1, [x29,#120]
...
ldr w24, [x1,#36]
...
bl 0 <_raw_spin_lock>
str w24, [x27,#132]
...
After the patch, the loop looks like(under arm64):
...
bl 0 <_raw_spin_lock>
str w23, [x28,#132]
...
where w23 is loaded so be ready before the loop.
>From another side, mvneta_tx_done_gbe() is called from mvneta_poll()
which is in non-preemptible context, so it's safe to call the
smp_processor_id() function once.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code and HW supports NETIF_F_RXCSUM, so let's enable it by default.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_gro_receive() checks NETIF_F_GRO bit as well, if the bit is not
set, we will go through GRO_NORMAL in napi_skb_finish(), so fall back
to netif_receive_skb_internal(), so we don't need to check NETIF_F_GRO
ourself.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without a valid of_node in struct device we can't find the mvpp2 port
device by its DT node. Specifically, this breaks
of_find_net_device_by_node().
For example, the Armada 8040 based Clearfog GT-8K uses Marvell 88E6141
switch connected to the &cp1_eth2 port:
&cp1_mdio {
...
switch0: switch0@4 {
compatible = "marvell,mv88e6085";
...
ports {
...
port@5 {
reg = <5>;
label = "cpu";
ethernet = <&cp1_eth2>;
};
};
};
};
Without this patch, dsa_register_switch() returns -EPROBE_DEFER because
of_find_net_device_by_node() can't find the device_node of the &cp1_eth2
device.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Cc: John Crispin <john@phrozen.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Nelson Chang <nelson.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Acked-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvneta Ethernet driver is used on a few different Marvell SoCs.
Some SoCs have per cpu interrupts for Ethernet events, the driver uses
a per CPU napi structure for this case. Some SoCs such as armada 3700
have a single interrupt for Ethernet events, the driver uses a global
napi structure for this case.
Current mvneta_config_rss() always operates the per cpu napi structure.
Fix it by operating a global napi for "single interrupt" case, and per
cpu napi structure for remaining cases.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Fixes: 2636ac3cc2 ("net: mvneta: Add network support for Armada 3700 SoC")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
With system having a small memory (around 256MB), the state "cannot
allocate memory to refill with new buffer" is reach pretty quickly.
By this patch we changed buffer allocation method to a better handling of
this use case by avoiding memory allocation issues.
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory: extract from a larger patch]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the checksum offload feature is not set, then there is no point to
check the status of the hardware.
[gregory: extract from a larger patch]
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of trying to allocate the exact amount of memory for each
descriptor use a page for each of them, it allows to simplify the
allocation management and increase the performance of the driver.
Based on the work of Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to improve the diagnostic in case of error, make the distinction
between refill error and skb allocation error. Also make the information
available through the ethtool state.
Based on the work of Yelena Krivosheev <yelena@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial values were too small leading to poor performance when using
the software buffer management.
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory: extract from a larger patch]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On year ago Rob Herring wanted to remove the data pointer from the
device_node structure[1]. The mvneta driver seemed to be the only one
which used (abused ?) it. However, the proposal of Rob to remove this
pointer from the driver introduced a regression, and I tested and fixed an
alternative way, but it was never submitted as a proper patch.
Now here it is: Instead of using the device_node structure ->data
pointer, we store the BM private data as the driver data of the BM
platform_device. The core mvneta code can retrieve it by doing a lookup
on which platform_device corresponds to the BM device tree node using
of_find_device_by_node(), and get its driver data
[1]https://www.spinics.net/lists/netdev/msg445197.html
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is incorrect to enable TX/RX queues (call by mvneta_port_up()) for
port without link. Indeed MTU change for interface without link causes TX
queues to stuck.
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP
network unit")
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory.clement: adding Fixes tags and rewording commit log]
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvneta Ethernet driver is used on a few different Marvell SoCs.
Some SoCs have per cpu interrupts for Ethernet events. Some SoCs have
a single interrupt, independent of the CPU. The driver handles this by
having a per CPU napi structure when there are per CPU interrupts, and
a global napi structure when there is a single interrupt.
When the napi core calls mvneta_poll(), it passes the napi
instance. This was not being propagated through the call chain, and
instead the per-cpu napi instance was passed to napi_gro_receive()
call. This breaks when there is a single global napi instance.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 2636ac3cc2 ("net: mvneta: Add network support for Armada 3700 SoC")
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of the | operator always leads to true, which looks rather
suspect in this case.
Fix this by using & instead.
Addresses-Coverity-ID: 1471903 ("Wrong operator used")
Fixes: dba1d918da ("net: mvpp2: debugfs: add entries for classifier flows")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The classification operations that are used for RSS make use of several
lookup tables. Having hit counters for these tables is really helpful
to determine what flows were matched by ingress traffic, and see the
path of packets among all the classifier tables.
This commit adds hit counters for the 3 tables used at the moment :
- The decoding table (also called lookup_id table), that links flows
identified by the Header Parser to the flow table.
There's one entry per flow, located at :
.../mvpp2/<controller>/flows/XX/dec_hits
Note that there are 21 flows in the decoding table, whereas there are
52 flows in the Header Parser. That's because there are several kind
of traffic that will match a given flow. Reading the hit counter from
one sub-flow will clear all hit counter that have the same flow_id.
This also applies to the flow_hits.
- The flow table, that contains all the different lookups to be
performed by the classifier for each packet of a given flow. The match
is done on the first entry of the flow sequence.
- The C2 engine entries, that are used to assign the default rx queue,
and enable or disable RSS for a given port.
There's one entry per flow, located at:
.../mvpp2/<controller>/flows/XX/flow_hits
There is one C2 entry per port, so the c2 hit counter is located at :
.../mvpp2/<controller>/ethX/c2_hits
All hit counter values are 16-bits clear-on-read values.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The classifier configuration for RSS is quite complex, with several
lookup tables being used. This commit adds useful info in debugfs to
see how the different tables are configured :
Added 2 new entries in the per-port directory :
- .../eth0/default_rxq : The default rx queue on that port
- .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry
Added the 'flows' directory :
It contains one entry per sub-flow. a 'sub-flow' is a unique path from
Header Parser to the flow table. Multiple sub-flows can point to the
same 'flow' (each flow has an id from 8 to 29, which is its index in the
Lookup Id table) :
- .../flows/00/...
/01/...
...
/51/id : The flow id. There are 21 unique flows. There's one
flow per combination of the following parameters :
- L4 protocol (TCP, UDP, none)
- L3 protocol (IPv4, IPv6)
- L3 parameters (Fragmented or not)
- L2 parameters (Vlan tag presence or not)
.../type : The flow type. This is an even higher level flow,
that we manipulate with ethtool. It can be :
"udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other".
.../eth0/...
.../eth1/engine : The hash generation engine used for this
flow on the given port
.../hash_opts : The hash generation options indicating on
what data we base the hash (vlan tag, src
IP, src port, etc.)
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One helpful feature to help debug the Header Parser TCAM filter in PPv2
is to be able to see if the entries did match something when a packet
comes in. This can be done by using the built-in hit counter for TCAM
entries.
This commit implements reading the counter, and exposing its value on
debugfs for each filter entry.
The counter is a 16-bits clear-on-read value, located at:
.../mvpp2/<controller>/parser/XXX/hits
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell PPv2 Packer Header Parser has a TCAM based filter, that is not
trivial to configure and debug. Being able to dump TCAM entries from
userspace can be really helpful to help development of new features
and debug existing ones.
This commit adds a basic debugfs interface for the PPv2 driver, focusing
on TCAM related features.
<mnt>/mvpp2/ --- f2000000.ethernet
\- f4000000.ethernet --- parser --- 000 ...
| \- 001
| \- ...
| \- 255 --- ai
| \- header_data
| \- lookup_id
| \- sram
| \- valid
\- eth1 ...
\- eth2 --- mac_filter
\- parser_entries
\- vid_filter
There's one directory per PPv2 instance, named after pdev->name to make
sure names are uniques. In each of these directories, there's :
- one directory per interface on the controller, each containing :
- "mac_filter", which lists all filtered addresses for this port
(based on TCAM, not on the kernel's uc / mc lists)
- "parser_entries", which lists the indices of all valid TCAM
entries that have this port in their port map
- "vid_filter", which lists the vids allowed on this port, based on
TCAM
- one "parser" directory (the parser is common to all ports), containing :
- one directory per TCAM entry (256 of them, from 0 to 255), each
containing :
- "ai" : Contains the 1 byte Additional Info field from TCAM, and
- "header_data" : Contains the 8 bytes Header Data extracted from
the packet
- "lookup_id" : Contains the 4 bits LU_ID
- "sram" : contains the raw SRAM data, which is the result of the TCAM
lookup. This readonly at the moment.
- "valid" : Indicates if the entry is valid of not.
All entries are read-only, and everything is output in hex form.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the appropriate SPDX license identifiers and drop the license text.
This patch is only cosmetic.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: f9358e12a0 ("net: mvpp2: split ingress traffic into multiple flows")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit allows setting the RSS hash generation parameters from
ethtool. When setting parameters for a given flow type from ethtool
(e.g. tcp4), all the corresponding flows in the flow table are updated,
according to the supported hash parameters.
For example, when configuring TCP over IPv4 hash parameters to be
src/dst IP + src/dst port ("ethtool -N eth0 rx-flow-hash tcp4 sdfn"),
we only set the "src/dst port" hash parameters on the non-fragmented TCP
over IPv4 flows.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the classification action that can be performed is to compute a
hash of the packet header based on some header fields, and lookup a RSS
table based on this hash to determine the final RxQ.
This is done by adding one lookup entry per flow per port, so that we
can configure the hash generation parameters for each flow and each
port.
There are 2 possible engines that can be used for RSS hash generation :
- C3HA, that generates a hash based on up to 4 header-extracted fields
- C3HB, that does the same as c3HA, but also includes L4 info in the hash
There are a lot of fields that can be extracted from the header. For now,
we only use the ones that we can configure using ethtool :
- DST MAC address
- L3 info
- Source IP
- Destination IP
- Source port
- Destination port
The C3HB engine is selected when we use L4 fields (src/dst port).
Header parser Dec table
Ingress pkt +-------------+ flow id +----------------------------+
------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag |
+-------------+ |TCP IPv4 w/o VLAN, not frag |
|TCP IPv4 w/ VLAN, frag |--+
|etc. | |
+----------------------------+ |
|
Flow table |
+---------+ +------------+ +--------------------------+ |
| RSS tbl |<--| Classifier |<--------| flow 0: C2 lookup | |
+---------+ +------------+ | C3 lookup port 0 | |
| | | C3 lookup port 1 | |
+-----------+ +-------------+ | ... | |
| C2 engine | | C3H engines | | flow 1: C2 lookup |<--+
+-----------+ +-------------+ | C3 lookup port 0 |
| ... |
| ... |
| flow 51 : C2 lookup |
| ... |
+--------------------------+
The C2 engine also gains the role of enabling and disabling the RSS
table lookup for this packet.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 classifier allows to perform classification operations on each
ingress packet, based on the flow the packet is assigned to.
The current code uses only 1 flow per port, and the only classification
action consists of assigning the rx queue to the packet, depending on the
port.
In preparation for adding RSS support, we have to split all incoming
traffic into different flows. Since RSS assigns a rx queue depending on
the hash of some header fields, we have to make sure that the hash is
generated in a consistent way for all packets in the same flow.
What we call a "flow" is actually a set of attributes attached to a
packet that depends on various L2/L3/L4 info.
This patch introduces 52 flows, wich are a combination of various L2, L3
and L4 attributes :
- Whether or not the packet has a VLAN tag
- Whether the packet is IPv4, IPv6 or something else
- Whether the packet is TCP, UDP or something else
- Whether or not the packet is fragmented at L3 level.
The flow is associated to a packet by the Header Parser. Each flow
corresponds to an entry in the decoding table. This entry then points to
the sequence of classification lookups to be performed by the
classifier, represented in the flow table.
For now, the only lookup we perform is a C2 lookup to set the default
rx queue.
Header parser Dec table
Ingress pkt +-------------+ flow id +----------------------------+
------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag |
+-------------+ |TCP IPv4 w/o VLAN, not frag |
|TCP IPv4 w/ VLAN, frag |--+
|etc. | |
+----------------------------+ |
|
Flow table |
+------------+ +---------------------+ |
To RxQ <---| Classifier |<-------| flow 0: C2 lookup |<--------+
+------------+ | flow 1: C2 lookup |
| | ... |
+------------+ | flow 51 : C2 lookup |
| C2 engine | +---------------------+
+------------+
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 Controller has a classifier, that can perform multiple lookup
operations for each packet, using different engines.
One of these engines is the C2 engine, which performs TCAM based lookups
on data extracted from the packet header. When a packet matches an
entry, the engine sets various attributes, used to perform
classification operations.
One of these attributes is the rx queue in which the packet should be sent.
The current code uses the lookup_id table (also called decoding table)
to assign the rx queue. However, this only works if we use one entry per
port in the decoding table, which won't be the case once we add RSS
lookups.
This patch uses the C2 engine to assign the rx queue to each packet.
The C2 engine is used through the flow table, which dictates what
classification operations are done for a given flow.
Right now, we have one flow per port, which contains every ingress
packet for this port.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>