linux/net/dsa/dsa_priv.h

265 lines
7.5 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-or-later */
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
/*
* net/dsa/dsa_priv.h - Hardware switch handling
dsa: add switch chip cascading support The initial version of the DSA driver only supported a single switch chip per network interface, while DSA-capable switch chips can be interconnected to form a tree of switch chips. This patch adds support for multiple switch chips on a network interface. An example topology for a 16-port device with an embedded CPU is as follows: +-----+ +--------+ +--------+ | |eth0 10| switch |9 10| switch | | CPU +----------+ +-------+ | | | | chip 0 | | chip 1 | +-----+ +---++---+ +---++---+ || || || || ||1000baseT ||1000baseT ||ports 1-8 ||ports 9-16 This requires a couple of interdependent changes in the DSA layer: - The dsa platform driver data needs to be extended: there is still only one netdevice per DSA driver instance (eth0 in the example above), but each of the switch chips in the tree needs its own mii_bus device pointer, MII management bus address, and port name array. (include/net/dsa.h) The existing in-tree dsa users need some small changes to deal with this. (arch/arm) - The DSA and Ethertype DSA tagging modules need to be extended to use the DSA device ID field on receive and demultiplex the packet accordingly, and fill in the DSA device ID field on transmit according to which switch chip the packet is heading to. (net/dsa/tag_{dsa,edsa}.c) - The concept of "CPU port", which is the switch chip port that the CPU is connected to (port 10 on switch chip 0 in the example), needs to be extended with the concept of "upstream port", which is the port on the switch chip that will bring us one hop closer to the CPU (port 10 for both switch chips in the example above). - The dsa platform data needs to specify which ports on which switch chips are links to other switch chips, so that we can enable DSA tagging mode on them. (For inter-switch links, we always use non-EtherType DSA tagging, since it has lower overhead. The CPU link uses dsa or edsa tagging depending on what the 'root' switch chip supports.) This is done by specifying "dsa" for the given port in the port array. - The dsa platform data needs to be extended with information on via which port to reach any given switch chip from any given switch chip. This info is specified via the per-switch chip data struct ->rtable[] array, which gives the nexthop ports for each of the other switches in the tree. For the example topology above, the dsa platform data would look something like this: static struct dsa_chip_data sw[2] = { { .mii_bus = &foo, .sw_addr = 1, .port_names[0] = "p1", .port_names[1] = "p2", .port_names[2] = "p3", .port_names[3] = "p4", .port_names[4] = "p5", .port_names[5] = "p6", .port_names[6] = "p7", .port_names[7] = "p8", .port_names[9] = "dsa", .port_names[10] = "cpu", .rtable = (s8 []){ -1, 9, }, }, { .mii_bus = &foo, .sw_addr = 2, .port_names[0] = "p9", .port_names[1] = "p10", .port_names[2] = "p11", .port_names[3] = "p12", .port_names[4] = "p13", .port_names[5] = "p14", .port_names[6] = "p15", .port_names[7] = "p16", .port_names[10] = "dsa", .rtable = (s8 []){ 10, -1, }, }, }, static struct dsa_platform_data pd = { .netdev = &foo, .nr_switches = 2, .sw = sw, }; Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Gary Thomas <gary@mlbassoc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-20 17:52:09 +08:00
* Copyright (c) 2008-2009 Marvell Semiconductor
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
*/
#ifndef __DSA_PRIV_H
#define __DSA_PRIV_H
net: dsa: untag the bridge pvid from rx skbs Currently the bridge untags VLANs present in its VLAN groups in __allowed_ingress() only when VLAN filtering is enabled. But when a skb is seen on the RX path as tagged with the bridge's pvid, and that bridge has vlan_filtering=0, and there isn't any 8021q upper with that VLAN either, then we have a problem. The bridge will not untag it (since it is supposed to remain VLAN-unaware), and pvid-tagged communication will be broken. There are 2 situations where we can end up like that: 1. When installing a pvid in egress-tagged mode, like this: ip link add dev br0 type bridge vlan_filtering 0 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid This happens because DSA configures the VLAN membership of the CPU port using the same flags as swp0 (in this case "pvid and not untagged"), in an attempt to copy the frame as-is from ingress to the CPU. However, in this case, the packet may arrive untagged on ingress, it will be pvid-tagged by the ingress port, and will be sent as egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN tag where there was none to speak of on ingress. When vlan_filtering is 1, this is not a problem, as stated in the first paragraph, because __allowed_ingress() will pop it. But currently, when vlan_filtering is 0 and we have such a VLAN configuration, we need an 8021q upper (br0.1) to be able to ping over that VLAN, which is not symmetrical with the vlan_filtering=1 case, and therefore, confusing for users. Basically what DSA attempts to do is simply an approximation: try to copy the skb with (or without) the same VLAN all the way up to the CPU. But DSA drivers treat CPU port VLAN membership in various ways (which is a good segue into situation 2). And some of those drivers simply tell the CPU port to copy the frame unmodified, which is the golden standard when it comes to VLAN processing (therefore, any driver which can configure the hardware to do that, should do that, and discard the VLAN flags requested by DSA on the CPU port). 2. Some DSA drivers always configure the CPU port as egress-tagged, in an attempt to recover the classified VLAN from the skb. These drivers cannot work at all with untagged traffic when bridged in vlan_filtering=0 mode. And they can't go for the easy "just keep the pvid as egress-untagged towards the CPU" route, because each front port can have its own pvid, and that might require conflicting VLAN membership settings on the CPU port (swp1 is pvid for VID 1 and egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for VID 2; with this simplistic approach, the CPU port, which is really a separate hardware entity and has its own VLAN membership settings, would end up being egress-untagged in both VID 1 and VID 2, therefore losing the VLAN tags of ingress traffic). So the only thing we can do is to create a helper function for resolving the problematic case (that is, a function which untags the bridge pvid when that is in vlan_filtering=0 mode), which taggers in need should call. It isn't called from the generic DSA receive path because there are drivers that fall neither in the first nor second category. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 05:40:37 +08:00
#include <linux/if_bridge.h>
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
#include <linux/phy.h>
#include <linux/netdevice.h>
#include <linux/netpoll.h>
#include <net/dsa.h>
net: dsa: add GRO support via gro_cells gro_cells lib is used by different encapsulating netdevices, such as geneve, macsec, vxlan etc. to speed up decapsulated traffic processing. CPU tag is a sort of "encapsulation", and we can use the same mechs to greatly improve overall DSA performance. skbs are passed to the GRO layer after removing CPU tags, so we don't need any new packet offload types as it was firstly proposed by me in the first GRO-over-DSA variant [1]. The size of struct gro_cells is sizeof(void *), so hot struct dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields remain in one 32-byte cacheline. The other positive side effect is that drivers for network devices that can be shipped as CPU ports of DSA-driven switches can now use napi_gro_frags() to pass skbs to kernel. Packets built that way are completely non-linear and are likely being dropped without GRO. This was tested on to-be-mainlined-soon Ethernet driver that uses napi_gro_frags(), and the overall performance was on par with the variant from [1], sometimes even better due to minimal overhead. net.core.gro_normal_batch tuning may help to push it to the limit on particular setups and platforms. iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup on 1.2 GHz MIPS board: 5.7-rc2 baseline: [ID] Interval Transfer Bitrate Retr [ 5] 0.00-120.01 sec 9.00 GBytes 644 Mbits/sec 413 sender [ 5] 0.00-120.00 sec 8.99 GBytes 644 Mbits/sec receiver Iface RX packets TX packets eth0 7097731 7097702 port0 426050 6671829 port1 6671681 425862 port1.218 6671677 425851 With this patch: [ID] Interval Transfer Bitrate Retr [ 5] 0.00-120.01 sec 12.2 GBytes 870 Mbits/sec 122 sender [ 5] 0.00-120.00 sec 12.2 GBytes 870 Mbits/sec receiver Iface RX packets TX packets eth0 9474792 9474777 port0 455200 353288 port1 9019592 455035 port1.218 353144 455024 v2: - Add some performance examples in the commit message; - No functional changes. [1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/ Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-21 21:41:08 +08:00
#include <net/gro_cells.h>
enum {
DSA_NOTIFIER_AGEING_TIME,
DSA_NOTIFIER_BRIDGE_JOIN,
DSA_NOTIFIER_BRIDGE_LEAVE,
DSA_NOTIFIER_FDB_ADD,
DSA_NOTIFIER_FDB_DEL,
DSA_NOTIFIER_MDB_ADD,
DSA_NOTIFIER_MDB_DEL,
DSA_NOTIFIER_VLAN_ADD,
DSA_NOTIFIER_VLAN_DEL,
net: dsa: configure the MTU for switch ports It is useful be able to configure port policers on a switch to accept frames of various sizes: - Increase the MTU for better throughput from the default of 1500 if it is known that there is no 10/100 Mbps device in the network. - Decrease the MTU to limit the latency of high-priority frames under congestion, or work around various network segments that add extra headers to packets which can't be fragmented. For DSA slave ports, this is mostly a pass-through callback, called through the regular ndo ops and at probe time (to ensure consistency across all supported switches). The CPU port is called with an MTU equal to the largest configured MTU of the slave ports. The assumption is that the user might want to sustain a bidirectional conversation with a partner over any switch port. The DSA master is configured the same as the CPU port, plus the tagger overhead. Since the MTU is by definition L2 payload (sans Ethernet header), it is up to each individual driver to figure out if it needs to do anything special for its frame tags on the CPU port (it shouldn't except in special cases). So the MTU does not contain the tagger overhead on the CPU port. However the MTU of the DSA master, minus the tagger overhead, is used as a proxy for the MTU of the CPU port, which does not have a net device. This is to avoid uselessly calling the .change_mtu function on the CPU port when nothing should change. So it is safe to assume that the DSA master and the CPU port MTUs are apart by exactly the tagger's overhead in bytes. Some changes were made around dsa_master_set_mtu(), function which was now removed, for 2 reasons: - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to do the same thing in DSA - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method That is to say, there's no need for this function in DSA, we can safely call dev_set_mtu() directly, take the rtnl lock when necessary, and just propagate whatever errors get reported (since the user probably wants to be informed). Some inspiration (mainly in the MTU DSA notifier) was taken from a vaguely similar patch from Murali and Florian, who are credited as co-developers down below. Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Co-developed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-28 03:55:42 +08:00
DSA_NOTIFIER_MTU,
};
/* DSA_NOTIFIER_AGEING_TIME */
struct dsa_notifier_ageing_time_info {
struct switchdev_trans *trans;
unsigned int ageing_time;
};
/* DSA_NOTIFIER_BRIDGE_* */
struct dsa_notifier_bridge_info {
struct net_device *br;
net: dsa: permit cross-chip bridging between all trees in the system One way of utilizing DSA is by cascading switches which do not all have compatible taggers. Consider the following real-life topology: +---------------------------------------------------------------+ | LS1028A | | +------------------------------+ | | | DSA master for Felix | | | |(internal ENETC port 2: eno2))| | | +------------+------------------------------+-------------+ | | | Felix embedded L2 switch | | | | | | | | +--------------+ +--------------+ +--------------+ | | | | |DSA master for| |DSA master for| |DSA master for| | | | | | SJA1105 1 | | SJA1105 2 | | SJA1105 3 | | | | | |(Felix port 1)| |(Felix port 2)| |(Felix port 3)| | | +--+-+--------------+---+--------------+---+--------------+--+--+ +-----------------------+ +-----------------------+ +-----------------------+ | SJA1105 switch 1 | | SJA1105 switch 2 | | SJA1105 switch 3 | +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3| +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ The above can be described in the device tree as follows (obviously not complete): mscc_felix { dsa,member = <0 0>; ports { port@4 { ethernet = <&enetc_port2>; }; }; }; sja1105_switch1 { dsa,member = <1 1>; ports { port@4 { ethernet = <&mscc_felix_port1>; }; }; }; sja1105_switch2 { dsa,member = <2 2>; ports { port@4 { ethernet = <&mscc_felix_port2>; }; }; }; sja1105_switch3 { dsa,member = <3 3>; ports { port@4 { ethernet = <&mscc_felix_port3>; }; }; }; Basically we instantiate one DSA switch tree for every hardware switch in the system, but we still give them globally unique switch IDs (will come back to that later). Having 3 disjoint switch trees makes the tagger drivers "just work", because net devices are registered for the 3 Felix DSA master ports, and they are also DSA slave ports to the ENETC port. So packets received on the ENETC port are stripped of their stacked DSA tags one by one. Currently, hardware bridging between ports on the same sja1105 chip is possible, but switching between sja1105 ports on different chips is handled by the software bridge. This is fine, but we can do better. In fact, the dsa_8021q tag used by sja1105 is compatible with cascading. In other words, a sja1105 switch can correctly parse and route a packet containing a dsa_8021q tag. So if we could enable hardware bridging on the Felix DSA master ports, cross-chip bridging could be completely offloaded. Such as system would be used as follows: ip link add dev br0 type bridge && ip link set dev br0 up for port in sw0p0 sw0p1 sw0p2 sw0p3 \ sw1p0 sw1p1 sw1p2 sw1p3 \ sw2p0 sw2p1 sw2p2 sw2p3; do ip link set dev $port master br0 done The above makes switching between ports on the same row be performed in hardware, and between ports on different rows in software. Now assume the Felix switch ports are called swp0, swp1, swp2. By running the following extra commands: ip link add dev br1 type bridge && ip link set dev br1 up for port in swp0 swp1 swp2; do ip link set dev $port master br1 done the CPU no longer sees packets which traverse sja1105 switch boundaries and can be forwarded directly by Felix. The br1 bridge would not be used for any sort of traffic termination. For this to work, we need to give drivers an opportunity to listen for bridging events on DSA trees other than their own, and pass that other tree index as argument. I have made the assumption, for the moment, that the other existing DSA notifiers don't need to be broadcast to other trees. That assumption might turn out to be incorrect. But in the meantime, introduce a dsa_broadcast function, similar in purpose to dsa_port_notify, which is used only by the bridging notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-11 00:37:41 +08:00
int tree_index;
int sw_index;
int port;
};
/* DSA_NOTIFIER_FDB_* */
struct dsa_notifier_fdb_info {
int sw_index;
int port;
const unsigned char *addr;
u16 vid;
};
/* DSA_NOTIFIER_MDB_* */
struct dsa_notifier_mdb_info {
const struct switchdev_obj_port_mdb *mdb;
struct switchdev_trans *trans;
int sw_index;
int port;
};
/* DSA_NOTIFIER_VLAN_* */
struct dsa_notifier_vlan_info {
const struct switchdev_obj_port_vlan *vlan;
struct switchdev_trans *trans;
int sw_index;
int port;
};
net: dsa: configure the MTU for switch ports It is useful be able to configure port policers on a switch to accept frames of various sizes: - Increase the MTU for better throughput from the default of 1500 if it is known that there is no 10/100 Mbps device in the network. - Decrease the MTU to limit the latency of high-priority frames under congestion, or work around various network segments that add extra headers to packets which can't be fragmented. For DSA slave ports, this is mostly a pass-through callback, called through the regular ndo ops and at probe time (to ensure consistency across all supported switches). The CPU port is called with an MTU equal to the largest configured MTU of the slave ports. The assumption is that the user might want to sustain a bidirectional conversation with a partner over any switch port. The DSA master is configured the same as the CPU port, plus the tagger overhead. Since the MTU is by definition L2 payload (sans Ethernet header), it is up to each individual driver to figure out if it needs to do anything special for its frame tags on the CPU port (it shouldn't except in special cases). So the MTU does not contain the tagger overhead on the CPU port. However the MTU of the DSA master, minus the tagger overhead, is used as a proxy for the MTU of the CPU port, which does not have a net device. This is to avoid uselessly calling the .change_mtu function on the CPU port when nothing should change. So it is safe to assume that the DSA master and the CPU port MTUs are apart by exactly the tagger's overhead in bytes. Some changes were made around dsa_master_set_mtu(), function which was now removed, for 2 reasons: - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to do the same thing in DSA - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method That is to say, there's no need for this function in DSA, we can safely call dev_set_mtu() directly, take the rtnl lock when necessary, and just propagate whatever errors get reported (since the user probably wants to be informed). Some inspiration (mainly in the MTU DSA notifier) was taken from a vaguely similar patch from Murali and Florian, who are credited as co-developers down below. Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Co-developed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-28 03:55:42 +08:00
/* DSA_NOTIFIER_MTU */
struct dsa_notifier_mtu_info {
bool propagate_upstream;
int sw_index;
int port;
int mtu;
};
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
struct dsa_slave_priv {
/* Copy of CPU port xmit for faster access in slave transmit hot path */
struct sk_buff * (*xmit)(struct sk_buff *skb,
struct net_device *dev);
dsa: add switch chip cascading support The initial version of the DSA driver only supported a single switch chip per network interface, while DSA-capable switch chips can be interconnected to form a tree of switch chips. This patch adds support for multiple switch chips on a network interface. An example topology for a 16-port device with an embedded CPU is as follows: +-----+ +--------+ +--------+ | |eth0 10| switch |9 10| switch | | CPU +----------+ +-------+ | | | | chip 0 | | chip 1 | +-----+ +---++---+ +---++---+ || || || || ||1000baseT ||1000baseT ||ports 1-8 ||ports 9-16 This requires a couple of interdependent changes in the DSA layer: - The dsa platform driver data needs to be extended: there is still only one netdevice per DSA driver instance (eth0 in the example above), but each of the switch chips in the tree needs its own mii_bus device pointer, MII management bus address, and port name array. (include/net/dsa.h) The existing in-tree dsa users need some small changes to deal with this. (arch/arm) - The DSA and Ethertype DSA tagging modules need to be extended to use the DSA device ID field on receive and demultiplex the packet accordingly, and fill in the DSA device ID field on transmit according to which switch chip the packet is heading to. (net/dsa/tag_{dsa,edsa}.c) - The concept of "CPU port", which is the switch chip port that the CPU is connected to (port 10 on switch chip 0 in the example), needs to be extended with the concept of "upstream port", which is the port on the switch chip that will bring us one hop closer to the CPU (port 10 for both switch chips in the example above). - The dsa platform data needs to specify which ports on which switch chips are links to other switch chips, so that we can enable DSA tagging mode on them. (For inter-switch links, we always use non-EtherType DSA tagging, since it has lower overhead. The CPU link uses dsa or edsa tagging depending on what the 'root' switch chip supports.) This is done by specifying "dsa" for the given port in the port array. - The dsa platform data needs to be extended with information on via which port to reach any given switch chip from any given switch chip. This info is specified via the per-switch chip data struct ->rtable[] array, which gives the nexthop ports for each of the other switches in the tree. For the example topology above, the dsa platform data would look something like this: static struct dsa_chip_data sw[2] = { { .mii_bus = &foo, .sw_addr = 1, .port_names[0] = "p1", .port_names[1] = "p2", .port_names[2] = "p3", .port_names[3] = "p4", .port_names[4] = "p5", .port_names[5] = "p6", .port_names[6] = "p7", .port_names[7] = "p8", .port_names[9] = "dsa", .port_names[10] = "cpu", .rtable = (s8 []){ -1, 9, }, }, { .mii_bus = &foo, .sw_addr = 2, .port_names[0] = "p9", .port_names[1] = "p10", .port_names[2] = "p11", .port_names[3] = "p12", .port_names[4] = "p13", .port_names[5] = "p14", .port_names[6] = "p15", .port_names[7] = "p16", .port_names[10] = "dsa", .rtable = (s8 []){ 10, -1, }, }, }, static struct dsa_platform_data pd = { .netdev = &foo, .nr_switches = 2, .sw = sw, }; Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Gary Thomas <gary@mlbassoc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-20 17:52:09 +08:00
struct pcpu_sw_netstats __percpu *stats64;
net: dsa: add GRO support via gro_cells gro_cells lib is used by different encapsulating netdevices, such as geneve, macsec, vxlan etc. to speed up decapsulated traffic processing. CPU tag is a sort of "encapsulation", and we can use the same mechs to greatly improve overall DSA performance. skbs are passed to the GRO layer after removing CPU tags, so we don't need any new packet offload types as it was firstly proposed by me in the first GRO-over-DSA variant [1]. The size of struct gro_cells is sizeof(void *), so hot struct dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields remain in one 32-byte cacheline. The other positive side effect is that drivers for network devices that can be shipped as CPU ports of DSA-driven switches can now use napi_gro_frags() to pass skbs to kernel. Packets built that way are completely non-linear and are likely being dropped without GRO. This was tested on to-be-mainlined-soon Ethernet driver that uses napi_gro_frags(), and the overall performance was on par with the variant from [1], sometimes even better due to minimal overhead. net.core.gro_normal_batch tuning may help to push it to the limit on particular setups and platforms. iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup on 1.2 GHz MIPS board: 5.7-rc2 baseline: [ID] Interval Transfer Bitrate Retr [ 5] 0.00-120.01 sec 9.00 GBytes 644 Mbits/sec 413 sender [ 5] 0.00-120.00 sec 8.99 GBytes 644 Mbits/sec receiver Iface RX packets TX packets eth0 7097731 7097702 port0 426050 6671829 port1 6671681 425862 port1.218 6671677 425851 With this patch: [ID] Interval Transfer Bitrate Retr [ 5] 0.00-120.01 sec 12.2 GBytes 870 Mbits/sec 122 sender [ 5] 0.00-120.00 sec 12.2 GBytes 870 Mbits/sec receiver Iface RX packets TX packets eth0 9474792 9474777 port0 455200 353288 port1 9019592 455035 port1.218 353144 455024 v2: - Add some performance examples in the commit message; - No functional changes. [1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/ Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-21 21:41:08 +08:00
struct gro_cells gcells;
/* DSA port data, such as switch, port index, etc. */
struct dsa_port *dp;
dsa: add switch chip cascading support The initial version of the DSA driver only supported a single switch chip per network interface, while DSA-capable switch chips can be interconnected to form a tree of switch chips. This patch adds support for multiple switch chips on a network interface. An example topology for a 16-port device with an embedded CPU is as follows: +-----+ +--------+ +--------+ | |eth0 10| switch |9 10| switch | | CPU +----------+ +-------+ | | | | chip 0 | | chip 1 | +-----+ +---++---+ +---++---+ || || || || ||1000baseT ||1000baseT ||ports 1-8 ||ports 9-16 This requires a couple of interdependent changes in the DSA layer: - The dsa platform driver data needs to be extended: there is still only one netdevice per DSA driver instance (eth0 in the example above), but each of the switch chips in the tree needs its own mii_bus device pointer, MII management bus address, and port name array. (include/net/dsa.h) The existing in-tree dsa users need some small changes to deal with this. (arch/arm) - The DSA and Ethertype DSA tagging modules need to be extended to use the DSA device ID field on receive and demultiplex the packet accordingly, and fill in the DSA device ID field on transmit according to which switch chip the packet is heading to. (net/dsa/tag_{dsa,edsa}.c) - The concept of "CPU port", which is the switch chip port that the CPU is connected to (port 10 on switch chip 0 in the example), needs to be extended with the concept of "upstream port", which is the port on the switch chip that will bring us one hop closer to the CPU (port 10 for both switch chips in the example above). - The dsa platform data needs to specify which ports on which switch chips are links to other switch chips, so that we can enable DSA tagging mode on them. (For inter-switch links, we always use non-EtherType DSA tagging, since it has lower overhead. The CPU link uses dsa or edsa tagging depending on what the 'root' switch chip supports.) This is done by specifying "dsa" for the given port in the port array. - The dsa platform data needs to be extended with information on via which port to reach any given switch chip from any given switch chip. This info is specified via the per-switch chip data struct ->rtable[] array, which gives the nexthop ports for each of the other switches in the tree. For the example topology above, the dsa platform data would look something like this: static struct dsa_chip_data sw[2] = { { .mii_bus = &foo, .sw_addr = 1, .port_names[0] = "p1", .port_names[1] = "p2", .port_names[2] = "p3", .port_names[3] = "p4", .port_names[4] = "p5", .port_names[5] = "p6", .port_names[6] = "p7", .port_names[7] = "p8", .port_names[9] = "dsa", .port_names[10] = "cpu", .rtable = (s8 []){ -1, 9, }, }, { .mii_bus = &foo, .sw_addr = 2, .port_names[0] = "p9", .port_names[1] = "p10", .port_names[2] = "p11", .port_names[3] = "p12", .port_names[4] = "p13", .port_names[5] = "p14", .port_names[6] = "p15", .port_names[7] = "p16", .port_names[10] = "dsa", .rtable = (s8 []){ 10, -1, }, }, }, static struct dsa_platform_data pd = { .netdev = &foo, .nr_switches = 2, .sw = sw, }; Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Gary Thomas <gary@mlbassoc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-20 17:52:09 +08:00
#ifdef CONFIG_NET_POLL_CONTROLLER
struct netpoll *netpoll;
#endif
/* TC context */
struct list_head mall_tc_list;
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
};
/* dsa.c */
const struct dsa_device_ops *dsa_tag_driver_get(int tag_protocol);
void dsa_tag_driver_put(const struct dsa_device_ops *ops);
bool dsa_schedule_work(struct work_struct *work);
const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
int dsa_legacy_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_device *dev,
const unsigned char *addr, u16 vid,
u16 flags,
struct netlink_ext_ack *extack);
int dsa_legacy_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
struct net_device *dev,
const unsigned char *addr, u16 vid);
/* master.c */
int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp);
void dsa_master_teardown(struct net_device *dev);
static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
int device, int port)
{
struct dsa_port *cpu_dp = dev->dsa_ptr;
struct dsa_switch_tree *dst = cpu_dp->dst;
struct dsa_port *dp;
list_for_each_entry(dp, &dst->ports, list)
if (dp->ds->index == device && dp->index == port &&
dp->type == DSA_PORT_TYPE_USER)
return dp->slave;
return NULL;
}
/* port.c */
int dsa_port_set_state(struct dsa_port *dp, u8 state,
struct switchdev_trans *trans);
int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
void dsa_port_disable_rt(struct dsa_port *dp);
void dsa_port_disable(struct dsa_port *dp);
int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
struct switchdev_trans *trans);
bool dsa_port_skip_vlan_configuration(struct dsa_port *dp);
int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
struct switchdev_trans *trans);
net: dsa: configure the MTU for switch ports It is useful be able to configure port policers on a switch to accept frames of various sizes: - Increase the MTU for better throughput from the default of 1500 if it is known that there is no 10/100 Mbps device in the network. - Decrease the MTU to limit the latency of high-priority frames under congestion, or work around various network segments that add extra headers to packets which can't be fragmented. For DSA slave ports, this is mostly a pass-through callback, called through the regular ndo ops and at probe time (to ensure consistency across all supported switches). The CPU port is called with an MTU equal to the largest configured MTU of the slave ports. The assumption is that the user might want to sustain a bidirectional conversation with a partner over any switch port. The DSA master is configured the same as the CPU port, plus the tagger overhead. Since the MTU is by definition L2 payload (sans Ethernet header), it is up to each individual driver to figure out if it needs to do anything special for its frame tags on the CPU port (it shouldn't except in special cases). So the MTU does not contain the tagger overhead on the CPU port. However the MTU of the DSA master, minus the tagger overhead, is used as a proxy for the MTU of the CPU port, which does not have a net device. This is to avoid uselessly calling the .change_mtu function on the CPU port when nothing should change. So it is safe to assume that the DSA master and the CPU port MTUs are apart by exactly the tagger's overhead in bytes. Some changes were made around dsa_master_set_mtu(), function which was now removed, for 2 reasons: - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to do the same thing in DSA - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method That is to say, there's no need for this function in DSA, we can safely call dev_set_mtu() directly, take the rtnl lock when necessary, and just propagate whatever errors get reported (since the user probably wants to be informed). Some inspiration (mainly in the MTU DSA notifier) was taken from a vaguely similar patch from Murali and Florian, who are credited as co-developers down below. Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Co-developed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-28 03:55:42 +08:00
int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
bool propagate_upstream);
int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
u16 vid);
int dsa_port_fdb_del(struct dsa_port *dp, const unsigned char *addr,
u16 vid);
int dsa_port_fdb_dump(struct dsa_port *dp, dsa_fdb_dump_cb_t *cb, void *data);
int dsa_port_mdb_add(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb,
struct switchdev_trans *trans);
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags,
struct switchdev_trans *trans);
int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags,
struct switchdev_trans *trans);
int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
struct switchdev_trans *trans);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan,
struct switchdev_trans *trans);
int dsa_port_vlan_del(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
int dsa_port_link_register_of(struct dsa_port *dp);
void dsa_port_link_unregister_of(struct dsa_port *dp);
extern const struct phylink_mac_ops dsa_port_phylink_mac_ops;
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
/* slave.c */
extern const struct dsa_device_ops notag_netdev_ops;
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
void dsa_slave_mii_bus_init(struct dsa_switch *ds);
int dsa_slave_create(struct dsa_port *dp);
void dsa_slave_destroy(struct net_device *slave_dev);
bool dsa_slave_dev_check(const struct net_device *dev);
int dsa_slave_suspend(struct net_device *slave_dev);
int dsa_slave_resume(struct net_device *slave_dev);
int dsa_slave_register_notifier(void);
void dsa_slave_unregister_notifier(void);
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
static inline struct dsa_port *dsa_slave_to_port(const struct net_device *dev)
{
struct dsa_slave_priv *p = netdev_priv(dev);
return p->dp;
}
static inline struct net_device *
dsa_slave_to_master(const struct net_device *dev)
{
struct dsa_port *dp = dsa_slave_to_port(dev);
return dp->cpu_dp->master;
}
net: dsa: untag the bridge pvid from rx skbs Currently the bridge untags VLANs present in its VLAN groups in __allowed_ingress() only when VLAN filtering is enabled. But when a skb is seen on the RX path as tagged with the bridge's pvid, and that bridge has vlan_filtering=0, and there isn't any 8021q upper with that VLAN either, then we have a problem. The bridge will not untag it (since it is supposed to remain VLAN-unaware), and pvid-tagged communication will be broken. There are 2 situations where we can end up like that: 1. When installing a pvid in egress-tagged mode, like this: ip link add dev br0 type bridge vlan_filtering 0 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid This happens because DSA configures the VLAN membership of the CPU port using the same flags as swp0 (in this case "pvid and not untagged"), in an attempt to copy the frame as-is from ingress to the CPU. However, in this case, the packet may arrive untagged on ingress, it will be pvid-tagged by the ingress port, and will be sent as egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN tag where there was none to speak of on ingress. When vlan_filtering is 1, this is not a problem, as stated in the first paragraph, because __allowed_ingress() will pop it. But currently, when vlan_filtering is 0 and we have such a VLAN configuration, we need an 8021q upper (br0.1) to be able to ping over that VLAN, which is not symmetrical with the vlan_filtering=1 case, and therefore, confusing for users. Basically what DSA attempts to do is simply an approximation: try to copy the skb with (or without) the same VLAN all the way up to the CPU. But DSA drivers treat CPU port VLAN membership in various ways (which is a good segue into situation 2). And some of those drivers simply tell the CPU port to copy the frame unmodified, which is the golden standard when it comes to VLAN processing (therefore, any driver which can configure the hardware to do that, should do that, and discard the VLAN flags requested by DSA on the CPU port). 2. Some DSA drivers always configure the CPU port as egress-tagged, in an attempt to recover the classified VLAN from the skb. These drivers cannot work at all with untagged traffic when bridged in vlan_filtering=0 mode. And they can't go for the easy "just keep the pvid as egress-untagged towards the CPU" route, because each front port can have its own pvid, and that might require conflicting VLAN membership settings on the CPU port (swp1 is pvid for VID 1 and egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for VID 2; with this simplistic approach, the CPU port, which is really a separate hardware entity and has its own VLAN membership settings, would end up being egress-untagged in both VID 1 and VID 2, therefore losing the VLAN tags of ingress traffic). So the only thing we can do is to create a helper function for resolving the problematic case (that is, a function which untags the bridge pvid when that is in vlan_filtering=0 mode), which taggers in need should call. It isn't called from the generic DSA receive path because there are drivers that fall neither in the first nor second category. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 05:40:37 +08:00
/* If under a bridge with vlan_filtering=0, make sure to send pvid-tagged
* frames as untagged, since the bridge will not untag them.
*/
static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
{
struct dsa_port *dp = dsa_slave_to_port(skb->dev);
struct net_device *br = dp->bridge_dev;
struct net_device *dev = skb->dev;
struct net_device *upper_dev;
u16 vid, pvid, proto;
int err;
if (!br || br_vlan_enabled(br))
return skb;
err = br_vlan_get_proto(br, &proto);
if (err)
return skb;
/* Move VLAN tag from data to hwaccel */
if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) {
net: dsa: untag the bridge pvid from rx skbs Currently the bridge untags VLANs present in its VLAN groups in __allowed_ingress() only when VLAN filtering is enabled. But when a skb is seen on the RX path as tagged with the bridge's pvid, and that bridge has vlan_filtering=0, and there isn't any 8021q upper with that VLAN either, then we have a problem. The bridge will not untag it (since it is supposed to remain VLAN-unaware), and pvid-tagged communication will be broken. There are 2 situations where we can end up like that: 1. When installing a pvid in egress-tagged mode, like this: ip link add dev br0 type bridge vlan_filtering 0 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid This happens because DSA configures the VLAN membership of the CPU port using the same flags as swp0 (in this case "pvid and not untagged"), in an attempt to copy the frame as-is from ingress to the CPU. However, in this case, the packet may arrive untagged on ingress, it will be pvid-tagged by the ingress port, and will be sent as egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN tag where there was none to speak of on ingress. When vlan_filtering is 1, this is not a problem, as stated in the first paragraph, because __allowed_ingress() will pop it. But currently, when vlan_filtering is 0 and we have such a VLAN configuration, we need an 8021q upper (br0.1) to be able to ping over that VLAN, which is not symmetrical with the vlan_filtering=1 case, and therefore, confusing for users. Basically what DSA attempts to do is simply an approximation: try to copy the skb with (or without) the same VLAN all the way up to the CPU. But DSA drivers treat CPU port VLAN membership in various ways (which is a good segue into situation 2). And some of those drivers simply tell the CPU port to copy the frame unmodified, which is the golden standard when it comes to VLAN processing (therefore, any driver which can configure the hardware to do that, should do that, and discard the VLAN flags requested by DSA on the CPU port). 2. Some DSA drivers always configure the CPU port as egress-tagged, in an attempt to recover the classified VLAN from the skb. These drivers cannot work at all with untagged traffic when bridged in vlan_filtering=0 mode. And they can't go for the easy "just keep the pvid as egress-untagged towards the CPU" route, because each front port can have its own pvid, and that might require conflicting VLAN membership settings on the CPU port (swp1 is pvid for VID 1 and egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for VID 2; with this simplistic approach, the CPU port, which is really a separate hardware entity and has its own VLAN membership settings, would end up being egress-untagged in both VID 1 and VID 2, therefore losing the VLAN tags of ingress traffic). So the only thing we can do is to create a helper function for resolving the problematic case (that is, a function which untags the bridge pvid when that is in vlan_filtering=0 mode), which taggers in need should call. It isn't called from the generic DSA receive path because there are drivers that fall neither in the first nor second category. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 05:40:37 +08:00
skb = skb_vlan_untag(skb);
if (!skb)
return NULL;
}
if (!skb_vlan_tag_present(skb))
return skb;
vid = skb_vlan_tag_get_id(skb);
/* We already run under an RCU read-side critical section since
* we are called from netif_receive_skb_list_internal().
*/
err = br_vlan_get_pvid_rcu(dev, &pvid);
if (err)
return skb;
if (vid != pvid)
return skb;
/* The sad part about attempting to untag from DSA is that we
* don't know, unless we check, if the skb will end up in
* the bridge's data path - br_allowed_ingress() - or not.
* For example, there might be an 8021q upper for the
* default_pvid of the bridge, which will steal VLAN-tagged traffic
* from the bridge's data path. This is a configuration that DSA
* supports because vlan_filtering is 0. In that case, we should
* definitely keep the tag, to make sure it keeps working.
*/
upper_dev = __vlan_find_dev_deep_rcu(br, htons(proto), vid);
if (upper_dev)
return skb;
net: dsa: untag the bridge pvid from rx skbs Currently the bridge untags VLANs present in its VLAN groups in __allowed_ingress() only when VLAN filtering is enabled. But when a skb is seen on the RX path as tagged with the bridge's pvid, and that bridge has vlan_filtering=0, and there isn't any 8021q upper with that VLAN either, then we have a problem. The bridge will not untag it (since it is supposed to remain VLAN-unaware), and pvid-tagged communication will be broken. There are 2 situations where we can end up like that: 1. When installing a pvid in egress-tagged mode, like this: ip link add dev br0 type bridge vlan_filtering 0 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid This happens because DSA configures the VLAN membership of the CPU port using the same flags as swp0 (in this case "pvid and not untagged"), in an attempt to copy the frame as-is from ingress to the CPU. However, in this case, the packet may arrive untagged on ingress, it will be pvid-tagged by the ingress port, and will be sent as egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN tag where there was none to speak of on ingress. When vlan_filtering is 1, this is not a problem, as stated in the first paragraph, because __allowed_ingress() will pop it. But currently, when vlan_filtering is 0 and we have such a VLAN configuration, we need an 8021q upper (br0.1) to be able to ping over that VLAN, which is not symmetrical with the vlan_filtering=1 case, and therefore, confusing for users. Basically what DSA attempts to do is simply an approximation: try to copy the skb with (or without) the same VLAN all the way up to the CPU. But DSA drivers treat CPU port VLAN membership in various ways (which is a good segue into situation 2). And some of those drivers simply tell the CPU port to copy the frame unmodified, which is the golden standard when it comes to VLAN processing (therefore, any driver which can configure the hardware to do that, should do that, and discard the VLAN flags requested by DSA on the CPU port). 2. Some DSA drivers always configure the CPU port as egress-tagged, in an attempt to recover the classified VLAN from the skb. These drivers cannot work at all with untagged traffic when bridged in vlan_filtering=0 mode. And they can't go for the easy "just keep the pvid as egress-untagged towards the CPU" route, because each front port can have its own pvid, and that might require conflicting VLAN membership settings on the CPU port (swp1 is pvid for VID 1 and egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for VID 2; with this simplistic approach, the CPU port, which is really a separate hardware entity and has its own VLAN membership settings, would end up being egress-untagged in both VID 1 and VID 2, therefore losing the VLAN tags of ingress traffic). So the only thing we can do is to create a helper function for resolving the problematic case (that is, a function which untags the bridge pvid when that is in vlan_filtering=0 mode), which taggers in need should call. It isn't called from the generic DSA receive path because there are drivers that fall neither in the first nor second category. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 05:40:37 +08:00
__vlan_hwaccel_clear_tag(skb);
return skb;
}
/* switch.c */
int dsa_switch_register_notifier(struct dsa_switch *ds);
void dsa_switch_unregister_notifier(struct dsa_switch *ds);
net: dsa: implement auto-normalization of MTU for bridge hardware datapath Many switches don't have an explicit knob for configuring the MTU (maximum transmission unit per interface). Instead, they do the length-based packet admission checks on the ingress interface, for reasons that are easy to understand (why would you accept a packet in the queuing subsystem if you know you're going to drop it anyway). So it is actually the MRU that these switches permit configuring. In Linux there only exists the IFLA_MTU netlink attribute and the associated dev_set_mtu function. The comments like to play blind and say that it's changing the "maximum transfer unit", which is to say that there isn't any directionality in the meaning of the MTU word. So that is the interpretation that this patch is giving to things: MTU == MRU. When 2 interfaces having different MTUs are bridged, the bridge driver MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it adjusts the MTU of the bridge net device itself (and not that of the slave net devices) to the minimum value of all slave interfaces, in order for forwarded packets to not exceed the MTU regardless of the interface they are received and send on. The idea behind this behavior, and why the slave MTUs are not adjusted, is that normal termination from Linux over the L2 forwarding domain should happen over the bridge net device, which _is_ properly limited by the minimum MTU. And termination over individual slave devices is possible even if those are bridged. But that is not "forwarding", so there's no reason to do normalization there, since only a single interface sees that packet. The problem with those switches that can only control the MRU is with the offloaded data path, where a packet received on an interface with MRU 9000 would still be forwarded to an interface with MRU 1500. And the br_mtu_auto_adjust() function does not really help, since the MTU configured on the bridge net device is ignored. In order to enforce the de-facto MTU == MRU rule for these switches, we need to do MTU normalization, which means: in order for no packet larger than the MTU configured on this port to be sent, then we need to limit the MRU on all ports that this packet could possibly come from. AKA since we are configuring the MRU via MTU, it means that all ports within a bridge forwarding domain should have the same MTU. And that is exactly what this patch is trying to do. >From an implementation perspective, we try to follow the intent of the user, otherwise there is a risk that we might livelock them (they try to change the MTU on an already-bridged interface, but we just keep changing it back in an attempt to keep the MTU normalized). So the MTU that the bridge is normalized to is either: - The most recently changed one: ip link set dev swp0 master br0 ip link set dev swp1 master br0 ip link set dev swp0 mtu 1400 This sequence will make swp1 inherit MTU 1400 from swp0. - The one of the most recently added interface to the bridge: ip link set dev swp0 master br0 ip link set dev swp1 mtu 1400 ip link set dev swp1 master br0 The above sequence will make swp0 inherit MTU 1400 as well. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-28 03:55:43 +08:00
/* dsa2.c */
extern struct list_head dsa_tree_list;
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
#endif