linux

Commit Graph

Author	SHA1	Message	Date
Vladimir Oltean	b7a9e0da2d	net: switchdev: remove vid_begin -> vid_end range from VLAN objects The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init \| __br_vlan_set_default_pvid \| \| \| \| \| br_afspec \| \| \| \| \| \| \| v \| \| \| br_process_vlan_info \| \| \| \| \| \| \| v \| \| \| br_vlan_info \| \| \| / \ / \| \| / \ / \| \| / \ / \| \| / \ / v v v v v nbp_vlan_add br_vlan_add ------+ \| ^ ^ \| \| \| / \| \| \| \| / / / \| \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ \| / / \| \ \| / / / \ \| / / / \ \| / / / \ \| / / / v \| \| v / __vlan_add / / \| / / \| / v \| / __vlan_vid_add \| / \ \| / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit `bdced7ef78` ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit `47f8328bb1` ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit `41c498b935` ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as `0944d6b5a2` ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit `29ab586c3d` ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit `9c86ce2c1a` ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit `29ab586c3d` by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit `6623c60dc2` ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-11 16:00:56 -08:00
Vladimir Oltean	4b9c935898	net: dsa: dsa_legacy_fdb_{add,del} can be static Introduced in commit `37b8da1a3c` ("net: dsa: Move FDB add/del implementation inside DSA") in net/dsa/legacy.c, these functions were moved again to slave.c as part of commit `2a93c1a365` ("net: dsa: Allow compiling out legacy support"), before actually deleting net/dsa/slave.c in `93e86b3bc8` ("net: dsa: Remove legacy probing support"). Along with that movement there should have been a deletion of the prototypes from dsa_priv.h, they are not useful. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210108233054.1222278-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-09 18:05:52 -08:00
Vladimir Oltean	1dbb130281	net: dsa: remove the DSA specific notifiers This effectively reverts commit `60724d4bae` ("net: dsa: Add support for DSA specific notifiers"). The reason is that since commit `2f1e8ea726` ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), it appears that there is a generic way to achieve the same purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was converted to use the generic notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:42:07 -08:00
Vladimir Oltean	a5e3c9ba92	net: dsa: export dsa_slave_dev_check Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when they are enslaved to e.g. a bridge by calling netif_is_bridge_master(). Export this helper from DSA to get the equivalent functionality of determining whether the upper interface of a CHANGEUPPER notifier is a DSA switch interface or not. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:42:07 -08:00
Vladimir Oltean	f46b9b8ee8	net: dsa: move the Broadcom tag information in a separate header file It is a bit strange to see something as specific as Broadcom SYSTEMPORT bits in the main DSA include file. Move these away into a separate header, and have the tagger and the SYSTEMPORT driver include them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:42:07 -08:00
Vladimir Oltean	d5f19486ce	net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors Some DSA switches (and not only) cannot learn source MAC addresses from packets injected from the CPU. They only perform hardware address learning from inbound traffic. This can be problematic when we have a bridge spanning some DSA switch ports and some non-DSA ports (which we'll call "foreign interfaces" from DSA's perspective). There are 2 classes of problems created by the lack of learning on CPU-injected traffic: - excessive flooding, due to the fact that DSA treats those addresses as unknown - the risk of stale routes, which can lead to temporary packet loss To illustrate the second class, consider the following situation, which is common in production equipment (wireless access points, where there is a WLAN interface and an Ethernet switch, and these form a single bridging domain). AP 1: +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ \| ^ ^ \| \| \| \| \| \| \| Client A Client B \| \| \| +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 will know that Clients A and B are reachable via wlan0 - the hardware fdb of a DSA switch driver today is not kept in sync with the software entries on other bridge ports, so it will not know that clients A and B are reachable via the CPU port UNLESS the hardware switch itself performs SA learning from traffic injected from the CPU. Nonetheless, a substantial number of switches don't. - the hardware fdb of the DSA switch on AP 2 may autonomously learn that Client A and B are reachable through swp0. Therefore, the software br0 of AP 2 also may or may not learn this. In the example we're illustrating, some Ethernet traffic has been going on, and br0 from AP 2 has indeed learnt that it can reach Client B through swp0. One of the wireless clients, say Client B, disconnects from AP 1 and roams to AP 2. The topology now looks like this: AP 1: +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ \| ^ \| \| \| Client A \| \| \| Client B \| \| \| v +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change) - br0 of AP 1 will (possibly) know that Client B has left wlan0. There are cases where it might never find out though. Either way, DSA today does not process that notification in any way. - the hardware FDB of the DSA switch on AP 1 may learn autonomously that Client B can be reached via swp0, if it receives any packet with Client 1's source MAC address over Ethernet. - the hardware FDB of the DSA switch on AP 2 still thinks that Client B can be reached via swp0. It does not know that it has roamed to wlan0, because it doesn't perform SA learning from the CPU port. Now Client A contacts Client B. AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet segment. AP 2 sees a frame on swp0 and its fdb says that the destination is swp0. Hairpinning is disabled => drop. This problem comes from the fact that these switches have a 'blind spot' for addresses coming from software bridging. The generic solution is not to assume that hardware learning can be enabled somehow, but to listen to more bridge learning events. It turns out that the bridge driver does learn in software from all inbound frames, in __br_handle_local_finish. A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the addresses serviced by the bridge on 'foreign' interfaces. The software bridge also does the right thing on migration, by notifying that the old entry is deleted, so that does not need to be special-cased in DSA. When it is deleted, we just need to delete our static FDB entry towards the CPU too, and wait. The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE events received on its own interfaces, such as static FDB entries. Luckily we can change that, and DSA can listen to all switchdev FDB add/del events in the system and figure out if those events were emitted by a bridge that spans at least one of DSA's own ports. In case that is true, DSA will also offload that address towards its own CPU port, in the eventuality that there might be bridge clients attached to the DSA switch who want to talk to the station connected to the foreign interface. In terms of implementation, we need to keep the fdb_info->added_by_user check for the case where the switchdev event was targeted directly at a DSA switch port. But we don't need to look at that flag for snooped events. So the check is currently too late, we need to move it earlier. This also simplifies the code a bit, since we avoid uselessly allocating and freeing switchdev_work. We could probably do some improvements in the future. For example, multi-bridge support is rudimentary at the moment. If there are two bridges spanning a DSA switch's ports, and both of them need to service the same MAC address, then what will happen is that the migration of one of those stations will trigger the deletion of the FDB entry from the CPU port while it is still used by other bridge. That could be improved with reference counting but is left for another time. This behavior needs to be enabled at driver level by setting ds->assisted_learning_on_cpu_port = true. This is because we don't want to inflict a potential performance penalty (accesses through MDIO/I2C/SPI are expensive) to hardware that really doesn't need it because address learning on the CPU port works there. Reported-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:46 -08:00
Vladimir Oltean	5fb4a451a8	net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB Right now, the following would happen for a switch driver that does not implement .port_fdb_add or .port_fdb_del. dsa_slave_switchdev_event returns NOTIFY_OK and schedules: -> dsa_slave_switchdev_event_work -> dsa_port_fdb_add -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD) -> dsa_switch_fdb_add -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP; -> an error is printed with dev_dbg, and dsa_fdb_offload_notify(switchdev_work) is not called. We can avoid scheduling the worker for nothing and say NOTIFY_DONE. Because we don't call dsa_fdb_offload_notify, the static FDB entry will remain just in the software bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:46 -08:00
Vladimir Oltean	447d290a58	net: dsa: move switchdev event implementation under the same switch/case statement We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events even for interfaces where dsa_slave_dev_check returns false, so we need that check inside the switch-case statement for SWITCHDEV_FDB_*. This movement also avoids a useless allocation / free of switchdev_work on the untreated "default event" case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:46 -08:00
Vladimir Oltean	c4bb76a9a0	net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work Currently DSA doesn't add FDB entries on the CPU port, because it only does so through switchdev, which is associated with a net_device, and there are none of those for the CPU port. But actually FDB addresses on the CPU port have some use cases of their own, if the switchdev operations are initiated from within the DSA layer. There is just one problem with the existing code: it passes a structure in dsa_switchdev_event_work which was retrieved directly from switchdev, so it contains a net_device. We need to generalize the contents to something that covers the CPU port as well: the "ds, port" tuple is fine for that. Note that the new procedure for notifying the successful FDB offload is inspired from the rocker model. Also, nothing was being done if added_by_user was false. Let's check for that a lot earlier, and don't actually bother to schedule the worker for nothing. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
Vladimir Oltean	2fd186501b	net: dsa: be louder when a non-legacy FDB operation fails The dev_close() call was added in commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") "to indicate inconsistent situation" when we could not delete an FDB entry from the port. bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master It is a bit drastic and at the same time not helpful if the above fails to only print with netdev_dbg log level, but on the other hand to bring the interface down. So increase the verbosity of the error message, and drop dev_close(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
Rafał Miłecki	8209f5bc3b	net: dsa: print error on invalid port index Looking for an -EINVAL all over the dsa code could take hours for inexperienced DSA users. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210106090915.21439-1-zajec5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-06 16:21:08 -08:00
Rasmus Villemoes	bdc40a3f4b	net: dsa: print the MTU value that could not be set These warnings become somewhat more informative when they include the MTU value that could not be set and not just the errno. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-08 11:24:07 -08:00
Kurt Kanzenbach	8551fad63c	net: dsa: tag_hellcreek: Cleanup includes Remove unused and add needed includes. No functional change. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:57:21 -08:00
Christian Eggers	30abc9cd9c	net: dsa: avoid potential use-after-free error If dsa_switch_ops::port_txtstamp() returns false, clone will be freed immediately. Shouldn't store a pointer to freed memory. Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:02:50 -08:00
Tobias Waldekranz	13f49b6f26	net: dsa: tag_dsa: Use a consistent comment style Use a consistent style of one-line/multi-line comments throughout the file. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-17 09:16:12 -08:00
Tobias Waldekranz	469ee5fe73	net: dsa: tag_dsa: Unify regular and ethertype DSA taggers Ethertype DSA encodes exactly the same information in the DSA tag as the non-ethertype variety. So refactor out the common parts and reuse them for both protocols. This is ensures tag parsing and generation is always consistent across all mv88e6xxx chips. While we are at it, explicitly deal with all possible CPU codes on receive, making sure to set offload_fwd_mark as appropriate. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-17 09:16:12 -08:00
Tobias Waldekranz	e468d141b9	net: dsa: tag_dsa: Allow forwarding of redirected IGMP traffic When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not performed any forwarding of it. This means that we should not set the offload_fwd_mark on the skb, in case a software bridge wants it forwarded. This is a port of: `1ed9ec9b08` ("dsa: Allow forwarding of redirected IGMP traffic") Which corrected the issue for chips using EDSA tags, but not for those using regular DSA tags. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-17 09:16:11 -08:00
Heiner Kallweit	6a90062879	net: dsa: use net core stats64 handling Use netdev->tstats instead of a member of dsa_slave_priv for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-09 17:50:27 -08:00
Vladimir Oltean	e358bef7c3	net: dsa: Give drivers the chance to veto certain upper devices Some switches rely on unique pvids to ensure port separation in standalone mode, because they don't have a port forwarding matrix configurable in hardware. So, setups like a group of 2 uppers with the same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN 100 to be autonomously forwarded between these switch ports, in spite of there being no bridge between swp0 and swp1. These drivers need to prevent this from happening. They need to have VLAN filtering enabled in standalone mode (so they'll drop frames tagged with unknown VLANs) and they can only accept an 8021q upper on a port as long as it isn't installed on any other port too. So give them the chance to veto bad user requests. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> [Kurt: Pass info instead of ptr] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-05 14:04:49 -08:00
Kurt Kanzenbach	01ef09caad	net: dsa: Add tag handling for Hirschmann Hellcreek switches The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames exchanged between the CPU port and the master interface. The format is a one byte trailer indicating the destination or origin port. It's quite similar to the Micrel KSZ tagging. That's why the implementation is based on that code. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-05 14:04:49 -08:00
Vladimir Oltean	86c4ad9a78	net: dsa: tag_ar9331: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Cc: Per Forlin <per.forlin@axis.com> Cc: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Oleksij Rempel <linux@rempel-privat.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:17 -08:00
Vladimir Oltean	9b9826ae11	net: dsa: tag_gswip: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. This one is interesting, the DSA tag is 8 bytes on RX and 4 bytes on TX. Because DSA is unaware of asymmetrical tag lengths, the overhead/needed headroom is declared as 8 bytes and therefore 4 bytes larger than it needs to be. If this becomes a problem, and the GSWIP driver can't be converted to a uniform header length, we might need to make DSA aware of separate RX/TX overhead values. Cc: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	952a063450	net: dsa: tag_dsa: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Similar to the EtherType DSA tagger, the old Marvell tagger can transform an 802.1Q header if present into a DSA tag, so there is no headroom required in that case. But we are ensuring that it exists, regardless (practically speaking, the headroom must be 4 bytes larger than it needs to be). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	2f0d030c5f	net: dsa: tag_brcm: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	c6c4e1237d	net: dsa: tag_edsa: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Note that the VLAN code path needs a smaller extra headroom than the regular EtherType DSA path. That isn't a problem, because this tagger declares the larger tag length (8 bytes vs 4) as the protocol overhead, so we are covered in both cases. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	6ed94135f5	net: dsa: tag_lan9303: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	941f66beb7	net: dsa: tag_mtk: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	9c5c3bd005	net: dsa: tag_ocelot: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	9bbda29ae1	net: dsa: tag_qca: let DSA core deal with TX reallocation Now that we have a central TX reallocation procedure that accounts for the tagger's needed headroom in a generic way, we can remove the skb_cow_head call. Cc: John Crispin <john@phrozen.org> Cc: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Christian Eggers	ef3f72fee2	net: dsa: trailer: don't allocate additional memory for padding/tagging The caller (dsa_slave_xmit) guarantees that the frame length is at least ETH_ZLEN and that enough memory for tail tagging is available. Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Christian Eggers	88fda8eefd	net: dsa: tag_ksz: don't allocate additional memory for padding/tagging The caller (dsa_slave_xmit) guarantees that the frame length is at least ETH_ZLEN and that enough memory for tail tagging is available. Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:16 -08:00
Vladimir Oltean	a3b0b64797	net: dsa: implement a central TX reallocation procedure At the moment, taggers are left with the task of ensuring that the skb headers are writable (which they aren't, if the frames were cloned for TX timestamping, for flooding by the bridge, etc), and that there is enough space in the skb data area for the DSA tag to be pushed. Moreover, the life of tail taggers is even harder, because they need to ensure that short frames have enough padding, a problem that normal taggers don't have. The principle of the DSA framework is that everything except for the most intimate hardware specifics (like in this case, the actual packing of the DSA tag bits) should be done inside the core, to avoid having code paths that are very rarely tested. So provide a TX reallocation procedure that should cover the known needs of DSA today. Note that this patch also gives the network stack a good hint about the headroom/tailroom it's going to need. Up till now it wasn't doing that. So the reallocation procedure should really be there only for the exceptional cases, and for cloned packets which need to be unshared. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:41:15 -08:00
Christian Eggers	bc7e343dbd	net: dsa: tag_ksz: KSZ8795 and KSZ9477 also use tail tags The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and KSZ9893 switches also use tail tags. Fixes: `7a6ffe764b` ("net: dsa: point out the tail taggers") Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-19 17:32:50 -07:00
Heiner Kallweit	a0d2698101	net: dsa: use new function dev_fetch_sw_netstats Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-13 17:33:49 -07:00
Vladimir Oltean	ea440cd2d9	net: dsa: tag_ocelot: use VLAN information from tagging header when available When the Extraction Frame Header contains a valid classified VLAN, use that instead of the VLAN header present in the packet. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-11 11:19:04 -07:00
Vladimir Oltean	2e554a7a5d	net: dsa: propagate switchdev vlan_filtering prepare phase to drivers A driver may refuse to enable VLAN filtering for any reason beyond what the DSA framework cares about, such as: - having tc-flower rules that rely on the switch being VLAN-aware - the particular switch does not support VLAN, even if the driver does (the DSA framework just checks for the presence of the .port_vlan_add and .port_vlan_del pointers) - simply not supporting this configuration to be toggled at runtime Currently, when a driver rejects a configuration it cannot support, it does this from the commit phase, which triggers various warnings in switchdev. So propagate the prepare phase to drivers, to give them the ability to refuse invalid configurations cleanly and avoid the warnings. Since we need to modify all function prototypes and check for the prepare phase from within the drivers, take that opportunity and move the existing driver restrictions within the prepare phase where that is possible and easy. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Landen Chao <Landen.Chao@mediatek.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Jonathan McDowell <noodles@earth.li> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-05 05:56:48 -07:00
Andrew Lunn	08156ba430	net: dsa: Add devlink port regions support to DSA Allow DSA drivers to make use of devlink port regions, via simple wrappers. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:38:53 -07:00
Andrew Lunn	3122433eb5	net: dsa: Register devlink ports before calling DSA driver setup() DSA drivers want to create regions on devlink ports as well as the devlink device instance, in order to export registers and other tables per port. To keep all this code together in the drivers, have the devlink ports registered early, so the setup() method can setup both device and port devlink regions. v3: Remove dp->setup Move common code out of switch statement. Fix wrong goto Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:38:53 -07:00
Andrew Lunn	f15ec13a96	net: dsa: Make use of devlink port flavour unused If a port is unused, still create a devlink port for it, but set the flavour to unused. This allows us to attach devlink regions to the port, etc. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:38:52 -07:00
Florian Fainelli	3a68844dd2	net: dsa: Utilize __vlan_find_dev_deep_rcu() Now that we are guaranteed that dsa_untag_bridge_pvid() is called after eth_type_trans() we can utilize __vlan_find_dev_deep_rcu() which will take care of finding an 802.1Q upper on top of a bridge master. A common use case, prior to 12a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") was to configure a bridge 802.1Q upper like this: ip link add name br0 type bridge vlan_filtering 0 ip link add link br0 name br0.1 type vlan id 1 in order to pop the default_pvid VLAN tag. With this change we restore that behavior while still allowing the DSA receive path to automatically pop the VLAN tag. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Florian Fainelli	a348292b63	net: dsa: Obtain VLAN protocol from skb->protocol Now that dsa_untag_bridge_pvid() is called after eth_type_trans() we are guaranteed that skb->protocol will be set to a correct value, thus allowing us to avoid calling vlan_eth_hdr(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Florian Fainelli	1c5ad5a940	net: dsa: b53: Set untag_bridge_pvid Indicate to the DSA receive path that we need to untage the bridge PVID, this allows us to remove the dsa_untag_bridge_pvid() calls from net/dsa/tag_brcm.c. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Florian Fainelli	1dc0408cdf	net: dsa: Call dsa_untag_bridge_pvid() from dsa_switch_rcv() When a DSA switch driver needs to call dsa_untag_bridge_pvid(), it can set dsa_switch::untag_brige_pvid to indicate this is necessary. This is a pre-requisite to making sure that we are always calling dsa_untag_bridge_pvid() after eth_type_trans() has been called. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Vladimir Oltean	300fd579b2	net: dsa: tag_rtl4_a: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: Linus Walleij <linus.walleij@linaro.org> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	e665297983	net: dsa: tag_sja1105: use a custom flow dissector procedure The sja1105 is a bit of a special snowflake, in that not all frames are transmitted/received in the same way. L2 link-local frames are received with the source port/switch ID information put in the destination MAC address. For the rest, a tag_8021q header is used. So only the latter frames displace the rest of the headers and need to use the generic flow dissector procedure. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	6b04f171dc	net: dsa: tag_qca: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: John Crispin <john@phrozen.org> Cc: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	b1af365637	net: dsa: tag_mtk: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	742b2e1951	net: dsa: tag_edsa: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	11f5011189	net: dsa: tag_dsa: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	f569ad5257	net: dsa: tag_brcm: use generic flow dissector procedure There are 2 Broadcom tags in use, one places the DSA tag before the Ethernet destination MAC address, and the other before the EtherType. Nonetheless, both displace the rest of the headers, so this tagger can use the generic flow dissector procedure which accounts for that. The ASCII art drawing is a good reference though, so keep it but move it somewhere else. Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	7a6ffe764b	net: dsa: point out the tail taggers The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and KSZ9893 switches also use tail tags. Tell that to the DSA core, since this makes a difference for the flow dissector. Most switches break the parsing of frame headers, but these ones don't, so no flow dissector adjustment needs to be done for them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	2e8cb1b3db	net: dsa: make the .flow_dissect tagger callback return void There is no tagger that returns anything other than zero, so just change the return type appropriately. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	5124197ce5	net: dsa: tag_ocelot: use a short prefix on both ingress and egress There are 2 goals that we follow: - Reduce the header size - Make the header size equal between RX and TX The issue that required long prefix on RX was the fact that the ocelot DSA tag, being put before Ethernet as it is, would overlap with the area that a DSA master uses for RX filtering (destination MAC address mainly). Now that we can ask DSA to put the master in promiscuous mode, in theory we could remove the prefix altogether and call it a day, but it looks like we can't. Using no prefix on ingress, some packets (such as ICMP) would be received, while others (such as PTP) would not be received. This is because the DSA master we use (enetc) triggers parse errors ("MAC rx frame errors") presumably because it sees Ethernet frames with a bad length. And indeed, when using no prefix, the EtherType (bytes 12-13 of the frame, bits 96-111) falls over the REW_VAL field from the extraction header, aka the PTP timestamp. When turning the short (32-bit) prefix on, the EtherType overlaps with bits 64-79 of the extraction header, which are a reserved area transmitted as zero by the switch. The packets are not dropped by the DSA master with a short prefix. Actually, the frames look like this in tcpdump (below is a PTP frame, with an extra dsa_8021q tag - dadb 0482 - added by a downstream sja1105). 89:0c:a9:f2:01:00 > 88:80:00:0a:00:1d, 802.3, length 0: LLC, \ dsap Unknown (0x10) Individual, ssap ProWay NM (0x0e) Response, \ ctrl 0x0004: Information, send seq 2, rcv seq 0, \ Flags [Response], length 78 0x0000: 8880 000a 001d 890c a9f2 0100 0000 100f ................ 0x0010: 0400 0000 0180 c200 000e 001f 7b63 0248 ............{c.H 0x0020: dadb 0482 88f7 1202 0036 0000 0000 0000 .........6...... 0x0030: 0000 0000 0000 0000 0000 001f 7bff fe63 ............{..c 0x0040: 0248 0001 1f81 0500 0000 0000 0000 0000 .H.............. 0x0050: 0000 0000 0000 0000 0000 0000 ............ So the short prefix is our new default: we've shortened our RX frames by 12 octets, increased TX by 4, and headers are now equal between RX and TX. Note that we still need promiscuous mode for the DSA master to not drop it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:58 -07:00
Vladimir Oltean	707091eb26	net: dsa: tag_sja1105: request promiscuous mode for master Currently PTP is broken when ports are in standalone mode (the tagger keeps printing this message): sja1105 spi0.1: Expected meta frame, is 01-80-c2-00-00-0e in the DSA master multicast filter? Sure, one might say "simply add 01-80-c2-00-00-0e to the master's RX filter" but things become more complicated because: - Actually all frames in the 01-80-c2-xx-xx-xx and 01-1b-19-xx-xx-xx range are trapped to the CPU automatically - The switch mangles bytes 3 and 4 of the MAC address via the incl_srcpt ("include source port [in the DMAC]") option, which is how source port and switch id identification is done for link-local traffic on RX. But this means that an address installed to the RX filter would, at the end of the day, not correspond to the final address seen by the DSA master. Assume RX filtering lists on DSA masters are typically too small to include all necessary addresses for PTP to work properly on sja1105, and just request promiscuous mode unconditionally. Just an example: Assuming the following addresses are trapped to the CPU: 01-80-c2-00-00-00 to 01-80-c2-00-00-ff 01-1b-19-00-00-00 to 01-1b-19-00-00-ff These are 512 addresses. Now let's say this is a board with 3 switches, and 4 ports per switch. The 512 addresses become 6144 addresses that must be managed by the DSA master's RX filtering lists. This may be refined in the future, but for now, it is simply not worth it to add the additional addresses to the master's RX filter, so simply request it to become promiscuous as soon as the driver probes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:58 -07:00
Vladimir Oltean	c3975400c8	net: dsa: allow drivers to request promiscuous mode on master Currently DSA assumes that taggers don't mess with the destination MAC address of the frames on RX. That is not always the case. Some DSA headers are placed before the Ethernet header (ocelot), and others simply mangle random bytes from the destination MAC address (sja1105 with its incl_srcpt option). Currently the DSA master goes to promiscuous mode automatically when the slave devices go too (such as when enslaved to a bridge), but in standalone mode this is a problem that needs to be dealt with. So give drivers the possibility to signal that their tagging protocol will get randomly dropped otherwise, and let DSA deal with fixing that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:58 -07:00
Vladimir Oltean	e2f9a8fe73	net: mscc: ocelot: always pass skb clone to ocelot_port_add_txtstamp_skb Currently, ocelot switchdev passes the skb directly to the function that enqueues it to the list of skb's awaiting a TX timestamp. Whereas the felix DSA driver first clones the skb, then passes the clone to this queue. This matters because in the case of felix, the common IRQ handler, which is ocelot_get_txtstamp(), currently clones the clone, and frees the original clone. This is useless and can be simplified by using skb_complete_tx_timestamp() instead of skb_tstamp_tx(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:47:56 -07:00
Florian Fainelli	ed409f3bba	net: dsa: b53: Configure VLANs while not filtering Update the B53 driver to support VLANs while not filtering. This requires us to enable VLAN globally within the switch upon driver initial configuration (dev->vlan_enabled). We also need to remove the code that dealt with PVID re-configuration in b53_vlan_filtering() since that function worked under the assumption that it would only be called to make a bridge VLAN filtering, or not filtering, and we would attempt to move the port's PVID accordingly. Now that VLANs are programmed all the time, even in the case of a non-VLAN filtering bridge, we would be programming a default_pvid for the bridged switch ports. We need the DSA receive path to pop the VLAN tag if it is the bridge's default_pvid because the CPU port is always programmed tagged in the programmed VLANs. In order to do so we utilize the dsa_untag_bridge_pvid() helper introduced in the commit before within net/dsa/tag_brcm.c. Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 18:13:45 -07:00
Vladimir Oltean	412a1526d0	net: dsa: untag the bridge pvid from rx skbs Currently the bridge untags VLANs present in its VLAN groups in __allowed_ingress() only when VLAN filtering is enabled. But when a skb is seen on the RX path as tagged with the bridge's pvid, and that bridge has vlan_filtering=0, and there isn't any 8021q upper with that VLAN either, then we have a problem. The bridge will not untag it (since it is supposed to remain VLAN-unaware), and pvid-tagged communication will be broken. There are 2 situations where we can end up like that: 1. When installing a pvid in egress-tagged mode, like this: ip link add dev br0 type bridge vlan_filtering 0 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid This happens because DSA configures the VLAN membership of the CPU port using the same flags as swp0 (in this case "pvid and not untagged"), in an attempt to copy the frame as-is from ingress to the CPU. However, in this case, the packet may arrive untagged on ingress, it will be pvid-tagged by the ingress port, and will be sent as egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN tag where there was none to speak of on ingress. When vlan_filtering is 1, this is not a problem, as stated in the first paragraph, because __allowed_ingress() will pop it. But currently, when vlan_filtering is 0 and we have such a VLAN configuration, we need an 8021q upper (br0.1) to be able to ping over that VLAN, which is not symmetrical with the vlan_filtering=1 case, and therefore, confusing for users. Basically what DSA attempts to do is simply an approximation: try to copy the skb with (or without) the same VLAN all the way up to the CPU. But DSA drivers treat CPU port VLAN membership in various ways (which is a good segue into situation 2). And some of those drivers simply tell the CPU port to copy the frame unmodified, which is the golden standard when it comes to VLAN processing (therefore, any driver which can configure the hardware to do that, should do that, and discard the VLAN flags requested by DSA on the CPU port). 2. Some DSA drivers always configure the CPU port as egress-tagged, in an attempt to recover the classified VLAN from the skb. These drivers cannot work at all with untagged traffic when bridged in vlan_filtering=0 mode. And they can't go for the easy "just keep the pvid as egress-untagged towards the CPU" route, because each front port can have its own pvid, and that might require conflicting VLAN membership settings on the CPU port (swp1 is pvid for VID 1 and egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for VID 2; with this simplistic approach, the CPU port, which is really a separate hardware entity and has its own VLAN membership settings, would end up being egress-untagged in both VID 1 and VID 2, therefore losing the VLAN tags of ingress traffic). So the only thing we can do is to create a helper function for resolving the problematic case (that is, a function which untags the bridge pvid when that is in vlan_filtering=0 mode), which taggers in need should call. It isn't called from the generic DSA receive path because there are drivers that fall neither in the first nor second category. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 18:13:45 -07:00
David S. Miller	3ab0a7a0c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 16:45:34 -07:00
Vladimir Oltean	88525fc01c	net: dsa: tag_sja1105: add compatibility with hwaccel VLAN tags Check whether there is any hwaccel VLAN tag on RX, and if there is, treat it as the tag_8021q header. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:34 -07:00
Vladimir Oltean	bbed0bbddd	net: dsa: tag_8021q: add VLANs to the master interface too The whole purpose of tag_8021q is to send VLAN-tagged traffic to the CPU, from which the driver can decode the source port and switch id. Currently this only works if the VLAN filtering on the master is disabled. Change that by explicitly adding code to tag_8021q.c to add the VLANs corresponding to the tags to the filter of the master interface. Because we now need to call vlan_vid_add, then we also need to hold the RTNL mutex. Propagate that requirement to the callers of dsa_8021q_setup and modify the existing call sites as appropriate. Note that one call path, sja1105_best_effort_vlan_filtering_set -> sja1105_vlan_filtering -> sja1105_setup_8021q_tagging, was already holding this lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:34 -07:00
Vladimir Oltean	2209158c90	net: dsa: install VLANs into the master's RX filter too Most DSA switch tags shift the EtherType to the right, causing the master to not parse the VLAN as VLAN. However, not all switches do that (example: tail tags, tag_8021q etc), and if the DSA master has "rx-vlan-filter: on" in ethtool -k, then we have a problem. Therefore, we could populate the VLAN table of the master, just in case (for some switches it will not make a difference), so that network I/O can work even with a VLAN filtering master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:34 -07:00
Vladimir Oltean	adb256eb17	net: dsa: allow 8021q uppers while the bridge has vlan_filtering=0 When the bridge has VLAN awareness disabled there isn't any duplication of functionality, since the bridge does not process VLAN. Don't deny adding 8021q uppers to DSA switch ports in that case. The switch is supposed to simply pass traffic leaving the VLAN tag as-is, and the stack will happily strip the VLAN tag for all 8021q uppers that exist. We need to ensure that there are no 8021q uppers when the user attempts to enable bridge vlan_filtering. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:34 -07:00
Vladimir Oltean	707ec383b3	net: dsa: refuse configuration in prepare phase of dsa_port_vlan_filtering() The current logic beats me a little bit. The comment that "bridge skips -EOPNOTSUPP, so skip the prepare phase" was introduced in commit `fb2dabad69` ("net: dsa: support VLAN filtering switchdev attr"). I'm not sure: (a) ok, the bridge skips -EOPNOTSUPP, but, so what, where are we returning -EOPNOTSUPP? (b) even if we are, and I'm just not seeing it, what is the causality relationship between the bridge skipping -EOPNOTSUPP and DSA skipping the prepare phase, and just returning zero? One thing is certain beyond doubt though, and that is that DSA currently refuses VLAN filtering from the "commit" phase instead of "prepare", and that this is not a good thing: ip link add br0 type bridge ip link add br1 type bridge vlan_filtering 1 ip link set swp2 master br0 ip link set swp3 master br1 [ 3790.379389] 001: sja1105 spi0.1: VLAN filtering is a global setting [ 3790.379399] 001: ------------[ cut here ]------------ [ 3790.379403] 001: WARNING: CPU: 1 PID: 515 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4 [ 3790.379420] 001: swp3: Commit of attribute (id=6) failed. [ 3790.379533] 001: [<c11ff588>] (switchdev_port_attr_set_now) from [<c11b62e4>] (nbp_vlan_init+0x84/0x148) [ 3790.379544] 001: [<c11b62e4>] (nbp_vlan_init) from [<c11a2ff0>] (br_add_if+0x514/0x670) [ 3790.379554] 001: [<c11a2ff0>] (br_add_if) from [<c1031b5c>] (do_setlink+0x38c/0xab0) [ 3790.379565] 001: [<c1031b5c>] (do_setlink) from [<c1036fe8>] (__rtnl_newlink+0x44c/0x748) [ 3790.379573] 001: [<c1036fe8>] (__rtnl_newlink) from [<c1037328>] (rtnl_newlink+0x44/0x60) [ 3790.379580] 001: [<c1037328>] (rtnl_newlink) from [<c10315fc>] (rtnetlink_rcv_msg+0x124/0x2f8) [ 3790.379590] 001: [<c10315fc>] (rtnetlink_rcv_msg) from [<c10926b8>] (netlink_rcv_skb+0xb8/0x110) [ 3790.379806] 001: ---[ end trace 0000000000000002 ]--- [ 3790.379819] 001: sja1105 spi0.1 swp3: failed to initialize vlan filtering on this port So move the current logic that may fail (except ds->ops->port_vlan_filtering, that is way harder) into the prepare stage of the switchdev transaction. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:34 -07:00
Vladimir Oltean	1ce39f0ee8	net: dsa: convert denying bridge VLAN with existing 8021q upper to PRECHANGEUPPER This is checking for the following order of operations, and makes sure to deny that configuration: ip link add link swp2 name swp2.100 type vlan id 100 ip link add br0 type bridge vlan_filtering 1 ip link set swp2 master br0 bridge vlan add dev swp2 vid 100 Instead of using vlan_for_each(), which looks at the VLAN filters installed with vlan_vid_add(), just track the 8021q uppers. This has the advantage of freeing up the vlan_vid_add() call for actual VLAN filtering. There is another change in this patch. The check is moved in slave.c, from switch.c. I don't think it makes sense to have this 8021q upper check for each switch port that gets notified of that VLAN addition (these include DSA links and CPU ports, we know those can't have 8021q uppers because they don't have a net_device registered for them), so just do it in slave.c, for that one slave interface. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:33 -07:00
Vladimir Oltean	2b13840672	net: dsa: convert check for 802.1Q upper when bridged into PRECHANGEUPPER DSA tries to prevent having a VLAN added by a bridge and by an 802.1Q upper at the same time. It does that by checking the VID in .ndo_vlan_rx_add_vid(), since that's something that the 8021q module calls, via vlan_vid_add(). When a VLAN matches in both subsystems, this check returns -EBUSY. However the vlan_vid_add() function isn't specific to the 8021q module in any way at all. It is simply the kernel's way to tell an interface to add a VLAN to its RX filter and not drop that VLAN. So there's no reason to return -EBUSY when somebody tries to call vlan_vid_add() for a VLAN that was installed by the bridge. The proper behavior is to accept that configuration. So what's wrong is how DSA checks that it has an 8021q upper. It should look at the actual uppers for that, not just assume that the 8021q module was somewhere in the call stack of .ndo_vlan_rx_add_vid(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:33 -07:00
Vladimir Oltean	eb46e8da1d	net: dsa: rename dsa_slave_upper_vlan_check to something more suggestive We'll be adding a new check in the PRECHANGEUPPER notifier, where we'll need to check some VLAN uppers. It is hard to do that when there is already a function named dsa_slave_upper_vlan_check. So rename this one. Not to mention that this function probably shouldn't have started with "dsa_slave_" in the first place, since the struct net_device argument isn't a DSA slave, but an 8021q upper of one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:33 -07:00
Vladimir Oltean	8350129930	net: dsa: deny enslaving 802.1Q upper to VLAN-aware bridge from PRECHANGEUPPER There doesn't seem to be any strong technical reason for doing it this way, but we'll be adding more checks for invalid upper device configurations, and it will be easier to have them all grouped under PRECHANGEUPPER. Tested that it still works: ip link set br0 type bridge vlan_filtering 1 ip link add link swp2 name swp2.100 type vlan id 100 ip link set swp2.100 master br0 [ 20.321312] br0: port 5(swp2.100) entered blocking state [ 20.326711] br0: port 5(swp2.100) entered disabled state Error: dsa_core: Cannot enslave VLAN device into VLAN aware bridge. [ 20.346549] br0: port 5(swp2.100) entered blocking state [ 20.351957] br0: port 5(swp2.100) entered disabled state Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:01:33 -07:00
Andrew Lunn	0f06b855a9	net: dsa: wire up devlink info get Allow the DSA drivers to implement the devlink call to get info info, e.g. driver name, firmware version, ASIC ID, etc. v2: Combine declaration and the assignment on a single line. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-18 18:18:30 -07:00
Andrew Lunn	97c82c2313	net: dsa: Add devlink regions support to DSA Allow DSA drivers to make use of devlink regions, via simple wrappers. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-18 18:17:45 -07:00
Andrew Lunn	ccc3e6b019	net: dsa: Add helper to convert from devlink to ds Given a devlink instance, return the dsa switch it is associated to. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-18 18:17:45 -07:00
Vladimir Oltean	6565243c06	net: mscc: ocelot: add locking for the port TX timestamp ID The ocelot_port->ts_id is used to: (a) populate skb->cb[0] for matching the TX timestamp in the PTP IRQ with an skb. (b) populate the REW_OP from the injection header of the ongoing skb. Only then is ocelot_port->ts_id incremented. This is a problem because, at least theoretically, another timestampable skb might use the same ocelot_port->ts_id before that is incremented. Normally all transmit calls are serialized by the netdev transmit spinlock, but in this case, ocelot_port_add_txtstamp_skb() is also called by DSA, which has started declaring the NETIF_F_LLTX feature since commit `2b86cb8299` ("net: dsa: declare lockless TX feature for slave ports"). So the logic of using and incrementing the timestamp id should be atomic per port. The solution is to use the global ocelot_port->ts_id only while protected by the associated ocelot_port->ts_id_lock. That's where we populate skb->cb[0]. Note that for ocelot, ocelot_port_add_txtstamp_skb is called for the actual skb, but for felix, it is called for the skb's clone. That is something which will also be changed in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-18 13:52:33 -07:00
Vladimir Oltean	88236591ec	Revert "net: dsa: Add more convenient functions for installing port VLANs" This reverts commit `314f76d7a6`. Citing that commit message, the call graph was: dsa_slave_vlan_rx_add_vid dsa_port_setup_8021q_tagging \| \| \| \| \| +-------------+ \| \| v v dsa_port_vid_add dsa_slave_port_obj_add \| \| +-------+ +-------+ \| \| v v dsa_port_vlan_add Now that tag_8021q has its own ops structure, it no longer relies on dsa_port_vid_add, and therefore on the dsa_switch_ops to install its VLANs. So dsa_port_vid_add now only has one single caller. So we can simplify the call graph to what it was before, aka: dsa_slave_vlan_rx_add_vid dsa_slave_port_obj_add \| \| +-------+ +-------+ \| \| v v dsa_port_vlan_add Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-11 17:30:43 -07:00
Vladimir Oltean	5899ee367a	net: dsa: tag_8021q: add a context structure While working on another tag_8021q driver implementation, some things became apparent: - It is not mandatory for a DSA driver to offload the tag_8021q VLANs by using the VLAN table per se. For example, it can add custom TCAM rules that simply encapsulate RX traffic, and redirect & decapsulate rules for TX traffic. For such a driver, it makes no sense to receive the tag_8021q configuration through the same callback as it receives the VLAN configuration from the bridge and the 8021q modules. - Currently, sja1105 (the only tag_8021q user) sets a priv->expect_dsa_8021q variable to distinguish between the bridge calling, and tag_8021q calling. That can be improved, to say the least. - The crosschip bridging operations are, in fact, stateful already. The list of crosschip_links must be kept by the caller and passed to the relevant tag_8021q functions. So it would be nice if the tag_8021q configuration was more self-contained. This patch attempts to do that. Create a struct dsa_8021q_context which encapsulates a struct dsa_switch, and has 2 function pointers for adding and deleting a VLAN. These will replace the previous channel to the driver, which was through the .port_vlan_add and .port_vlan_del callbacks of dsa_switch_ops. Also put the list of crosschip_links into this dsa_8021q_context. Drivers that don't support cross-chip bridging can simply omit to initialize this list, as long as they dont call any cross-chip function. The sja1105_vlan_add and sja1105_vlan_del functions are refactored into a smaller sja1105_vlan_add_one, which now has 2 entry points: - sja1105_vlan_add, from struct dsa_switch_ops - sja1105_dsa_8021q_vlan_add, from the tag_8021q ops But even this change is fairly trivial. It just reflects the fact that for sja1105, the VLANs from these 2 channels end up in the same hardware table. However that is not necessarily true in the general sense (and that's the reason for making this change). The rest of the patch is mostly plain refactoring of "ds" -> "ctx". The dsa_8021q_context structure needs to be propagated because adding a VLAN is now done through the ops function pointers inside of it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-11 17:30:43 -07:00
Vladimir Oltean	7e092af2f3	net: dsa: tag_8021q: setup tagging via a single function call There is no point in calling dsa_port_setup_8021q_tagging for each individual port. Additionally, it will become more difficult to do that when we'll have a context structure to tag_8021q (next patch). So refactor this now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-11 17:30:43 -07:00
Vladimir Oltean	2f1e8ea726	net: dsa: link interfaces with the DSA master to get rid of lockdep warnings Since commit `845e0ebb44` ("net: change addr_list_lock back to static key"), cascaded DSA setups (DSA switch port as DSA master for another DSA switch port) are emitting this lockdep warning: ============================================ WARNING: possible recursive locking detected 5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted -------------------------------------------- dhcpcd/323 is trying to acquire lock: ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90 but task is already holding lock: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dsa_master_addr_list_lock_key/1); lock(&dsa_master_addr_list_lock_key/1); * DEADLOCK * May be due to missing lock nesting notation 3 locks held by dhcpcd/323: #0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30 #1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48 #2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90 stack backtrace: Call trace: dump_backtrace+0x0/0x1e0 show_stack+0x20/0x30 dump_stack+0xec/0x158 __lock_acquire+0xca0/0x2398 lock_acquire+0xe8/0x440 _raw_spin_lock_nested+0x64/0x90 dev_mc_sync+0x44/0x90 dsa_slave_set_rx_mode+0x34/0x50 __dev_set_rx_mode+0x60/0xa0 dev_mc_sync+0x84/0x90 dsa_slave_set_rx_mode+0x34/0x50 __dev_set_rx_mode+0x60/0xa0 dev_set_rx_mode+0x30/0x48 __dev_open+0x10c/0x180 __dev_change_flags+0x170/0x1c8 dev_change_flags+0x2c/0x70 devinet_ioctl+0x774/0x878 inet_ioctl+0x348/0x3b0 sock_do_ioctl+0x50/0x310 sock_ioctl+0x1f8/0x580 ksys_ioctl+0xb0/0xf0 __arm64_sys_ioctl+0x28/0x38 el0_svc_common.constprop.0+0x7c/0x180 do_el0_svc+0x2c/0x98 el0_sync_handler+0x9c/0x1b8 el0_sync+0x158/0x180 Since DSA never made use of the netdev API for describing links between upper devices and lower devices, the dev->lower_level value of a DSA switch interface would be 1, which would warn when it is a DSA master. We can use netdev_upper_dev_link() to describe the relationship between a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port) is an "upper" to a DSA "master" (host port). The relationship is "many uppers to one lower", like in the case of VLAN. So, for that reason, we use the same function as VLAN uses. There might be a chance that somebody will try to take hold of this interface and use it immediately after register_netdev() and before netdev_upper_dev_link(). To avoid that, we do the registration and linkage while holding the RTNL, and we use the RTNL-locked cousin of register_netdev(), which is register_netdevice(). Since this warning was not there when lockdep was using dynamic keys for addr_list_lock, we are blaming the lockdep patch itself. The network stack _has_ been using static lockdep keys before, and it _is_ likely that stacked DSA setups have been triggering these lockdep warnings since forever, however I can't test very old kernels on this particular stacked DSA setup, to ensure I'm not in fact introducing regressions. Fixes: `845e0ebb44` ("net: change addr_list_lock back to static key") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 19:40:09 -07:00
Vladimir Oltean	4349abdb40	net: dsa: don't print non-fatal MTU error if not supported Commit `72579e14a1` ("net: dsa: don't fail to probe if we couldn't set the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP" check into a simple "err". This causes the MTU warning to be printed even for drivers that don't have the MTU operations implemented. Fix that. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 21:01:50 -07:00
Vladimir Oltean	c9ebf126f1	net: dsa: change PHY error message again slave_dev->name is only populated at this stage if it was specified through a label in the device tree. However that is not mandatory. When it isn't, the error message looks like this: [ 5.037057] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d [ 5.044672] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d [ 5.052275] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d [ 5.059877] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d which is especially confusing since the error gets printed on behalf of the DSA master (fsl_enetc in this case). Printing an error message that contains a valid reference to the DSA port's name is difficult at this point in the initialization stage, so at least we should print some info that is more reliable, even if less user-friendly. That may be the driver name and the hardware port index. After this change, the error is printed as: [ 6.051587] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 0 [ 6.061192] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 1 [ 6.070765] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 2 [ 6.080324] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 3 Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 21:00:53 -07:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Vladimir Oltean	5df5661a13	net: dsa: stop overriding master's ndo_get_phys_port_name The purpose of this override is to give the user an indication of what the number of the CPU port is (in DSA, the CPU port is a hardware implementation detail and not a network interface capable of traffic). However, it has always failed (by design) at providing this information to the user in a reliable fashion. Prior to commit `3369afba1e` ("net: Call into DSA netdevice_ops wrappers"), the behavior was to only override this callback if it was not provided by the DSA master. That was its first failure: if the DSA master itself was a DSA port or a switchdev, then the user would not see the number of the CPU port in /sys/class/net/eth0/phys_port_name, but the number of the DSA master port within its respective physical switch. But that was actually ok in a way. The commit mentioned above changed that behavior, and now overrides the master's ndo_get_phys_port_name unconditionally. That comes with problems of its own, which are worse in a way. The idea is that it's typical for switchdev users to have udev rules for consistent interface naming. These are based, among other things, on the phys_port_name attribute. If we let the DSA switch at the bottom to start randomly overriding ndo_get_phys_port_name with its own CPU port, we basically lose any predictability in interface naming, or even uniqueness, for that matter. So, there are reasons to let DSA override the master's callback (to provide a consistent interface, a number which has a clear meaning and must not be interpreted according to context), and there are reasons to not let DSA override it (it breaks udev matching for the DSA master). But, there is an alternative method for users to retrieve the number of the CPU port of each DSA switch in the system: $ devlink port pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0 pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2 pci/0000:00:00.5/4: type notset flavour cpu port 4 spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0 spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1 spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2 spi/spi2.0/4: type notset flavour cpu port 4 spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0 spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1 spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2 spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3 spi/spi2.1/4: type notset flavour cpu port 4 So remove this duplicated, unreliable and troublesome method. From this patch on, the phys_port_name attribute of the DSA master will only contain information about itself (if at all). If the users need reliable information about the CPU port they're probably using devlink anyway. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-23 15:14:58 -07:00
Kurt Kanzenbach	85e05d263e	net: dsa: of: Allow ethernet-ports as encapsulating node Due to unified Ethernet Switch Device Tree Bindings allow for ethernet-ports as encapsulating node as well. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-22 16:56:43 -07:00
Vladimir Oltean	71d4364abd	net: dsa: use the ETH_MIN_MTU and ETH_DATA_LEN default values Now that DSA supports MTU configuration, undo the effects of commit `8b1efc0f83` ("net: remove MTU limits on a few ether_setup callers") and let DSA interfaces use the default min_mtu and max_mtu specified by ether_setup(). This is more important for min_mtu: since DSA is Ethernet, the minimum MTU is the same as of any other Ethernet interface, and definitely not zero. For the max_mtu, we have a callback through which drivers can override that, if they want to. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 18:35:04 -07:00
Florian Fainelli	9c0c7014f3	net: dsa: Setup dsa_netdev_ops Now that we have all the infrastructure in place for calling into the dsa_ptr->netdev_ops function pointers, install them when we configure the DSA CPU/management interface and tear them down. The flow is unchanged from before, but now we preserve equality of tests when network device drivers do tests like dev->netdev_ops == &foo_ops which was not the case before since we were allocating an entirely new structure. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-20 16:48:22 -07:00
Vladimir Oltean	67c2404922	net: dsa: felix: create a template for the DSA tags on xmit With this patch we try to kill 2 birds with 1 stone. First of all, some switches that use tag_ocelot.c don't have the exact same bitfield layout for the DSA tags. The destination ports field is different for Seville VSC9953 for example. So the choices are to either duplicate tag_ocelot.c into a new tag_seville.c (sub-optimal) or somehow take into account a supposed ocelot->dest_ports_offset when packing this field into the DSA injection header (again not ideal). Secondly, tag_ocelot.c already needs to memset a 128-bit area to zero and call some packing() functions of dubious performance in the fastpath. And most of the values it needs to pack are pretty much constant (BYPASS=1, SRC_PORT=CPU, DEST=port index). So it would be good if we could improve that. The proposed solution is to allocate a memory area per port at probe time, initialize that with the statically defined bits as per chip hardware revision, and just perform a simpler memcpy in the fastpath. Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-13 17:40:01 -07:00
Danielle Ratson	71ad8d55f8	devlink: Replace devlink_port_attrs_set parameters with a struct Currently, devlink_port_attrs_set accepts a long list of parameters, that most of them are devlink port's attributes. Use the devlink_port_attrs struct to replace the relevant parameters. Signed-off-by: Danielle Ratson <danieller@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-09 13:15:29 -07:00
Linus Walleij	efd7fe68f0	net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag This implements the known parts of the Realtek 4 byte tag protocol version 0xA, as found in the RTL8366RB DSA switch. It is designated as protocol version 0xA as a different Realtek 4 byte tag format with protocol version 0x9 is known to exist in the Realtek RTL8306 chips. The tag and switch chip lacks public documentation, so the tag format has been reverse-engineered from packet dumps. As only ingress traffic has been available for analysis an egress tag has not been possible to develop (even using educated guesses about bit fields) so this is as far as it gets. It is not known if the switch even supports egress tagging. Excessive attempts to figure out the egress tag format was made. When nothing else worked, I just tried all bit combinations with 0xannp where a is protocol and p is port. I looped through all values several times trying to get a response from ping, without any positive result. Using just these ingress tags however, the switch functionality is vastly improved and the packets find their way into the destination port without any tricky VLAN configuration. On the D-Link DIR-685 the LAN ports now come up and respond to ping without any command line configuration so this is a real improvement for users. Egress packets need to be restricted to the proper target ports using VLAN, which the RTL8366RB DSA switch driver already sets up. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-08 15:36:19 -07:00
Andrew Lunn	99bac53d06	net: dsa: tag_qca.c: Fix warning for __be16 vs u16 net/dsa/tag_qca.c:48:15: warning: incorrect type in assignment (different base types) net/dsa/tag_qca.c:48:15: expected unsigned short [usertype] net/dsa/tag_qca.c:48:15: got restricted __be16 [usertype] net/dsa/tag_qca.c:68:13: warning: incorrect type in assignment (different base types) net/dsa/tag_qca.c:68:13: expected restricted __be16 [usertype] hdr net/dsa/tag_qca.c:68:13: got int net/dsa/tag_qca.c:71:16: warning: restricted __be16 degrades to integer net/dsa/tag_qca.c:81:17: warning: restricted __be16 degrades to integer Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-05 15:31:58 -07:00
Andrew Lunn	04d63f9d91	net: dsa: tag_mtk: Fix warnings for __be16 net/dsa/tag_mtk.c:84:13: warning: incorrect type in assignment (different base types) net/dsa/tag_mtk.c:84:13: expected restricted __be16 [usertype] hdr net/dsa/tag_mtk.c:84:13: got int net/dsa/tag_mtk.c:94:17: warning: restricted __be16 degrades to integer The result of a ntohs() is not __be16, but u16. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-05 15:31:58 -07:00
Andrew Lunn	802734ad77	net: dsa: tag_lan9303: Fix __be16 warnings net/dsa/tag_lan9303.c:76:24: warning: incorrect type in assignment (different base types) net/dsa/tag_lan9303.c:76:24: expected unsigned short [usertype] net/dsa/tag_lan9303.c:76:24: got restricted __be16 [usertype] net/dsa/tag_lan9303.c:80:24: warning: incorrect type in assignment (different base types) net/dsa/tag_lan9303.c:80:24: expected unsigned short [usertype] net/dsa/tag_lan9303.c:80:24: got restricted __be16 [usertype] net/dsa/tag_lan9303.c:106:31: warning: restricted __be16 degrades to integer net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16 net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16 net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16 net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16 Make use of __be16 where appropriate to fix these warnings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-05 15:31:58 -07:00
Andrew Lunn	ed6444ea03	net: dsa: tag_ksz: Fix __be16 warnings cpu_to_be16 returns a __be16 value. So what it is assigned to needs to have the same type to avoid warnings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-05 15:31:58 -07:00
Andrew Lunn	a61bf20831	net: dsa: Add __percpu property to prevent warnings net/dsa/slave.c:505:13: warning: incorrect type in initializer (different address spaces) net/dsa/slave.c:505:13: expected void const [noderef] <asn:3> __vpp_verify net/dsa/slave.c:505:13: got struct pcpu_sw_netstats Add the needed _percpu property to prevent this warning. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-05 15:31:58 -07:00
Florian Fainelli	65951a9eb6	net: dsa: Improve subordinate PHY error message It is not very informative to know the DSA master device when a subordinate network device fails to get its PHY setup. Provide the device name and capitalize PHY while we are it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:44:20 -07:00
Daniel Mack	1ed9ec9b08	dsa: Allow forwarding of redirected IGMP traffic The driver for Marvell switches puts all ports in IGMP snooping mode which results in all IGMP/MLD frames that ingress on the ports to be forwarded to the CPU only. The bridge code in the kernel can then interpret these frames and act upon them, for instance by updating the mdb in the switch to reflect multicast memberships of stations connected to the ports. However, the IGMP/MLD frames must then also be forwarded to other ports of the bridge so external IGMP queriers can track membership reports, and external multicast clients can receive query reports from foreign IGMP queriers. Currently, this is impossible as the EDSA tagger sets offload_fwd_mark on the skb when it unwraps the tagged frames, and that will make the switchdev layer prevent the skb from egressing on any other port of the same switch. To fix that, look at the To_CPU code in the DSA header and make forwarding of the frame possible for trapped IGMP packets. Introduce some #defines for the frame types to make the code a bit more comprehensive. This was tested on a Marvell 88E6352 variant. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-24 14:39:43 -07:00
Linus Torvalds	96144c58ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix cfg80211 deadlock, from Johannes Berg. 2) RXRPC fails to send norigications, from David Howells. 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from Geliang Tang. 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu. 5) The ucc_geth driver needs __netdev_watchdog_up exported, from Valentin Longchamp. 6) Fix hashtable memory leak in dccp, from Wang Hai. 7) Fix how nexthops are marked as FDB nexthops, from David Ahern. 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni. 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien. 10) Fix link speed reporting in iavf driver, from Brett Creeley. 11) When a channel is used for XSK and then reused again later for XSK, we forget to clear out the relevant data structures in mlx5 which causes all kinds of problems. Fix from Maxim Mikityanskiy. 12) Fix memory leak in genetlink, from Cong Wang. 13) Disallow sockmap attachments to UDP sockets, it simply won't work. From Lorenz Bauer. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: ethernet: ti: ale: fix allmulti for nu type ale net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init net: atm: Remove the error message according to the atomic context bpf: Undo internal BPF_PROBE_MEM in BPF insns dump libbpf: Support pre-initializing .bss global variables tools/bpftool: Fix skeleton codegen bpf: Fix memlock accounting for sock_hash bpf: sockmap: Don't attach programs to UDP sockets bpf: tcp: Recv() should return 0 when the peer socket is closed ibmvnic: Flush existing work items before device removal genetlink: clean up family attributes allocations net: ipa: header pad field only valid for AP->modem endpoint net: ipa: program upper nibbles of sequencer type net: ipa: fix modem LAN RX endpoint id net: ipa: program metadata mask differently ionic: add pcie_print_link_status rxrpc: Fix race between incoming ACK parser and retransmitter net/mlx5: E-Switch, Fix some error pointer dereferences net/mlx5: Don't fail driver on failure to create debugfs net/mlx5e: CT: Fix ipv6 nat header rewrite actions ...	2020-06-13 16:27:13 -07:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig' \| xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
Cong Wang	845e0ebb44	net: change addr_list_lock back to static key The dynamic key update for addr_list_lock still causes troubles, for example the following race condition still exists: CPU 0: CPU 1: (RCU read lock) (RTNL lock) dev_mc_seq_show() netdev_update_lockdep_key() -> lockdep_unregister_key() -> netif_addr_lock_bh() because lockdep doesn't provide an API to update it atomically. Therefore, we have to move it back to static keys and use subclass for nest locking like before. In commit `1a33e10e4a` ("net: partially revert dynamic lockdep key changes"), I already reverted most parts of commit `ab92d68fc2` ("net: core: add generic lockdep keys"). This patch reverts the rest and also part of commit `f3b0a18bb6` ("net: remove unnecessary variables and callback"). After this patch, addr_list_lock changes back to using static keys and subclasses to satisfy lockdep. Thanks to dev->lower_level, we do not have to change back to ->ndo_get_lock_subclass(). And hopefully this reduces some syzbot lockdep noises too. Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-09 12:59:45 -07:00
David S. Miller	1806c13dc2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-31 17:48:46 -07:00
Vladimir Oltean	04198499b2	net: dsa: tag_8021q: stop restoring VLANs from bridge Right now, our only tag_8021q user, sja1105, has the ability to restore bridge VLANs on its own, so this logic is unnecessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 16:45:46 -07:00
Vladimir Oltean	2b86cb8299	net: dsa: declare lockless TX feature for slave ports Be there a platform with the following layout: Regular NIC \| +----> DSA master for switch port \| +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit `1a33e10e4a` ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] * DEADLOCK * [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit `ab92d68fc2` ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: `ab92d68fc2` ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:09:28 -07:00
David S. Miller	13209a8f73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 13:47:27 -07:00

1 2 3 4 5 ...

961 Commits